Regexp exclusion search - find matches NOT ending with a string?

I have the following text in a file:

1 a1.html
2 b.doc
3 c.xml
4 d.tiff
5 e.jpeg
6 f.html

I need a regexp that matches every line except those ending in
“.html”; iow, I want lines 2-5 above. I believe this may require a
negative lookbehind. I tried the following, but Ruby (1.8) gives
an undefined sequence error:

$(?<!\.html) # <---- this seems to work with other engines

Before you jump on the Ruby version: I also tested this at
http://www.rubyxp.com/ and got an invalid expression error (FYI, that
site tests with Ruby 1.9). Any ideas/alternatives?
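One workaround that avoids look-behind entirely (a technique not used in the replies below) is a negative look-ahead anchored at the start of the string. As far as I know, Ruby 1.8's engine already understands `(?!...)`, even though it lacks `(?<!...)`. A minimal sketch, assuming the lines have already been chomped; the array and the `not_html` name are illustrative only:

```ruby
# Sample lines from the question, already stripped of their newlines.
lines = ["1 a1.html", "2 b.doc", "3 c.xml", "4 d.tiff", "5 e.jpeg", "6 f.html"]

# \A anchors at the start of the string; the negative look-ahead (?!...)
# then rejects the match if the rest of the string ends in ".html".
not_html = /\A(?!.*\.html\z)/

wanted = lines.grep(not_html)
puts wanted
```

Because the pattern is a real match rather than a negated one, it can be handed to anything that expects a regexp, such as `grep`.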

TIA,
BC

On Fri, Jul 17, 2009 at 2:35 AM, BrendanC wrote:

I need a regexp to match lines except those that end in
“.html”

The easiest path is to negate the match, for instance:

if filename !~ /\.html\z/
  # non-HTML here
end
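A self-contained sketch of the negated-match approach applied to the whole file. It assumes a `StringIO` stand-in for the data file (with a real file, `File.foreach("data.txt")` would replace it; that file name is hypothetical) and chomps each line so that `\z` sits right after the extension:

```ruby
require "stringio"

# Stand-in for the question's data file.
input = StringIO.new("1 a1.html\n2 b.doc\n3 c.xml\n4 d.tiff\n5 e.jpeg\n6 f.html\n")

non_html = []
input.each_line do |line|
  filename = line.chomp
  # Escaping the dot matters: /.html\z/ would also match "page.xhtml".
  if filename !~ /\.html\z/
    non_html << filename # non-HTML here
  end
end

puts non_html
```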

– fxn

Hi –

On Fri, 17 Jul 2009, BrendanC wrote:

I need a regexp to match lines except those that end in
“.html” - iow - I want lines 2-5 above. I believe this may require a
negative lookbehind match. I tried the following but Ruby (1.8) gives
an undefined sequence error :

$(?<!\.html) # <---- this seems to work with other engines

I would probably do:

lines.reject {|line| line =~ /html$/ }
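A runnable sketch of the `reject` approach on lines as `File.readlines` would return them, trailing newline included. The escaped dot is my own tightening of the reply's `/html$/`, which would also drop a name such as "notes.xhtml":

```ruby
# Lines as File.readlines would return them, trailing "\n" included.
lines = ["1 a1.html\n", "2 b.doc\n", "3 c.xml\n", "4 d.tiff\n", "5 e.jpeg\n", "6 f.html\n"]

# $ matches just before a newline, so no chomp is needed here.
kept = lines.reject { |line| line =~ /\.html$/ }

print kept.join
```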

David

%r($(?<!\.html)\z) # is that what you meant above?
Where does this $ come from?
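On the `$` question: in Ruby, `$` matches at the end of a line (just before a `"\n"`) as well as at the end of the string, while `\z` matches only at the very end of the string. Since `$` is zero-width and matches wherever `\z` does, it adds nothing in `$(?<!\.html)\z`. A small sketch of how the anchors differ on a line read from a file:

```ruby
line = "1 a1.html\n" # a line as read from a file, newline included

at_dollar  = line =~ /\.html$/ # $ matches just before the trailing "\n" => 4
at_big_z   = line =~ /\.html\Z/ # \Z also tolerates one trailing "\n"   => 4
at_small_z = line =~ /\.html\z/ # \z only matches at the very end      => nil

p [at_dollar, at_big_z, at_small_z]
```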

At 2009-07-16 08:59PM, “David A. Black” wrote:

On Fri, 17 Jul 2009, BrendanC wrote:

$(?<!\.html) # <---- this seems to work with other engines

I would probably do:

lines.reject {|line| line =~ /html$/ }

Is the Ruby regular expression syntax documented anywhere?

I was attempting to use a look-behind, but it’s not supported.
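A quick way to check whether a given Ruby accepts look-behind is to compile the pattern at run time. This is a sketch, and the `lookbehind_supported?` name is hypothetical. On 1.8 I would expect `Regexp.new` to raise `RegexpError` (the "undefined sequence" error from the question); on 1.9 and later the pattern compiles:

```ruby
# Returns true if this Ruby's regexp engine accepts a negative look-behind.
# Regexp.new is used instead of a literal so that an unsupported construct
# raises RegexpError at run time rather than a syntax error at parse time.
def lookbehind_supported?
  Regexp.new('(?<!\.html)\z')
  true
rescue RegexpError
  false
end

puts "look-behind supported: #{lookbehind_supported?}"
```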

The syntax is not documented in the Regexp rdocs, and I haven’t seen a
site that spells out all the nitty-gritty details and pokes into the
dark corners.

I’m looking for the Ruby equivalent of:
http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm
http://docs.python.org/library/re.html#regular-expression-syntax
http://perldoc.perl.org/perlre.html

Does it exist?

On 7/17/09, BrendanC wrote:

I need a regexp to match lines except those that end in
“.html” - iow - I want lines 2-5 above. I believe this may require a
negative lookbehind match. I tried the following but Ruby (1.8) gives
an undefined sequence error :

$(?<!\.html) # <---- this seems to work with other engines

Xavier and David gave good advice.
If, however, you really need a matching regex,

%r($(?<!\.html)\z) # is that what you meant above?

works fine in Ruby 1.9. I believe that you can install Oniguruma on 1.8
as a gem for that purpose.
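A sketch of the look-behind pattern filtering the sample lines, assuming Ruby 1.9 or later (whose Oniguruma-based engine supports fixed-length look-behind); the `not_html_end` name is illustrative. The lines are chomped first: with a trailing `"\n"`, the five characters before `\z` would be `"html\n"`, so the HTML lines would slip through:

```ruby
lines = ["1 a1.html\n", "2 b.doc\n", "3 c.xml\n", "4 d.tiff\n", "5 e.jpeg\n", "6 f.html\n"]

# At the end of the string (\z), look back and fail if the preceding
# five characters are ".html".
not_html_end = /(?<!\.html)\z/

wanted = lines.map(&:chomp).grep(not_html_end)
puts wanted
```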
HTH
Robert


Toutes les grandes personnes ont d’abord été des enfants, mais peu
d’entre elles s’en souviennent.

All adults have been children first, but not many remember.

[Antoine de Saint-Exupéry]

On 7/17/09, Glenn J. wrote:

I was attempting to use a look-behind, but it’s not supported.
Does it exist?
For Oniguruma, I found this most helpful:
http://manual.macromates.com/en/regular_expressions#regular_expressions


Glenn J.
Write a wise saying and your name will live forever. – Anonymous
Nice one

Cheers
Robert

On Jul 17, 2009, at 11:30 AM, Glenn J. wrote:

I was attempting to use a look-behind, but it’s not supported.
Does it exist?

You could try the Regular Expressions section of the Standard Types
chapter of Programming Ruby. Be advised that this is the online
version of the 1st edition that is now 8 years old. Since you seem to
be using Ruby 1.8.x, the Regexp parts are going to be
mostly the same.

http://www.ruby-doc.org/docs/ProgrammingRuby/

-Rob

Rob B. http://agileconsultingllc.com

BrendanC wrote:

I have the following text in a file:

1 a1.html
2 b.doc
3 c.xml
4 d.tiff
5 e.jpeg
6 f.html

I need a regexp to match lines except those that end in
“.html” - iow - I want lines 2-5 above.

Some alternate means to the same end:

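IO.foreach("data.txt") do |line|

  #1
  if line.chomp.split(".")[-1] != "html"
    puts line
  end

  #2 (fixed offsets: assumes every line, including the last, ends in "\n")
  if line[-5, 4] != "html"
    print line
  end

  #3 (exclusive range -5...-1 skips the trailing "\n")
  if line.slice(-5...-1) != "html"
    print line
  end

  puts
end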

--output:--
2 b.doc
2 b.doc
2 b.doc

3 c.xml
3 c.xml
3 c.xml

4 d.tiff
4 d.tiff
4 d.tiff

5 e.jpeg
5 e.jpeg
5 e.jpeg
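Two more non-regex variants in the same spirit, neither of which appears in the thread: `File.extname` and `String#end_with?` (the latter available from Ruby 1.8.7). Unlike the fixed-offset slices above, both work whether or not the last line ends in a newline, because each line is chomped first. A sketch:

```ruby
lines = ["1 a1.html\n", "2 b.doc\n", "3 c.xml\n", "4 d.tiff\n", "5 e.jpeg\n", "6 f.html\n"]

# File.extname returns the extension including the dot, e.g. ".html".
by_extname = lines.reject { |line| File.extname(line.chomp) == ".html" }

# String#end_with? compares the suffix directly.
by_suffix = lines.reject { |line| line.chomp.end_with?(".html") }

print by_extname.join
```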

Glenn J. wrote:

Is the Ruby regular expression syntax documented anywhere?

I was attempting to use a look-behind, but it’s not supported.

The syntax is not documented in the RegExp rdocs

In my opinion, documentation is Ruby’s weakest aspect by far, and the
deficiency has gotten substantially worse with Ruby 1.9.

The best available information is in third-party books, whose authors
presumably reverse-engineered it from the source code. I fairly often
resort to irb to check that the behaviour is what I want, and have on
occasion had to resort to reading the source.
