Help processing a file or array


#1

Hello, I’m a newbie, and need some help to process a file.
The file has lines with something like this:

rewrewrwer rrrrrrrrrrrrr aa1 rrrrrrrrrr
rewrwerwer rrrrrrrrrrrrr bb1 rrrrrrrrrr
rwerfwdffsd rrrrrrrrrrrrr cc1 rrrrrrrrrr
ewrwerwerwer rrrrrrrrrrrrr dd1 rrrrrrrrrr
trtretertert rrrrrrrrrrrrr ee1 rrrrrrrrrr

and another file with

aa1
cc1

I’d like to create a new file without lines containing aa1 and cc1

Reading the files into arrays is easy:

lines = File.new("file1").readlines
tags = File.new("file2").readlines

Is there a Ruby way to remove lines from ‘lines’ variable which contain
tags from ‘tags’ variable?


#2

You’d want to use a regular expression, I think. Probably a nested
loop. I could do this in Perl in about two minutes, but I’m still
adjusting my thinking for Ruby.

final = []
lines.each do |line|
  keep = true
  tags.each do |tag|
    # a single matching tag is enough to drop the line
    keep = false if line =~ /#{tag.chop}/
  end
  final << line if keep
end

Then write the “final” array to whatever file you want to.

I threw in the “.chop” there to get rid of the newline character on the
tag.
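For reference, a small sketch of how chop and chomp differ on tag strings. chop always drops the last character, while chomp drops only a trailing line ending, so chomp is the safer choice if the last line of the tag file might lack a final newline. The sample strings are made up for illustration.

```ruby
# Made-up tag strings: one read with its newline, one without
# (e.g. the last line of a file that has no final newline).
with_newline    = "aa1\n"
without_newline = "cc1"

# chop removes the last character unconditionally.
chop_results = [with_newline.chop, without_newline.chop]

# chomp removes only a trailing line ending, if there is one.
chomp_results = [with_newline.chomp, without_newline.chomp]

p chop_results   # the second tag loses its final "1"
p chomp_results  # both tags come out intact
```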

-Augie


#3

On 3/22/07, Eduardo Yáñez Parareda removed_email_address@domain.invalid
wrote:

Is there a Ruby way to remove lines from ‘lines’ variable which contain tags from ‘tags’ variable?


Eduardo Yáñez Parareda
http://legalizate.blogspot.com

tags_re = Regexp.new("\\b(?:#{tags.map {|t| Regexp.escape(t.chomp)}.join("|")})\\b")
lines.delete_if {|l| l =~ tags_re }

explanation:

tags_re will have form “\b(?:tag1|tag2|tag3|…|tagn)\b”
\b are to match only whole words, not part of the words.
note that if you put in too many tags you may get errors (regexp too
long or something similar)

the last line deletes all lines that match the regexp.
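
A minimal sketch of this approach on sample data, in case it helps to see the pieces separately. The helper name build_tags_regexp and the fourth sample line are assumptions added for illustration, not part of the original post. It uses a regexp literal so that \b is read as a word boundary; inside a double-quoted Ruby string, "\b" is a backspace character unless the backslash is doubled.

```ruby
# Hypothetical helper: build one regexp that matches any tag as a whole word.
def build_tags_regexp(tags)
  escaped = tags.map { |t| Regexp.escape(t.chomp) }
  /\b(?:#{escaped.join("|")})\b/
end

lines = [
  "rewrewrwer rrrrrrrrrrrrr aa1 rrrrrrrrrr\n",
  "rewrwerwer rrrrrrrrrrrrr bb1 rrrrrrrrrr\n",
  "rwerfwdffsd rrrrrrrrrrrrr cc1 rrrrrrrrrr\n",
  "xaa1x rrrrrrrrrrrrr dd1 rrrrrrrrrr\n" # made-up line: "aa1" inside a longer word
]
tags = ["aa1\n", "cc1\n"]

tags_re = build_tags_regexp(tags)

# reject returns a new array and leaves lines untouched;
# delete_if would modify lines in place instead.
kept = lines.reject { |l| l =~ tags_re }

puts tags_re.source
puts kept
```

The made-up "xaa1x" line is kept because the word boundaries stop aa1 from matching in the middle of a longer word.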


#4

tags_re = Regexp.new("\\b(?:#{tags.map {|t| Regexp.escape(t.chomp)}.join("|")})\\b")
lines.delete_if {|l| l =~ tags_re }

One more time I have to praise Ruby…
Thanks Jan.


#5

On 3/22/07, Eduardo Yáñez Parareda removed_email_address@domain.invalid
wrote:

tags_re = Regexp.new("\\b(?:#{tags.map {|t| Regexp.escape(t.chomp)}.join("|")})\\b")
lines.delete_if {|l| l =~ tags_re }

One more time I have to praise Ruby…
Thanks Jan.

Now that I look at it, this is more like Perl than Ruby… Ron’s
version is probably a bit slower but much more readable…


#6

On 3/22/07, Eduardo Yáñez Parareda removed_email_address@domain.invalid
wrote:

lines = File.new("file1").readlines
tags = File.new("file2").readlines

Is there a Ruby way to remove lines from ‘lines’ variable which contain
tags from ‘tags’ variable?

The most straightforward way seems to be:

lines.reject do |line|
  tags.any? do |tag|
    line.include? tag.chomp
  end
end

…or the same thing a bit more succinctly:

lines.reject {|line| tags.any? {|tag| line.include?(tag.chomp) }}
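
Putting it together for the original question, here is a hedged end-to-end sketch: read both files, drop the matching lines, and write the rest to a new file. The method name lines_without_tags, the output name file3 and the sample contents are assumptions for illustration; the temporary directory is only there to keep the example self-contained. Blank tags are skipped because include?("") is true for every string, so an empty line in the tag file would otherwise remove everything.

```ruby
require "tmpdir"

# Hypothetical helper: return the lines that contain none of the tags.
# Tags may still carry their trailing newline from readlines.
def lines_without_tags(lines, tags)
  clean_tags = tags.map(&:chomp).reject(&:empty?)
  lines.reject { |line| clean_tags.any? { |tag| line.include?(tag) } }
end

written = nil
Dir.mktmpdir do |dir|
  input    = File.join(dir, "file1")
  tag_file = File.join(dir, "file2")
  output   = File.join(dir, "file3") # assumed name for the new file

  # Made-up sample contents in the shape described in the question.
  File.write(input, "x rrr aa1 rrr\ny rrr bb1 rrr\nz rrr cc1 rrr\n")
  File.write(tag_file, "aa1\ncc1\n")

  kept = lines_without_tags(File.readlines(input), File.readlines(tag_file))

  # The kept lines still end in "\n", so joining them rebuilds the file body.
  File.write(output, kept.join)
  written = File.read(output)
end

puts written
```

Note that reject returns a new array, so its result has to be assigned or used directly; the original lines array is left unchanged.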

-chopper