Forum: Ruby Help processing a file or array

Eduardo Yáñez Parareda (Guest)
on 2007-03-22 14:30
(Received via mailing list)
Hello, I'm a newbie, and need some help to process a file.
The file has lines with something like this:

rewrewrwer rrrrrrrrrrrrr aa1 rrrrrrrrrr
rewrwerwer rrrrrrrrrrrrr bb1 rrrrrrrrrr
rwerfwdffsd rrrrrrrrrrrrr cc1 rrrrrrrrrr
ewrwerwerwer rrrrrrrrrrrrr dd1 rrrrrrrrrr
trtretertert rrrrrrrrrrrrr ee1 rrrrrrrrrr

and another file with

aa1
cc1

I'd like to create a new file without lines containing aa1 and cc1

Reading the files into arrays is easy:

lines = File.new("file1").readlines
tags = File.new("file2").readlines

Is there a Ruby way to remove lines from 'lines' variable which contain
tags from 'tags' variable?
Augie De Blieck Jr. (Guest)
on 2007-03-22 14:44
(Received via mailing list)
You'd want to use a regular expression, I think.  Probably a nested
loop.  I could do this in Perl in about two minutes, but I'm still
adjusting my thinking for Ruby.

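final = []
lines.each do |line|
  # keep the line only if none of the tags match it
  matched = false
  tags.each do |tag|
    matched = true if line =~ /#{tag.chop}/
  end
  final << line unless matched
end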

Then write the "final" array to whatever file you want to.

I threw in the ".chop" there to get rid of the newline character on the
tag.
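For the writing step, here is a minimal sketch, assuming the kept lines still carry their trailing newlines from readlines. The helper name write_lines and the output name "file3" are made up for illustration; nothing in the thread specifies them.

```ruby
require "tmpdir"

# Hypothetical helper: writes the kept lines to `path`.
# IO#puts appends a newline only when the string does not already end in one.
def write_lines(path, kept)
  File.open(path, "w") do |out|
    kept.each { |line| out.puts(line) }
  end
end

# Sample data standing in for the "final" array built by the loop.
final = [
  "rewrwerwer rrrrrrrrrrrrr bb1 rrrrrrrrrr\n",
  "trtretertert rrrrrrrrrrrrr ee1 rrrrrrrrrr\n"
]

Dir.mktmpdir do |dir|
  path = File.join(dir, "file3") # assumed output name
  write_lines(path, final)
  puts File.read(path)
end
```

Opening the file with a block closes it automatically when the block ends.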

-Augie
Jan S. (Guest)
on 2007-03-22 15:00
(Received via mailing list)
On 3/22/07, Eduardo Yáñez Parareda <removed_email_address@domain.invalid>
wrote:
>
> Is there a Ruby way to remove lines from 'lines' variable which contain
> tags from 'tags' variable?
>
> --
> Eduardo Yáñez Parareda
> http://legalizate.blogspot.com
>

tags_re = Regexp.new("\\b(?:#{tags.map { |t| Regexp.escape(t.chomp) }.join("|")})\\b")
lines.delete_if { |l| l =~ tags_re }

explanation:

tags_re will have the form "\b(?:tag1|tag2|tag3|...|tagn)\b"
the \b anchors make it match only whole words, not parts of words.
note that if you put in too many tags you may get errors (regexp too
long or something similar)

the last line deletes all lines that match the regexp.
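To make the explanation concrete, here is a small sketch using the sample data from the original question. It uses reject instead of delete_if so the original array is left untouched; that swap is a choice made for the sketch, not part of the code above.

```ruby
lines = [
  "rewrewrwer rrrrrrrrrrrrr aa1 rrrrrrrrrr\n",
  "rewrwerwer rrrrrrrrrrrrr bb1 rrrrrrrrrr\n",
  "rwerfwdffsd rrrrrrrrrrrrr cc1 rrrrrrrrrr\n",
  "ewrwerwerwer rrrrrrrrrrrrr dd1 rrrrrrrrrr\n",
  "trtretertert rrrrrrrrrrrrr ee1 rrrrrrrrrr\n"
]
tags = ["aa1\n", "cc1\n"] # as returned by readlines, newlines included

# Escape each tag so regexp metacharacters in a tag are matched literally,
# then join the tags into one alternation wrapped in word boundaries.
tags_re = Regexp.new("\\b(?:#{tags.map { |t| Regexp.escape(t.chomp) }.join("|")})\\b")

puts tags_re.source # prints \b(?:aa1|cc1)\b

# reject returns a new array; lines itself is not modified.
kept = lines.reject { |l| l =~ tags_re }
puts kept

# Because of \b, a tag embedded in a longer word does not match.
embedded = "xaa1y" =~ tags_re # nil
```

Here kept holds the bb1, dd1 and ee1 lines.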
Eduardo Yáñez Parareda (Guest)
on 2007-03-22 16:55
(Received via mailing list)
> tags_re = Regexp.new("\\b(?:#{tags.map {|t|
> Regexp.escape(t.chomp)}.join("|")})\\b")
> lines.delete_if {|l| l =~ tags_re }

Once again I have to praise Ruby...
Thanks Jan.
Ron Hopper (Guest)
on 2007-03-22 18:30
(Received via mailing list)
On 3/22/07, Eduardo Yáñez Parareda <removed_email_address@domain.invalid>
wrote:
>
>
> lines = File.new("file1").readlines
> tags = File.new("file2").readlines
>
> Is there a Ruby way to remove lines from 'lines' variable which contain
> tags from 'tags' variable?
>

The most straightforward way seems to be:

lines.reject do |line|
  tags.any? do |tag|
    line.include? tag.chomp
  end
end

...or the same thing a bit more succinctly:

lines.reject {|line| tags.any? {|tag| line.include?(tag.chomp) }}
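Note that reject returns a new array and leaves lines unchanged, so the result has to be assigned or written somewhere. A rough end-to-end sketch follows; the helper name filter_file and the output name "file3" are assumptions for illustration. It also drops blank tags, since "anything".include?("") is true and a blank line in the tag file would otherwise remove every line.

```ruby
require "tmpdir"

# Hypothetical helper wrapping the reject / any? / include? approach.
def filter_file(source_path, tags_path, output_path)
  # Strip newlines once, and skip blank tags (they would match every line).
  tags = File.readlines(tags_path).map(&:chomp).reject(&:empty?)
  kept = File.readlines(source_path).reject do |line|
    tags.any? { |tag| line.include?(tag) }
  end
  File.write(output_path, kept.join) # lines keep their own newlines
  kept
end

Dir.mktmpdir do |dir|
  file1 = File.join(dir, "file1")
  file2 = File.join(dir, "file2")
  file3 = File.join(dir, "file3") # assumed output name
  File.write(file1, "x rrr aa1 rrr\ny rrr bb1 rrr\nz rrr cc1 rrr\n")
  File.write(file2, "aa1\ncc1\n")
  filter_file(file1, file2, file3)
  puts File.read(file3) # prints: y rrr bb1 rrr
end
```

Unlike the regexp version, include? matches substrings, so a tag such as aa1 would also remove a line containing aa12.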


- chopper
Jan S. (Guest)
on 2007-03-22 18:43
(Received via mailing list)
On 3/22/07, Eduardo Yáñez Parareda <removed_email_address@domain.invalid>
wrote:
> > tags_re = Regexp.new("\\b(?:#{tags.map {|t|
> > Regexp.escape(t.chomp)}.join("|")})\\b")
> > lines.delete_if {|l| l =~ tags_re }
>
> Once again I have to praise Ruby...
> Thanks Jan.

Now that I look at it: this is more like Perl than Ruby... Ron's
version is probably a bit slower but much more readable...
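Whether it really is slower can be measured rather than guessed. Here is a rough sketch using Benchmark from the standard library. The generated data is made up for illustration, and timings will vary with Ruby version, machine and input, so no numbers are claimed here.

```ruby
require "benchmark"

tags = %w[aa1 cc1]
sample_tags = %w[aa1 bb1 cc1 dd1 ee1]

# Synthetic input: 20,000 lines, each carrying one of five tags as its own word.
lines = Array.new(20_000) { |i| "word#{i} rrrr #{sample_tags[i % 5]} rrrr\n" }

tags_re = Regexp.new("\\b(?:#{tags.map { |t| Regexp.escape(t) }.join("|")})\\b")

by_regexp = nil
by_include = nil

regexp_time = Benchmark.realtime do
  by_regexp = lines.reject { |l| l =~ tags_re }
end

include_time = Benchmark.realtime do
  by_include = lines.reject { |l| tags.any? { |t| l.include?(t) } }
end

puts format("regexp:   %.4fs", regexp_time)
puts format("include?: %.4fs", include_time)
puts "same result: #{by_regexp == by_include}"
```

On this data both versions keep the same lines, because every tag appears as a whole word. They can differ when a tag occurs inside a longer word.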