Regular expressions

I’m sorry if this has been done before, but I can’t seem to find any
good examples:

I’m trying to search mail messages for headers based on the following:

require 'yaml'
config = YAML.load_file('config.yaml')

$stdin.each do |line|
  puts line if line =~ /^Recevied:/ ... line =~ /by #{config['hostname']}/o
end

and guess what? It doesn’t work.
If I hard code the hostname it works fine.
But this and /by config['hostname']/ both fail to match.

So, how do you set a variable in a regular expression?

Also, how would I match multiple lines?

I would much rather match something like:

/^Received:.+?\s{4,}by #{config['hostname']}/sm
because that will pick up the one Received header I want.
I can write in Perl 5 regex but I’m not as familiar with Ruby’s methods.

Tom A. wrote:

So, how do you set a variable in a regular expression?

You have it right.

Also, how would I match multiple lines?

You’re right again with the //m. But you can’t read line-by-line
(IO#each) and match across multiple lines without building some kind of
parser. It’s usually better to just read the data into a single string
for multiline matches.

Input (multiline):
Received: from mail.papa.smurf (mail.papa.smurf [127.0.0.1])
by [email protected]

config = {'hostname' => '[email protected]'}
line = $stdin.read
puts line if line =~ /^Received:.*by #{config['hostname']}/m

I would much rather match something like:

/^Received:.+?\s{4,}by #{config['hostname']}/sm
because that will pick up the one Received header I want.
I can write in Perl 5 regex but I’m not as familiar with Ruby’s methods.

Input (single line):
Received: from mail.papa.smurf (mail.papa.smurf [127.0.0.1]) by
[email protected]

$stdin.each { |line|
  puts line if line =~ /^Received:.+?\s{4,}by #{config['hostname']}/
}

Regards,
Jordan

MonkeeSage wrote:

for multiline matches.

Input (multiline):
Received: from mail.papa.smurf (mail.papa.smurf [127.0.0.1])
by [email protected]

config = {'hostname' => '[email protected]'}
line = $stdin.read
puts line if line =~ /^Received:.*by #{config['hostname']}/m

This prints out the entire message.

I’m only trying to get the one section that matched the Received header.
I should be able to do this with the statement
puts line if line =~ /Received/ ... line =~ /#{config['hostname']}/
At least that's what I believe I'm being led towards.

But the match on /Received/ turns on the printing and it never matches
on the second regexp.

Tom A. wrote:

end

and guess what? It doesn’t work.

Yes, it doesn't, and I think I know why. I have been hearing for years how
it just doesn't matter whether young people learn how to spell common
words, and I have steadfastly taken the position that it does matter.

It turns out that people who can’t spell, and who leave school confident
that it is an out-of-date art, slowly discover all the odd places where
knowing how to spell actually makes a difference.

So, now that you are living in reality, perhaps you will discover one of
the problems with your regular expression is that "Recevied" isn't a word,
and it won't filter out those e-mails you want to collect.

Tom A. wrote:

parser. It’s usually better to just read the data into a single string

This prints out the entire message.

Well, if the regexp as written correctly identifies the beginning and
end of the desired area, how about this:

result = line.sub(/^(Received:.*?by #{config['hostname']}).*/m, '\1')

puts result if result.size > 0

Note the question mark in the middle of the regexp. It means to stop at
the first match, not the last, of what follows. This change may not matter
in practice, but it is a good habit to fall into when dealing with a lot of
data. One special case might fail after thousands of successful matches.

This hasn’t been tested.
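An alternative way to pull out only the matched header is String#[] with a regexp, which returns the matched text or nil. The following is a minimal sketch, not a tested drop-in: the hostname and the sample message are hypothetical stand-ins, and it assumes the whole message has already been read into one string.

```ruby
# Hypothetical hostname and message, used only for illustration.
config = { 'hostname' => 'mx.example.org' }

message = <<~MAIL
  Return-Path: <someone@example.com>
  Received: from mail.papa.smurf (mail.papa.smurf [127.0.0.1])
      by mx.example.org
  Subject: hello

  Body text.
MAIL

# .*? is lazy, so the match stops at the first "by <hostname>" after "Received:".
# /m lets . match newlines in Ruby.
pattern = /^Received:.*?by #{Regexp.escape(config['hostname'])}/m

header = message[pattern] # nil when nothing matches
puts header if header
```

Unlike sub, which returns the original string unchanged when nothing matches, this form yields nil in that case, so the no-match case is easy to tell apart.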

On 2006.10.03 10:20, MonkeeSage wrote:

puts $1 if line =~ /^Received:(.+?)\s{4,}by #{config['hostname']}/
# prints " from mail.papa.smurf (mail.papa.smurf [127.0.0.1])"
}

And you almost certainly want to Regexp.escape that interpolation.
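To illustrate why the escape matters, here is a small sketch with a hypothetical hostname. Interpolated text is treated as regexp source, so each "." in a hostname matches any character unless it is escaped.

```ruby
hostname = 'mail.example.org' # hypothetical value, for illustration only

unescaped = /by #{hostname}/
escaped   = /by #{Regexp.escape(hostname)}/

# Unescaped, "." matches any character, so a different host slips through.
p unescaped.match?('by mailXexample.org') # => true
p escaped.match?('by mailXexample.org')   # => false
p escaped.match?('by mail.example.org')   # => true
```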

Tom A. wrote:

I’m only trying to get the one section that matched the Received header.
I should be able to do this with the statement
puts line if line =~ /Received/ ... line =~ /#{config['hostname']}/
At least that's what I believe I'm being led towards.

In Ruby, ... is a range operator. If you want a grouped match, use parens
and a backreference just like Perl.

$stdin.each { |line|
  puts $1 if line =~ /^Received:(.+?)\s{4,}by #{config['hostname']}/
  # prints " from mail.papa.smurf (mail.papa.smurf [127.0.0.1])"
}

Regards,
Jordan
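For completeness: when .. or ... joins two boolean expressions inside a conditional, Ruby treats it as a flip-flop rather than building a Range, much like Perl. The line-by-line approach from the original question can therefore work once both patterns actually match. A minimal sketch, using a hypothetical hostname and hypothetical sample lines:

```ruby
hostname = 'mx.example.org' # hypothetical value, for illustration only
host_re  = /by #{Regexp.escape(hostname)}/

# Hypothetical header lines standing in for $stdin.
lines = [
  "Return-Path: <someone@example.com>\n",
  "Received: from mail.papa.smurf (mail.papa.smurf [127.0.0.1])\n",
  "    by mx.example.org\n",
  "Subject: hello\n"
]

selected = lines.select do |line|
  # Flip-flop: switches on at the first line matching the left condition and
  # off after the next later line matching the right condition.
  true if (line =~ /^Received:/)...(line =~ host_re)
end

puts selected
```

With three dots the right-hand condition is not tested on the line that switched the flip-flop on; with two dots it is, so a header that fits on one line would start and end on that same line.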

On 10/3/06, Tom A. [email protected] wrote:

and guess what? It doesn’t work.
If I hard code the hostname it works fine.
But this and /by config['hostname']/ both fail to match.

To eliminate a possibility, do

p config['hostname']

just before the $stdin line there. I can’t tell you how many times I
have forgotten about invisible newlines tacked onto my variables that
cause all sorts of matching to fail.

Les
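A short sketch of that check, assuming a hypothetical hostname value that has picked up a trailing newline: p prints the inspect form, which makes the newline visible, and strip removes it before interpolation.

```ruby
hostname = "mx.example.org\n" # hypothetical value with a stray newline

puts hostname # looks fine when printed
p hostname    # the inspect form exposes the trailing "\n"

# Hypothetical header line, for illustration only.
header = 'Received: from mail.papa.smurf by mx.example.org (Postfix)'

raw_match     = header =~ /by #{Regexp.escape(hostname)}/
cleaned_match = header =~ /by #{Regexp.escape(hostname.strip)}/

p raw_match           # nil: the pattern also demands a newline after the hostname
p !cleaned_match.nil? # true once the newline is stripped
```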

Paul L. wrote:

  puts line if line =~ /^Recevied:/ ... line =~ /by #{config['hostname']}/o
end

and guess what? It doesn’t work.

Yes, it doesn’t, and I think I know why. I have been hearing for years how
it just doesn’t matter whether young people learn how to spell common
words, and I have steadfastly taken the position that it does matter.

You make a pretty good example of another lost art…