Ugh! This is completely unreadable. How about using the /x switch and
embedding some comments? Constructing the large regexp from a few
smaller expressions might also help. For example, you could use
/[a-f0-9]/i for hex digits.
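A minimal sketch of what that could look like. The constant names (HEX, OCTET, IPV4, LINE) and the sample strings are made up for illustration and are not taken from the original regexp:

```ruby
# Small named pieces; all names here are illustrative assumptions.
HEX   = /[a-f0-9]/i                   # one hex digit, either case
OCTET = /\d{1,3}/                     # one decimal octet (range not checked)
IPV4  = /#{OCTET}(?:\.#{OCTET}){3}/   # four octets separated by literal dots

# With /x, whitespace and comments inside the pattern are ignored,
# so every step can be documented where it happens.
LINE = /
  IP \s (#{IPV4})   # capture the dotted address that follows IP
  \. \d+            # the port, appended after one more dot
/x

p "IP 85.225.108.54.54707 >"[LINE, 1]
p "0xBEEF" =~ /0x#{HEX}+/
```

Interpolating one Regexp into another keeps each piece's own flags (the /i on HEX stays local to HEX), which is one reason to build the big expression this way rather than by pasting strings together.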
A few notes from glancing over this:
[\wA-Z0-9] → \w
\w already includes letters, digits and the underscore.
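A quick way to convince yourself, as a sketch that only checks the ASCII range (the variable names are mine):

```ruby
verbose = /[\wA-Z0-9]/   # the longer character class
short   = /\w/           # the shorthand on its own

# Compare both patterns on every ASCII character.
ascii = (0..127).map(&:chr)
same  = ascii.all? { |c| verbose.match?(c) == short.match?(c) }
p same
```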
This one isn’t quite as sophisticated for IP address matching as the one
rubular gave you, but that isn’t necessary here. If you really want
stricter matching of IPv4 and IPv6 literals, you can add it.
It's the first time my friend or I have worked with regex. My friend has
rewritten the regex a bit; maybe it makes more sense now:
(?#start: fetch the ip adress after IP)
IP\s((?:\d*.){3}\d{1,3})
(?#end: fetch ip)
(?#start: flag).:\s([PSF])(?#end:flag) (?#nothing more interesting on
this line)
.$(?#end)
(?#start: look for index pattern)
\s^E.{5}@.{9}Q.{30,42}
(?#end: index)
(?#start: get the username which is surrounded by multiple dots, minimum
of 2 in the begining and 0+ after)
..{2,}(\w{1,13}\w).
(?#end: username)
Each of those expressions works individually and together(in rubular)
but when I combine them in my program it prints nothing, not even nil.
So I tried them individually in the program as well, and all but the
index pattern (which prints nothing) work. If anyone can offer some
insight into why it's not working, or knows a better way to do this,
we'll be very happy :)
another thing:
some usernames are really hard to extract from the packets. an example:
G-eX.Dowden(http://rubular.com/regexes/8401)
any suggestions?
File.foreach(filename) do |line|
  b = line.scan(rx)
  puts b.length
end
(?#start: flag).:\s([PSF])(?#end:flag) (?#nothing more interesting on
this line)
.$(?#end)
(?#start: look for index pattern)
\s^E.{5}@.{9}Q.{30,42}
(?#end: index)
You are looking for an end-of-line ($), followed by whitespace (\s),
followed by a start of line (^). This doesn’t look right to me. It might
work sometimes, depending on whether your end-of-line is \n or \r\n.
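A small sketch of the line-ending dependence, using a cut-down pattern with the same $\s^ sequence (the strings are invented for the test):

```ruby
rx = /a$\s^b/   # end of line, one whitespace character, start of line

unix    = "a\nb"     # line ends with \n
windows = "a\r\nb"   # line ends with \r\n

p unix    =~ rx
p windows =~ rx
```

In Ruby, $ matches just before a \n, so with \n endings the \s can consume the newline and ^ then matches at the start of the next line. With \r\n the character after "a" is \r, so $ does not match there and the whole pattern fails.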
(?#start: get the username which is surrounded by multiple dots, minimum
of 2 in the begining and 0+ after)
..{2,}(\w{1,13}\w).
That one makes little sense.
[\w] is the same as \w
(?:\w+.) means one or more word characters followed by any character;
this is then repeated between 1 and 13 times
\w must be followed by a word character
.* this is superfluous, since it matches 0 or more dots,
it would therefore match regardless of what is next
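To illustrate the point about the unescaped dot, here is a sketch with two simplified patterns of my own (not the original expression): one with a bare . and one with \. in the same position.

```ruby
loose  = /\A(?:\w+.){2}\w\z/    # "." matches any character
strict = /\A(?:\w+\.){2}\w\z/   # "\." matches only a literal dot

p "ab-cd-e".match?(loose)    # dashes are accepted by the bare dot
p "ab-cd-e".match?(strict)   # dashes are rejected by the escaped dot
p "ab.cd.e".match?(strict)   # real dots are accepted
```

If the intent is "word characters separated by dots", the escaped form is the one that says so.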
Each of those expressions works individually and together(in rubular)
Don’t test them in rubular. Test them in irb or in ruby.
another thing:
some usernames are really hard to extract from the packets. an example:
G-eX.Dowden(http://rubular.com/regexes/8401)
any suggestions?
You’re using the wrong way to view the packets in the first place.
Using a ruby interface to libpcap would be the safest way - I think I
saw one, but I’ve never used it.
Otherwise, look at tcpdump -X for a proper hex packet dump.
Don’t test them in rubular. Test them in irb or in ruby.
In a ruby file, you can comment out bits of them until you make it work,
e.g.
re = %r{
(?#start: fetch the ip adress after IP)
IP\s((?:\d+.){3}\d{1,3})
(?#end: fetch ip)
}x
#(?#start: flag).:\s([PSF])(?#end:flag)
#(?#nothing more interesting on this line)
#.
#(?#start: look for index pattern)
#^E.{5}@.{9}Q.{30,}
#(?#end: index)
#(?#start: get the username which is surrounded by multiple dots, minimum of 2 in the begining and 0+ after)
#.{2,}(\w+)
#(?#end: username)
#}x
src = "12:23:59.378678 IP 85.225.108.54.54707 > 81.227.132.223.6112: P
590518027:590518071(44) ack 2582330461 win
64240\nE…[email protected]…U.l6Q…#2…<]P…,…wakko0…@…"
p re =~ src
p $~.to_a
Then you move the closing }x further down and uncomment further bits
until it starts to fail again; then you know where the problem is.
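The same bisection can be scripted. This is only a sketch: the helper name first_failing_part, the PARTS list and the sample line are all hypothetical, and the last piece is deliberately wrong so there is something to find.

```ruby
# Hypothetical pieces of a larger pattern, in the order they are joined.
PARTS = [
  /IP\s((?:\d+\.){3}\d{1,3})/,  # address after "IP"
  /.*:\s([PSF])/,               # flag letter after a colon and a space
  /\sNOPE/,                     # deliberately wrong piece
]

# Join the first 1, 2, 3... pieces and return the index of the first
# piece that makes the combined pattern stop matching (nil if none).
# Note: joining by source drops any per-piece flags such as /i or /x.
def first_failing_part(parts, text)
  parts.each_index do |i|
    rx = Regexp.new(parts[0..i].map(&:source).join)
    return i unless rx.match?(text)
  end
  nil
end

sample = "12:23:59 IP 85.225.108.54.54707 > 81.227.132.223.6112: P 1:2(44)"
p first_failing_part(PARTS, sample)
```

It does the same thing as moving the }x by hand: the returned index points at the piece to look at first.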