New to ruby

Hey guys, I’m pretty new to Ruby and I’ve got a question.
I want to parse a log file (standard Apache log) and look for a couple of
things.
1270.0.1 - - [13/Dec/2007:09:44:41 -0600] "GET /v700.php?GN=MosherHB&UN=mosherhb&URL=http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif HTTP/1.1" 200 23 "-" "Filter7"

OK, that’s a snippet from my log files. These requests are filtered
through a content filtering program. The GN= in the URL is the account
name and UN= is the user name. I want to parse a log file and find out
which user (in any group) has made the most requests, then grab the
requests that user has made and place them in an array, and finally time
how long it takes to reach that host (to test the filter load).

Can anyone please help me?

Joshua T. wrote:

Can anyone please help me?

First, welcome to ruby! Second, maybe something like this:

user_requests = Hash.new{0}
File.open("my_log.txt") do |line|
  user = line.scan(/US=(\w+)/).flatten.first
  user_requests[user] += 1
end

max_requests = user_requests.inject([0,nil]) do |max,user,requests|
  if requests > max[0]
    max = requests,user
  end
  max
end

puts "User #{max_requests[1]} has #{max_requests[0]} requests"

Drew O. wrote:

Joshua T. wrote:

Can anyone please help me?

First, welcome to ruby! Second, maybe something like this:

Whoops, that only gets you the max requests from the user, but it should
get you on the right track.

Drew O. wrote:

user = line.scan(/US=(\w+)/).flatten.first

user = line[/US=(\w+)/, 1]

No need for scan if you only care for the first match.

HTH,
Sebastian

Sebastian H. wrote:

Drew O. wrote:

user = line.scan(/US=(\w+)/).flatten.first

user = line[/US=(\w+)/, 1]

Sebastian -

Thanks for posting that. I’m always finding myself needing something
similar and had a feeling there was an easier syntax.

- Drew

Drew O. wrote:

Sebastian H. wrote:

Drew O. wrote:

user = line.scan(/US=(\w+)/).flatten.first

user = line[/US=(\w+)/, 1]

Thanks for posting that. I’m always finding myself needing something
similar and had a feeling there was an easier syntax.

You’re welcome. There’s also Regexp#match if String#[] is not enough (e.g.
if you need to access the values of more than one capturing group) but you
still only need one match:
md = "I have 100 dollars for you.".match(/(\d+) (dollars|euros)/)
md[0] #=> "100 dollars"
md[1] #=> "100"
md[2] #=> "dollars"

You only need scan when a match can occur multiple times in a string.

HTH,
Sebastian
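
To put the three approaches side by side, here is a small sketch. The shortened log line and the names in it (Acme, bob, example.com) are made up for illustration and are not taken from the log above.

```ruby
# Hypothetical, shortened log line for illustration only.
line = 'GET /v700.php?GN=Acme&UN=bob&URL=http://example.com/a.gif HTTP/1.1'

# String#[] with a capture index returns the first match's group, or nil.
user = line[/UN=(\w+)/, 1]

# String#match returns a MatchData, which exposes several groups at once.
md = line.match(/GN=(\w+)&UN=(\w+)/)
group, name = md[1], md[2]

# String#scan collects every occurrence, which only matters when the
# pattern can match more than once in the same string.
keys = line.scan(/(\w+)=/).flatten

puts user
puts "#{group} / #{name}"
puts keys.inspect
```

Here scan earns its keep only for the last case, where the same pattern appears several times in one line.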

bigbrother wrote:

What does that mean?
It means that you’re calling scan on line when line is a File object, which
doesn’t have scan. If you change File.open to File.foreach, line will
actually be a line of the file (i.e. a string) and the code will work.

HTH,
Sebastian
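
A minimal sketch of that change, assuming the UN= parameter from the original question is the one to count. The sample lines and the temporary file are invented so the snippet can run on its own; with a real log you would pass its path to File.foreach instead.

```ruby
require 'tempfile'

# Invented sample data standing in for a real access log.
sample = <<~LOG
  GET /v700.php?GN=Acme&UN=alice&URL=http://example.com/1.gif HTTP/1.1
  GET /v700.php?GN=Acme&UN=bob&URL=http://example.com/2.gif HTTP/1.1
  GET /v700.php?GN=Acme&UN=alice&URL=http://example.com/3.gif HTTP/1.1
LOG

user_requests = Hash.new(0)

Tempfile.create('access_log') do |file|
  file.write(sample)
  file.flush

  # File.foreach yields each line as a String, so String#[] is available.
  File.foreach(file.path) do |line|
    user = line[/UN=(\w+)/, 1]
    next if user.nil? # skip lines without a UN= parameter
    user_requests[user] += 1
  end
end

puts user_requests.inspect
```

File.open with a block yields the File object itself, which is why the earlier version failed on the first call to scan.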

On Dec 18, 2:59 pm, Sebastian H. wrote:

(…)
I’m dumb or something’s not right
jthomas@jthomas-desktop:~/work$ ./test.rb
./test.rb:14: undefined method `>' for nil:NilClass (NoMethodError)
        from ./test.rb:20:in `inject'
        from ./test.rb:13:in `each'
        from ./test.rb:13:in `inject'
        from ./test.rb:13

Sorry guys, I’m really new to programming.

I’m dumb or something’s not right

I think you are the only one who insists on this ;)

Sorry guys, I’m really new to programming.

I don’t think anyone gets mad about this (and if someone does, they have a
bad attitude), but at any rate you should consider posting your .rb file
too, so others can see what went wrong and correct it.
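
For what it’s worth, the NoMethodError above is consistent with how inject behaves on a Hash: the block receives the accumulator plus one [key, value] pair, so with three plain block parameters the third one (requests) ends up nil. A sketch of two ways around that, using made-up counts:

```ruby
# Made-up counts for illustration.
user_requests = { 'alice' => 2, 'bob' => 5, 'carol' => 1 }

# Parentheses destructure the [key, value] pair that inject passes in.
max_requests = user_requests.inject([0, nil]) do |max, (user, requests)|
  requests > max[0] ? [requests, user] : max
end

# Enumerable#max_by does the same job more directly; on a Hash it
# returns the winning [key, value] pair.
top_user, top_count = user_requests.max_by { |_user, requests| requests }

puts "User #{max_requests[1]} has #{max_requests[0]} requests"
puts "User #{top_user} has #{top_count} requests"
```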

On Dec 18, 11:24 am, Drew O. wrote:

Cool, thanks. When I’m trying it though I get
Joshua$ ./test.rb
./test.rb:4: private method `scan' called for #<File:access.log (closed)> (NoMethodError)
        from ./test.rb:3:in `open'
        from ./test.rb:3
(I use linux)
What does that mean?

Drew O. wrote:

Joshua T. wrote:

On Dec 18, 2:59 pm, Sebastian H. wrote:
Sorry guys, I’m really new to programming.

Ok, based on feedback from people in the thread and re-reading your

Ugh, US= should be UN=, sorry for the multiple emails.

-Drew

Joshua T. wrote:

On Dec 18, 2:59 pm, Sebastian H. wrote:
Sorry guys, I’m really new to programming.

Ok, based on feedback from people in the thread and re-reading your
initial explanation, I think this is what you want:

user_requests = Hash.new{[]}

File.foreach("my_log.txt") do |line|
  user = line[/US=(\w+)/,1]
  # wrap the line in an array so += appends it to the user's list
  user_requests[user] += [line]
end

max_user_info = user_requests.max{|a,b| a[1].size <=> b[1].size}

puts "User name: #{max_user_info[0]}"
puts "Requests:"
max_user_info[1].each do |request|
  puts request
end
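
A variation on the same idea, offered only as a sketch: it matches on UN= (the parameter named in the original question), stores each user’s array in the hash explicitly, and picks the winner with max_by. The sample lines are invented; with a real log they would come from File.foreach.

```ruby
# Invented sample lines for illustration.
lines = [
  'GET /v700.php?GN=Acme&UN=alice&URL=http://example.com/1.gif HTTP/1.1',
  'GET /v700.php?GN=Acme&UN=bob&URL=http://example.com/2.gif HTTP/1.1',
  'GET /v700.php?GN=Acme&UN=alice&URL=http://example.com/3.gif HTTP/1.1'
]

# The block form stores a fresh array the first time a user is seen.
user_requests = Hash.new { |hash, user| hash[user] = [] }

lines.each do |line|
  user = line[/UN=(\w+)/, 1]
  user_requests[user] << line if user
end

# max_by returns the [user, requests] pair with the longest request list.
top_user, top_lines = user_requests.max_by { |_user, reqs| reqs.size }

puts "User name: #{top_user}"
puts "Requests:"
puts top_lines
```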

On Dec 18, 5:06 pm, Drew O. wrote:


Thanks for the help. Can you answer a question, though?
When it prints out the user’s requests, is there a way I can use a
regex to get just the URL? That’s really what I need in that array.
Thanks for all your help. I’ve got a book on Ruby and will start
reading it when I get the chance.

On Dec 20, 11:09 pm, bigbrother wrote:

Ugh, US= should be UN=, sorry for the multiple emails.
reading it when I get the chance
Guess I should clarify: the first bit of regex works fine; it gets the
users and selects the user with the most requests. But when it prints
out the requests it does this:
User name: tendercarepro
Requests:
70.xxx.xxx.xxx- - [13/Dec/2007:09:44:39 -0600] "GET /[email protected]&UN=tendercarepro&URL=http://s.wsj.net/public/resources/images/OB-AV076_Umami_20071205151444.jpg HTTP/1.1" 200 17 "-" "Filter7"
I want to strip out everything but the base URL, in this case
http://s.wsj.net/public/resources/images/OB-AV076_Umami_20071205151444.jpg
(taken from my firewall logs).
Sorry if I’m annoying you guys; you’ve been a huge help.

On Dec 21, 2007 1:24 PM, bigbrother wrote:

I want to strip out everything but the base URL

hint:

botp@it:~$ irb
irb(main):001:0> line=%q(1270.0.1 - - [13/Dec/2007:09:44:41 -0600] "GET /v700.php?GN=MosherHB&UN=mosherhb&URL=http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif HTTP/1.1" 200 23 "-" "Filter7")
=> "1270.0.1 - - [13/Dec/2007:09:44:41 -0600] \"GET /v700.php?GN=MosherHB&UN=mosherhb&URL=http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif HTTP/1.1\" 200 23 \"-\" \"Filter7\""

irb(main):002:0> user,url = line.match(/UN=(\w+)&URL=(.+)\sHTTP/).captures
=> ["mosherhb", "http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif"]

irb(main):003:0> p user,url
"mosherhb"
"http://photos-179.ll.facebook.com/photos-ll-sctm/v43/151/5757353179/app_2_5757353179_5703.gif"
=> nil
kind regards -botp
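
Building on that hint, the sketch below pulls just the URLs out of a user’s request lines and then times a fetch, which was the last step in the original question. The request lines, the `time_fetch` helper and its `fetcher` parameter are all hypothetical names introduced here. The pattern uses `\S+` instead of `.+`, which is a small variation on the regex above, not what was posted.

```ruby
require 'net/http'
require 'uri'

# Invented request lines standing in for the busiest user's entries.
requests = [
  '70.0.0.1 - - [13/Dec/2007:09:44:39 -0600] "GET /v700.php?GN=Acme&UN=alice&URL=http://example.com/a.jpg HTTP/1.1" 200 17 "-" "Filter7"',
  '70.0.0.1 - - [13/Dec/2007:09:44:40 -0600] "GET /v700.php?GN=Acme&UN=alice&URL=http://example.com/b.gif HTTP/1.1" 200 23 "-" "Filter7"',
  'a line with no URL parameter'
]

# Capture the non-space run after URL= that is followed by " HTTP";
# lines without a match yield nil and are dropped by compact.
urls = requests.map { |line| line[/URL=(\S+)\sHTTP/, 1] }.compact

# Hypothetical helper: seconds taken by one fetch. The fetcher is
# injectable so the timing logic can be tried without network access.
def time_fetch(url, fetcher = ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  fetcher.call(uri)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

puts urls
```

Calling `time_fetch(url)` with no second argument performs a real HTTP GET, so it is best pointed only at hosts you are allowed to load-test.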