Parsing an Apache access log line


#1

I have a line to parse:

10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334

Does anyone have a better idea? I got stuck coming up with one simple
regex (the double quotes…). I need to tokenize the line:

token 1 = 10.88.90.75
token 2 = -
token 3 = -
token 4 = [16/Jul/2007:07:46:09 -0400]
token 5 = "GET /star/images/main.gif HTTP/1.1"
token 6 = 200
token 7 = 334

Can someone help, please?

Joe.


#2

2007/7/16, Joe N. removed_email_address@domain.invalid:

token 2 = -
token 3 = -
token 4 = [16/Jul/2007:07:46:09 -0400]
token 5 = "GET /star/images/main.gif HTTP/1.1"
token 6 = 200
token 7 = 334

Can someone help, please?

Try this as a starting point:

line.scan %r{
  \S+
  | \[[^\]]*\]
  | "[^"]*"
}x

(untested)

Kind regards

robert


#3

hi joe!

Joe N. [2007-07-16 13:59]:

I have a line to parse:

10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334

Maybe you want to have a look at log_parser:
http://topfunky.net/svn/plugins/mint/lib/log_parser.rb

cheers
jens


#4

2007/7/16, Robert K. removed_email_address@domain.invalid:

token 1 = 10.88.90.75
Try this as a starting point:

line.scan %r{
  \S+
  | \[[^\]]*\]
  | "[^"]*"
}x

(untested)

I think I got the order wrong. Rather do

line.scan %r{
  \[[^\]]*\]
  | "[^"]*"
  | \S+
}x

Or do an explicit parse like the one Aur suggested.
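
For anyone who wants to try the reordered pattern, here is a minimal sketch against the sample line from the first post. The names LINE, TOKEN_RE and tokenize are made up for illustration, and the line is assumed to use plain ASCII double quotes as a real access log would.

```ruby
# Sample line from the first post (ASCII quotes assumed).
LINE = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334'

# Alternatives are tried left to right at each position, so the
# bracketed and quoted forms have to come before the catch-all \S+.
TOKEN_RE = %r{
    \[ [^\]]* \]   # bracketed timestamp, may contain spaces
  | " [^"]* "      # quoted request, may contain spaces
  | \S+            # any other run of non-whitespace
}x

# Hypothetical helper: returns the tokens as an array of strings.
def tokenize(line)
  line.scan(TOKEN_RE)
end

tokenize(LINE).each_with_index { |t, i| puts "token #{i + 1} = #{t}" }
```

With \S+ listed first, the timestamp would be split at its inner space, because \S+ already matches "[16/Jul/2007:07:46:09" before the bracket alternative is ever tried.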

Kind regards

robert


#5

/([0-9.]*) (-) (-) (\[.*\]) (".*") ([0-9]*) ([0-9]*)/ comes to mind,
although I’m probably wrong with the backslashes - some of the things
I escaped probably aren’t significant characters and some other ones
probably are.
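
A rough sketch of what such an explicit, field-by-field pattern could look like in Ruby, using named groups and negated character classes instead of greedy .* so the brackets and quotes cannot overrun. The group names and the helper parse_access_line are invented labels, not anything from this thread, and allowing "-" for the size field is an assumption about how the log is configured. Unlike the token list in the first post, this version drops the surrounding brackets and quotes.

```ruby
# Hypothetical field names; adjust them to whatever the log format really is.
ACCESS_RE = /\A
  (?<host>\S+)          \s+
  (?<ident>\S+)         \s+
  (?<user>\S+)          \s+
  \[(?<time>[^\]]*)\]   \s+
  "(?<request>[^"]*)"   \s+
  (?<status>\d+)        \s+
  (?<bytes>\d+|-)
\z/x

# Returns a Hash of field name => string, or nil if the line has another shape.
def parse_access_line(line)
  m = ACCESS_RE.match(line.chomp)
  return nil unless m
  m.names.zip(m.captures).to_h
end

SAMPLE = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334'
fields = parse_access_line(SAMPLE)
puts fields["request"] if fields
```

Returning nil for lines that do not fit makes it easy to collect the oddballs, which is where a test suite with more lines would help.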

Could you provide a test suite with more lines?

Hey, wouldn’t /(?^| )[^\S]|".")(?| )/, work to find each of the
tokens (that is, iterate it to find ALL matches)?

Aur


#6

Joe N. wrote:

token 2 = -

line = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334'
token = /^(.*?)\s+(.*?)\s+(.*?)\s+(\[.*?\])\s+(".*?")\s+(\d+)\s+(\d+)$/.match(line)

=> token[1] = 10.88.90.75
token[2] = -
token[3] = -
etc.
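
If the seven tokens are wanted as a plain array, MatchData#captures gives them without the full-match entry at index 0. A small sketch along those lines, where LOG_RE and split_log_line are hypothetical names and a non-matching line is assumed to yield an empty array:

```ruby
LOG_RE = /^(.*?)\s+(.*?)\s+(.*?)\s+(\[.*?\])\s+(".*?")\s+(\d+)\s+(\d+)$/

# Hypothetical helper: seven tokens on success, [] when the line does not match.
def split_log_line(line)
  m = LOG_RE.match(line)
  m ? m.captures : []
end

line = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334'
ip, _ident, _user, time, request, status, bytes = split_log_line(line)
puts "#{ip} #{time} #{request} #{status} #{bytes}"
```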

BR Phil