Forum: Ruby Parsing an apache access log line

Joe N. (Guest)
on 2007-07-16 15:59
I have a line to parse:

10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334


Does anyone have a better idea? I got stuck trying to come up with one
simple regex (the double quotes get in the way). I need to tokenize the line:

token 1 = 10.88.90.75
token 2 = -
token 3 = -
token 4 = [16/Jul/2007:07:46:09 -0400]
token 5 = "GET /star/images/main.gif HTTP/1.1"
token 6 = 200
token 7 = 334


Can someone help, please?

Joe.
Robert K. (Guest)
on 2007-07-16 16:11
(Received via mailing list)
2007/7/16, Joe N. <removed_email_address@domain.invalid>:
> token 2 = -
> token 3 = -
> token 4 = [16/Jul/2007:07:46:09 -0400]
> token 5 = "GET /star/images/main.gif HTTP/1.1"
> token 6 = 200
> token 7 = 334
>
>
> can some one help please.

Try this as a starting point:

line.scan %r{
  \S+
| \[[^\]]*\]
| "[^"]*"
}x

(untested)

Kind regards

robert
jens wille (Guest)
on 2007-07-16 16:13
(Received via mailing list)
hi joe!

Joe N. [2007-07-16 13:59]:
> I have a line to parse:
>
> 10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334
maybe you want to have a look at log_parser:
<http://topfunky.net/svn/plugins/mint/lib/log_parser.rb>

cheers
jens
SonOfLilit (Guest)
on 2007-07-16 16:16
(Received via mailing list)
/([0-9.]*) (-) (-) (\[.*\]) (\".*\") ([0-9]*) ([0-9]*)/ comes to mind,
although I'm probably wrong with the backslashes - some of the things
I escaped probably aren't significant characters and some other ones
probably are.
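
For anyone trying this out, here is a minimal sketch of that explicit pattern applied to the sample line from the first post. The constant and method names are made up for illustration. On the backslashes: in a Ruby regex literal `\"` is simply a literal double quote, so that escape is harmless but unnecessary, while `\[` does need the backslash. Note also that the greedy `.*` would over-match on a line that has a second `]` or a later `"` (for example a log format with a quoted user agent).

```ruby
# Sample line from the original post, joined onto one line.
SAMPLE_LINE = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] ' \
              '"GET /star/images/main.gif HTTP/1.1" 200 334'

# The explicit pattern suggested above, unchanged.
EXPLICIT = /([0-9.]*) (-) (-) (\[.*\]) (\".*\") ([0-9]*) ([0-9]*)/

# Returns the seven captured tokens, or nil when the line does not match.
# (explicit_tokens is a hypothetical helper name, not from the thread.)
def explicit_tokens(line)
  m = EXPLICIT.match(line)
  m ? m.captures : nil
end

puts explicit_tokens(SAMPLE_LINE)
```

Because tokens 2 and 3 are hard-coded as `-`, a line with a real ident or user name would not match; replacing `(-)` with `(\S+)` is one way to loosen that.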

Could you provide a test suite with more lines?

Hey, wouldn't /(?:^| )(\[[^\]]*\]|"[^"]*"|\S+)(?= |$)/ work to find each of the
tokens (that is, iterate it to find ALL matches)?


Aur
Robert K. (Guest)
on 2007-07-16 16:54
(Received via mailing list)
2007/7/16, Robert K. <removed_email_address@domain.invalid>:
> > token 1 = 10.88.90.75
> Try this as a starting point:
>
> line.scan %r{
>   \S+
> | \[[^\]]*\]
> | "[^"]*"
> }x
>
> (untested)

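I think I got the order wrong: with \S+ as the first alternative it
matches before the bracketed and quoted alternatives are ever tried.
Do this instead: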

line.scan %r{
  \[[^\]]*\]
| "[^"]*"
| \S+
}x

Or do an explicit parse like the one Aur suggested.
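
To make the ordering point concrete, here is a small sketch that runs both variants against the sample line from the first post. The constant and method names are only for illustration. Regex alternation tries alternatives left to right at each position, so whichever one is listed first wins whenever it can match.

```ruby
# Sample line from the original post, joined onto one line.
LINE = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] ' \
       '"GET /star/images/main.gif HTTP/1.1" 200 334'

# \S+ listed first: it succeeds at every token start, so the bracketed
# and quoted fields get split at their inner spaces.
GREEDY_FIRST = %r{
  \S+
| \[[^\]]*\]
| "[^"]*"
}x

# Delimited alternatives first, \S+ only as the fallback.
DELIMITED_FIRST = %r{
  \[[^\]]*\]
| "[^"]*"
| \S+
}x

# Hypothetical helper: String#scan returns every non-overlapping match.
def tokenize(line, pattern)
  line.scan(pattern)
end

puts tokenize(LINE, GREEDY_FIRST).length     # fragments, fields are split
puts tokenize(LINE, DELIMITED_FIRST).length  # one entry per field
puts tokenize(LINE, DELIMITED_FIRST)
```

With the first ordering the timestamp and the request are each broken into pieces; with the second they come back whole, brackets and quotes included.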

Kind regards

robert
Phil M. (Guest)
on 2007-07-16 19:00
(Received via mailing list)
Joe N. wrote:
> token 2 = -
>
line = "10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] \"GET /star/images/main.gif HTTP/1.1\" 200 334"
token = /^(.*?)\s+(.*?)\s+(.*?)\s+(\[.*?\])\s+(\".*?\")\s+(\d+)\s+(\d+)$/.match(line)

=> token[1] = 10.88.90.75
    token[2] = -
    token[3] = -
    etc.
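
If you are on Ruby 1.9 or later, a variation on the same idea is to use named groups so the fields can be looked up by name rather than by index. This is only a sketch under a few assumptions: the field names, the `ACCESS_LOG` constant and the `parse_access_line` method are invented here, the timestamp format string is inferred from the sample line, and a byte count of `-` (logged when no body is sent) is treated as 0.

```ruby
require 'time'

# Anchored pattern with named groups; brackets and quotes are matched
# but left out of the captured values.
ACCESS_LOG = /
  \A
  (?<host>\S+)\s+
  (?<ident>\S+)\s+
  (?<user>\S+)\s+
  \[(?<time>[^\]]+)\]\s+
  "(?<request>[^"]*)"\s+
  (?<status>\d+)\s+
  (?<bytes>\d+|-)
  \z
/x

# Returns a Hash of typed fields, or nil when the line does not match.
def parse_access_line(line)
  m = ACCESS_LOG.match(line.chomp)
  return nil unless m

  {
    host:    m[:host],
    ident:   m[:ident],
    user:    m[:user],
    time:    Time.strptime(m[:time], '%d/%b/%Y:%H:%M:%S %z'),
    request: m[:request],
    status:  m[:status].to_i,
    bytes:   m[:bytes] == '-' ? 0 : m[:bytes].to_i
  }
end

entry = parse_access_line(
  '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] ' \
  '"GET /star/images/main.gif HTTP/1.1" 200 334'
)
puts entry[:host]
puts entry[:status]
puts entry[:request]
```

Returning nil for a non-matching line lets the caller decide whether to skip or report malformed entries.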

BR Phil