Help parsing and extracting a num

Hi, I’m trying to extract a number from a string. The string is in fact
a set of headers separated by \r\n or \n. For now I just need to
extract the value of the “Content-Length” header (case-insensitive).

For example:

irb> headers = "Via: via1,via2,\r\n via3\r\nFrom: from_user@domain\nContent-length: 22\r\nTo: to_user@domain\n"

I do the following:

irb> headers.scan(/content-length\s*:\s*([0-9]*)/i).to_s.to_i
=> 22

If the “Content-Length” header exists but has an empty value, I get 0
with the above code (which is in fact what I need).

Is there a faster way to do the same? Any help?

Thanks a lot.
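
One caveat about the scan call above: it relies on Array#to_s joining the array elements, which is how Ruby 1.8 behaves. From Ruby 1.9 on, Array#to_s returns the inspect form (something like [["22"]]), and calling to_i on that gives 0. A minimal sketch that picks the captured string explicitly might look like this; the method name content_length_via_scan is made up for illustration:

```ruby
# Hypothetical helper name; the regex is the one from the post above.
def content_length_via_scan(headers)
  # scan returns an array of capture-group arrays, e.g. [["22"]].
  matches = headers.scan(/content-length\s*:\s*([0-9]*)/i)

  # Take the first captured string instead of relying on Array#to_s.
  # nil.to_i and "".to_i are both 0, so a missing or empty header gives 0.
  matches.flatten.first.to_i
end

headers = "Via: via1,via2,\r\n via3\r\nFrom: from_user@domain\n" \
          "Content-length: 22\r\nTo: to_user@domain\n"
puts content_length_via_scan(headers)
```

This keeps the "empty value means 0" behaviour asked for, without depending on how a particular Ruby version converts arrays to strings.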

2008/3/25, Iñaki Baz C.:

Hi, I’m trying to extract a num from a string. This string is in fact
some number of headers separated by \r\n or \n. for now I just need to
extract the value of “Content-Length” (case insensitive) header.

Why don’t you use Net::HTTP? IIRC it already gives you the individual
header fields.

In case the “Content-Length” header exist but has empty value I get 0
with the above code (that’s what I need in fact).

Is there any way faster to do the same? any help?

headers[/content-length:\s*(\d+)/i, 1].to_i

Cheers

robert
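
For readers unfamiliar with the form used in that one-liner: String#[] accepts a regexp plus a capture-group index and returns the captured text, or nil when nothing matches. Since nil.to_i is 0, a missing header falls through to 0 without extra checks. A small sketch, where the method name content_length is only an illustrative wrapper:

```ruby
# Illustrative wrapper (name is an assumption) around the one-liner above.
def content_length(headers)
  # headers[regexp, 1] returns capture group 1, or nil if there is no match.
  headers[/content-length:\s*(\d+)/i, 1].to_i
end

puts content_length("From: a@b\nContent-length: 22\r\nTo: c@d\n")
puts content_length("From: a@b\nTo: c@d\n")
```

Note that this regex does not allow whitespace between the header name and the colon, unlike the scan version earlier in the thread.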

2008/3/25, Robert K.:

2008/3/25, Iñaki Baz C.:

Hi, I’m trying to extract a num from a string. This string is in fact
some number of headers separated by \r\n or \n. for now I just need to
extract the value of “Content-Length” (case insensitive) header.

Why don’t you use Net::HTTP? IIRC it already gives you the individual
header fields.

Yes, but I’m writing a parser for the SIP protocol (similar to HTTP,
but not the same).

headers[/content-length:\s*(\d+)/i, 1].to_i

Great!

Thanks.

2008/3/25, Iñaki Baz C.:

"Via: via1,via2,\r\n via3\r\nFrom: from_user@domain\nContent-length: \r\n1234: blabla\nTo: to_user@domain\n"[/content-length\s*:\s*(\d+)/i, 1].to_i
=> 1234

I’m trying to solve it by adding [^(\r|\n)] after (\d+) but I get
nothing. Please, could you help me with this?

I think I now have the solution. The problem is that \s also matches
\r and \n, so what I do now is:

headers[/content-length\s*:[ \t]*(\d+)/i, 1].to_i

(I allow any space or tab before the expected numbers but no \r or \n).
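
To make the difference concrete, here is a small comparison of the two patterns on the problematic input from this thread. The method names content_length_loose and content_length_strict are invented for the example; the regexes are the ones quoted above.

```ruby
# Hypothetical names; both regexes are taken from the thread.
def content_length_loose(headers)
  # \s* after the colon can run across \r\n into the next header line.
  headers[/content-length\s*:\s*(\d+)/i, 1].to_i
end

def content_length_strict(headers)
  # [ \t]* only skips spaces and tabs, so the match stays on one line.
  headers[/content-length\s*:[ \t]*(\d+)/i, 1].to_i
end

empty_value = "From: from_user@domain\nContent-length: \r\n1234: blabla\nTo: to_user@domain\n"
with_value  = "From: from_user@domain\nContent-length:  22\r\nTo: to_user@domain\n"

puts content_length_loose(empty_value)   # picks up the next header's name
puts content_length_strict(empty_value)  # no match, so nil.to_i
puts content_length_strict(with_value)
```

With the empty header value, the loose pattern returns 1234 while the strict one returns 0; with a real value, the strict pattern still returns 22.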

2008/3/25, Robert K.:

headers[/content-length:\s*(\d+)/i, 1].to_i

There is a small issue here that I can’t solve. Imagine there is a
“Content-Length” header but with an empty value, and the next header has
a numeric name:


Content-Length:
1234: blablabla

In that case the code above returns 1234 while it should return 0.

Demonstration:

"Via: via1,via2,\r\n via3\r\nFrom: from_user@domain\nContent-length: \r\n1234: blabla\nTo: to_user@domain\n"[/content-length\s*:\s*(\d+)/i, 1].to_i
=> 1234

I’m trying to solve it by adding [^(\r|\n)] after (\d+) but I get
nothing. Please, could you help me with this?

Thanks a lot in advance and best regards.