Parsing text

dubstep · April 22, 2011, 3:43am

Hey all,

I have a file that I need to parse information from. The format of the
first line is something like this:

“>ruby ruby |ruby|ruby ruby|text_i_want| test test”

I was thinking of converting this line into an array using .split(//)
and keeping count of the pipe ("|") characters, so that when it reaches
the 3rd one it reads the characters up to the 4th pipe (all in a do
iterator). In essence, I want to extract “text_i_want”. When I tried
this method, I got stuck. Any ideas on how to move forward? Or is there
an easier solution? Thanks!

playballa23 · April 22, 2011, 4:08am

Good Afternoon,

On Thu, Apr 21, 2011 at 6:43 PM, Cyril J. wrote:

I was thinking converting this line into an array, using the .split(//)

You got close. This should work for you:

.split(/\|/)[3]

That will return the 4th group of text for you.

John

playballa23 · April 22, 2011, 5:15am

Thanks John and 7stud - I have a better understanding now.

playballa23 · April 22, 2011, 4:20am

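A pipe is one of the special regex characters; on its own it does not
stand for a literal pipe. In a regex, a pipe means ‘OR’ (alternation),
so an unescaped /|/ matches the empty string and split breaks the line
into single characters.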
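To make that concrete, here is a small sketch of what an unescaped pipe does. The helper name split_unescaped is made up for illustration; it is not part of Ruby.

```ruby
# Hypothetical helper: splits on an UNESCAPED pipe to show the problem.
# /|/ reads as "empty OR empty", so it matches between every character.
def split_unescaped(line)
  line.split(/|/)
end

line = ">ruby ruby |ruby|ruby ruby|text_i_want| test test"

# One element per character, so index 3 is a letter, not the 4th field.
p split_unescaped(line).first(4)   # => [">", "r", "u", "b"]

# Alternation used as intended: match "cat" OR "dog".
p "hotdog" =~ /cat|dog/            # => 3
```

So [3] on that result gives you "b" rather than the field you are after.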

There are several ways to escape the special regex characters, so that
they lose their special meaning and match themselves:

You can use a backslash to escape the pipe.
You can put the pipe in a character class:

str = ">ruby ruby |ruby|ruby ruby|text_i_want| test test"

pieces = str.split(/[|]/)
puts pieces[3]

--output:--
text_i_want

Inside a character class ([]), most special regex characters, including
the pipe, lose their special meaning.
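For completeness, the backslash form mentioned first would look roughly like this. It is only a sketch, and fourth_field is a name invented for the example.

```ruby
# Hypothetical helper name; returns the 4th pipe-separated field.
# The backslash makes the pipe match a literal "|" instead of meaning OR.
def fourth_field(line)
  line.split(/\|/)[3]
end

puts fourth_field(">ruby ruby |ruby|ruby ruby|text_i_want| test test")
# prints: text_i_want
```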

You can call Regexp.escape to escape any special regex characters
contained in the string, and the special characters will lose their
special meaning:

str = ">ruby ruby |ruby|ruby ruby|text_i_want| test test"

pattern = "|"
esc_str = Regexp.escape(pattern)

pieces = str.split(/#{esc_str}/)
puts pieces[3]

--output:--
text_i_want
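Regexp.escape is mostly useful when the delimiter is not known ahead of time. A minimal sketch, assuming the delimiter arrives as a plain string; field_at and its parameters are hypothetical names, not an existing API:

```ruby
# Hypothetical helper: delimiter is an arbitrary string chosen at runtime.
def field_at(line, delimiter, index)
  # Escape the delimiter so characters like "|" or "." match literally.
  pattern = Regexp.new(Regexp.escape(delimiter))
  line.split(pattern)[index]
end

line = ">ruby ruby |ruby|ruby ruby|text_i_want| test test"

p Regexp.escape("|")          # => "\\|"
puts field_at(line, "|", 3)   # prints: text_i_want

# Passing a String (not a Regexp) to split also matches it literally.
puts line.split("|")[3]       # prints: text_i_want
```

If the delimiter is always a fixed pipe, the plain string form on the last line is probably the simplest option.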