A method to search and cut in Ruby

unknown · July 26, 2007, 1:39am

Hello all,
I’m trying to parse some EXIF data and return certain fields of text.
If I want to parse a document looking for some text and, once I have
located it, take everything after a colon (:) delimiter, what is the
best method in Ruby, as opposed to a shell command?

For example if I have a file called filename.txt and I want to search
for this —> “Compression : JPEG (old-style)”
but only return everything to the right of the delimiter?

Using command-line tools such as grep to locate the line and then cut
to grab everything after " : " works, but I’m trying to learn how to do
things like this in Ruby and am having no luck grabbing everything to
the right of the colon. Some of the text I’m grabbing can contain
additional colons, and I want everything after the first one.

Thanks.

Sc–

unknown · September 25, 2007, 11:02pm

On Jul 25, 7:05 pm, Phrogz [email protected] wrote:

=> ["Foo ", " Bar"]
irb(main):003:0> s.split(":").last
=> " Bar"

…or did you want the leading whitespace chomped?

irb(main):004:0> s.split( /\s*:\s*/ )
=> ["Foo", "Bar"]
irb(main):005:0> s.split( /\s*:\s*/ ).last
=> "Bar"

I was looking at this method as well. I was having trouble, though: I
know the left-hand column (in this example, the Compression portion)
and then the colon, but the remainder could be anything, for instance
a description with lots of text that could also contain an additional
colon. At the shell prompt I could grep “Compression” | cut -d: -f2-
and that would grab everything…
So it sounds like I’m looking in the correct area, but I’m just not
pulling it all together. I will keep at it; if you have additional info
to point to, that would be great…and appreciated.
Thanks again.

Sc-

unknown · September 25, 2007, 11:03pm

On Jul 25, 4:59 pm, [email protected] wrote:

For example if I have a file called filename.txt and I want to search
for this —> “Compression : JPEG (old-style)”
but only return everything to the right of the delimiter?

C:>irb
irb(main):001:0> s = "Foo : Bar"
=> "Foo : Bar"
irb(main):002:0> s.split ":"
=> ["Foo ", " Bar"]
irb(main):003:0> s.split(":").last
=> " Bar"

…or did you want the leading whitespace chomped?

irb(main):004:0> s.split( /\s*:\s*/ )
=> ["Foo", "Bar"]
irb(main):005:0> s.split( /\s*:\s*/ ).last
=> "Bar"
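Note that `.last` returns only the final piece, so it drops text whenever the value itself contains a colon. As a hedged alternative (not what the post above uses), `String#partition` splits at the first occurrence of the separator only. The helper name `after_first_colon` is made up for this sketch.

```ruby
# Hypothetical helper: returns the text after the first colon,
# with surrounding whitespace stripped, or nil when there is no colon.
def after_first_colon(line)
  _before, sep, after = line.partition(":")
  sep.empty? ? nil : after.strip
end

puts after_first_colon("Compression : JPEG (old-style) : note")
# prints: JPEG (old-style) : note
```

Compare `"Compression : JPEG (old-style) : note".split(":").last`, which would return only `" note"`.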

unknown · September 25, 2007, 11:04pm

On Jul 25, 7:05 pm, Phrogz [email protected] wrote:

=> ["Foo ", " Bar"]
irb(main):003:0> s.split(":").last
=> " Bar"

…or did you want the leading whitespace chomped?

irb(main):004:0> s.split( /\s*:\s*/ )
=> ["Foo", "Bar"]
irb(main):005:0> s.split( /\s*:\s*/ ).last
=> "Bar"

Some of the text I’m grabbing can contain additional
colons and I want everything after the first one.

SC
I am assuming you want everything after the first colon but not after
the second:

irb(main):001:0> line = "Compression : JPEG (old_style) : whatever"
=> "Compression : JPEG (old_style) : whatever"
irb(main):002:0> if line =~ /^Compression\s*:\s*(.*)$/
irb(main):003:1> end_str = $1
irb(main):004:1> first_el = end_str.split(/:/).first.strip
irb(main):005:1> puts first_el
irb(main):006:1> end
JPEG (old_style)
=> nil

If you want everything after each colon:

el = end_str.split(/\s*:\s*/)

p el  # => ["JPEG (old_style)", "whatever"]
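Since the original question asks for everything after the first colon, later colons included, here is a small sketch that contrasts the two readings when the left-hand label is known. The helper names `rest_after_key` and `first_field_after_key` are hypothetical, chosen only for this example.

```ruby
# Hypothetical helpers; `key` is the known left-hand label, e.g. "Compression".

# Everything after the first colon, later colons included.
def rest_after_key(line, key)
  line[/^#{Regexp.escape(key)}\s*:\s*(.*)$/, 1]
end

# Only the text between the first and the second colon.
def first_field_after_key(line, key)
  field = line[/^#{Regexp.escape(key)}\s*:\s*([^:]*)/, 1]
  field && field.strip
end

line = "Compression : JPEG (old-style) : whatever"
puts rest_after_key(line, "Compression")         # JPEG (old-style) : whatever
puts first_field_after_key(line, "Compression")  # JPEG (old-style)
```

Both helpers return nil when the line does not start with the given label, which makes them easy to use while scanning a file line by line.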

unknown · September 25, 2007, 11:04pm

[email protected] wrote:

On Jul 25, 7:05 pm, Phrogz [email protected] wrote:

On Jul 25, 4:59 pm, [email protected] wrote:
…
a description with lots of text and could also contain an additional
colon. At the shell prompt I could grep “Compression” | cut -d: -f2-
and that would grab everything…

To get everything after the first colon you can use:
s =~ /^.*?:/
textAfterColon = $'.dup

To also get rid of the spaces directly after the first colon, use this
regex:
s =~ /^.*?:\s*/
textAfterColon = $'.dup

The trick is to anchor the regex (i.e. using ^)

BR Phil
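A minimal sketch of the post-match idea, assuming the intended pattern is a non-greedy match up to the first colon followed by optional whitespace. `$'` holds the text after the last successful match; `Regexp.last_match.post_match` is the longer spelling of the same thing. The method name `text_after_first_colon` is invented for this example.

```ruby
# Hypothetical helper: returns everything after the first colon
# (and any whitespace following it), or nil when there is no colon.
def text_after_first_colon(s)
  return nil unless s =~ /^.*?:\s*/
  # Equivalent to $' (the text following the match).
  Regexp.last_match.post_match
end

puts text_after_first_colon("Compression : JPEG (old-style) : note")
# prints: JPEG (old-style) : note
```

The `?` after `.*` makes the match non-greedy, so it stops at the first colon instead of the last one.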

unknown · September 25, 2007, 11:04pm

On Jul 26, 3:28 am, Phil M. [email protected] wrote:

=> ["Foo", "Bar"]
To get everything after the first colon you can use:
s =~ /^.*?:/
textAfterColon = $'.dup

To also get rid of the spaces directly after the first colon, use this regex:
s =~ /^.*?:\s*/
textAfterColon = $'.dup

The trick is to anchor the regex (i.e. using ^)

BR Phil

Thanks so much for everyone’s answers; they are much appreciated. I
thought I was going to have to live with the space after the colon.
Thanks again for the help.
Scott

unknown · September 25, 2007, 11:08pm

on Thu 26. July 2007 02.37, [email protected] wrote:

irb(main):002:0> s.split ":"

I was looking at this method as well. I was having trouble, though: I
know the left-hand column (in this example, the Compression portion)
and then the colon, but the remainder could be anything, for instance
a description with lots of text that could also contain an additional
colon. At the shell prompt I could grep “Compression” | cut -d: -f2-
and that would grab everything…
So it sounds like I’m looking in the correct area, but I’m just not
pulling it all together. I will keep at it; if you have additional info
to point to, that would be great…and appreciated.
Thanks again.

split takes an optional second parameter that sets the maximum number
of elements in the resulting array:

'foo : bar : baz'.split(/\s*:\s*/, 2)
=> ["foo", "bar : baz"]

-Thomas
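Putting the limit argument together with a file scan gives a rough Ruby counterpart to `grep "Compression" | cut -d: -f2-`. This is a sketch under assumptions: the helper name `exif_value` is hypothetical, each line is assumed to look like `Label : value`, and a temporary file stands in for filename.txt so the example runs on its own.

```ruby
require "tempfile"

# Hypothetical helper: returns the text after the first colon on the first
# line whose label equals `key`, or nil when no line matches.
def exif_value(path, key)
  File.foreach(path) do |line|
    # Limit of 2: split at the first colon only, keeping later colons.
    name, value = line.strip.split(/\s*:\s*/, 2)
    return value if name == key
  end
  nil
end

# Sample data standing in for filename.txt.
Tempfile.create("exif") do |f|
  f.write("Make : Canon\nCompression : JPEG (old-style) : note\n")
  f.flush
  puts exif_value(f.path, "Compression")
  # prints: JPEG (old-style) : note
end
```

Unlike grep, this compares the whole label for equality rather than matching a substring anywhere on the line.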