Parse XML string

How to parse a string like:

Suppose the input is str="<a>1</a><a>22</a><a>3</a>"
I want str_a=["1","22","3"]
How can I get str_a from str?

On 10/02/2012 10:32 AM, ajay paswan wrote:

How to parse a string like:

Suppose the input is str="<a>1</a><a>22</a><a>3</a>"
I want str_a=["1","22","3"]
How can I get str_a from str?

1.9.3p125 :002 > str.scan /<a>\d+<\/a>/
=> ["<a>1</a>", "<a>22</a>", "<a>3</a>"]
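If what is wanted is only the text between the tags (the str_a in the question) rather than the whole <a>...</a> matches, one option is a capture group: when the pattern contains a group, String#scan returns one array per match, which flatten then unwraps. A minimal sketch, assuming each value sits in a flat <a>...</a> pair with no attributes and no nesting:

```ruby
str = "<a>1</a><a>22</a><a>3</a>"

# With one capture group, scan returns [["1"], ["22"], ["3"]];
# flatten turns that into a plain array of strings.
str_a = str.scan(/<a>(\d+)<\/a>/).flatten
p str_a  # prints ["1", "22", "3"]
```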


Sam D. wrote in post #1078251:

1.9.3p125 :002 > str.scan /<a>\d+<\/a>/
=> ["<a>1</a>", "<a>22</a>", "<a>3</a>"]


What if str="<a>kl1</a><a>22ik</a><a>3o</a>"?

On 10/02/2012 10:58 AM, ajay paswan wrote:

What if str="<a>kl1</a><a>22ik</a><a>3o</a>"?

1.9.3p125 :002 > str.scan /<a>[[:alnum:]]+<\/a>/
=> ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]

Ping pong

I think perhaps you should read up on regular expressions in Ruby =]
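A hedged variant of the [[:alnum:]] pattern above: instead of listing the allowed characters, a negated class [^<]+ accepts anything up to the next tag, and a capture group keeps only the content. This sketch assumes the content never contains a literal "<" and that tags are not nested; the second sample string is made up for illustration:

```ruby
str = "<a>kl1</a><a>22ik</a><a>3o</a>"

# [^<]+ matches one or more characters that are not "<", so each capture
# ends just before the closing tag. Assumes no "<" inside the content.
str_a = str.scan(/<a>([^<]+)<\/a>/).flatten
p str_a  # prints ["kl1", "22ik", "3o"]

# Unlike [[:alnum:]]+, this also accepts spaces and punctuation.
other = "<a>a b</a><a>c-d</a>".scan(/<a>([^<]+)<\/a>/).flatten
p other  # prints ["a b", "c-d"]
```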


Sam D. wrote in post #1078254:

1.9.3p125 :002 > str.scan /<a>[[:alnum:]]+<\/a>/
=> ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]

Ping pong

I think perhaps you should read up on regular expressions in Ruby =]


I have gone through it previously but could not figure it out. Does '.'
denote any character?

On 10/02/2012 11:11 AM, ajay paswan wrote:

I think perhaps you should read up on regular expressions in Ruby =]

I have gone through it previously but could not figure it out. Does '.'
denote any character?

There are at least three articles on the Internet about regular
expressions. Have a hunt for one that makes sense to you specifically =]
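To answer the question about '.' directly: in Ruby, . matches any single character except a newline, and the m modifier (/.../m) makes it match newlines as well. A small sketch; the multi-line sample string is made up for illustration:

```ruby
s = "<a>line one\nline two</a>"

# Without /m, "." does not match "\n", so the pattern cannot reach </a>.
without_m = s.scan(/<a>(.*?)<\/a>/).flatten
# With /m (Ruby's "dot matches newline" flag), the match spans both lines.
with_m = s.scan(/<a>(.*?)<\/a>/m).flatten

p without_m  # prints []
p with_m     # prints ["line one\nline two"]
```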


On Mon, Oct 1, 2012 at 11:32 PM, ajay paswan wrote:

How to parse a string like:

Suppose the input is str="<a>1</a><a>22</a><a>3</a>"
I want str_a=["1","22","3"]
How can I get str_a from str?

I’d use a proper XML or HTML processing tool for that. Then you can
search with XPath ‘//a’.

$ irb19 -r nokogiri
irb(main):001:0> str = "<a>1</a><a>22</a><a>3</a>"
=> "<a>1</a><a>22</a><a>3</a>"
irb(main):004:0> dom = Nokogiri.HTML str
=> #<Nokogiri::HTML::Document:0x439c00c name="document"
children=[#<Nokogiri::XML::DTD:0x439bdd2 name="html">,
#<Nokogiri::XML::Element:0x439b940 name="html"
children=[#<Nokogiri::XML::Element:0x439b80a name="body"
children=[#<Nokogiri::XML::Element:0x439b6b6 name="a"
children=[#<Nokogiri::XML::Text:0x439b53a "1">]>,
#<Nokogiri::XML::Element:0x439b3be name="a"
children=[#<Nokogiri::XML::Text:0x439b27e "22">]>,
#<Nokogiri::XML::Element:0x439b152 name="a"
children=[#<Nokogiri::XML::Text:0x439b030 "3">]>]>]>]>
irb(main):005:0> str_a = dom.xpath '//a'
=> [#<Nokogiri::XML::Element:0x439b6b6 name="a"
children=[#<Nokogiri::XML::Text:0x439b53a "1">]>,
#<Nokogiri::XML::Element:0x439b3be name="a"
children=[#<Nokogiri::XML::Text:0x439b27e "22">]>,
#<Nokogiri::XML::Element:0x439b152 name="a"
children=[#<Nokogiri::XML::Text:0x439b030 "3">]>]
irb(main):006:0> str_a.size
=> 3

Kind regards


On Oct 1, 2012, at 15:11, ajay paswan wrote:

I think perhaps you should read up on regular expressions in Ruby =]


I have gone through it previously but could not figure it out. Does '.'
denote any character?

ajay paswan wrote in post #1078256:

I have gone through it previously but could not figure it out. Does '.'
denote any character?

Yes, and .* means zero or more of any character, so you might
think of /<a>.*<\/a>/ to match an opening tag, followed by any text,
followed by a closing tag.

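However, this won't work the way you expect, because .* will match the
largest amount of text it can while still matching the rest of the
pattern:

str = "<a>kl1</a><a>22ik</a><a>3o</a>"
=> "<a>kl1</a><a>22ik</a><a>3o</a>"

str.scan /<a>.*<\/a>/
=> ["<a>kl1</a><a>22ik</a><a>3o</a>"]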

That is: the opening tag is <a>, the content is kl1</a><a>22ik</a><a>3o,
and the closing tag is </a>. You probably hadn't thought of it like that.

You can fix this using .*?, which will consume the smallest amount of
text it can while still matching the rest of the pattern.

str.scan /<a>.*?<\/a>/
=> ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]
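The greedy and lazy patterns can be compared side by side in a standalone script. The last lines use a made-up nested input to show where even the lazy pattern stops following the element structure, which is one concrete reason for the advice that follows:

```ruby
str = "<a>kl1</a><a>22ik</a><a>3o</a>"

greedy = str.scan(/<a>.*<\/a>/)   # .*  grabs as much text as it can
lazy   = str.scan(/<a>.*?<\/a>/)  # .*? grabs as little text as it can

p greedy  # prints ["<a>kl1</a><a>22ik</a><a>3o</a>"]
p lazy    # prints ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]

# Hypothetical nested input: the lazy pattern stops at the first </a>,
# so the match does not correspond to the outer element.
nested = "<a><a>x</a></a>"
p nested.scan(/<a>.*?<\/a>/)  # prints ["<a><a>x</a>"]
```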

But as has been pointed out, regular expressions are not the right way
to parse XML. Use a library specifically designed for XML parsing.
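As one example of such a library, here is a sketch using REXML, which ships with Ruby (as a bundled gem in recent versions), swapped in for the Nokogiri approach shown earlier in the thread. The <root> wrapper is an assumption added here because the sample string has several top-level elements and so is not a well-formed document on its own:

```ruby
require "rexml/document"

str = "<a>kl1</a><a>22ik</a><a>3o</a>"

# Wrap the fragment in one root element so it parses as a well-formed document.
doc = REXML::Document.new("<root>#{str}</root>")

# Select every <a> element with XPath, then keep only each element's text.
str_a = REXML::XPath.match(doc, "//a").map(&:text)
p str_a  # prints ["kl1", "22ik", "3o"]

# Unlike a regex, the parser also decodes entities such as &lt; in the text.
entity_doc = REXML::Document.new("<root><a>1 &lt; 2</a></root>")
p REXML::XPath.first(entity_doc, "//a").text  # prints "1 < 2"
```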