How do I parse a string like this?

Suppose the input is str="<a>1</a><a>22</a><a>3</a>"
I want str_a=["<a>1</a>","<a>22</a>","<a>3</a>"]
How can I get str_a from str?
on 2012-10-01 23:32
on 2012-10-01 23:52
On 10/02/2012 10:32 AM, ajay paswan wrote:
> How to parse a string like:
>
> suppose input is str="<a>1</a><a>22</a><a>3</a>"
> I want str_a=["<a>1</a>","<a>22</a>","<a>3</a>"]
> How can I get str_a from str?
>

1.9.3p125 :002 > str.scan /<a>\d+<\/a>/
 => ["<a>1</a>", "<a>22</a>", "<a>3</a>"]

Sam
on 2012-10-01 23:58
Sam Duncan wrote in post #1078251:
> On 10/02/2012 10:32 AM, ajay paswan wrote:
>> How to parse a string like:
>>
>> suppose input is str="<a>1</a><a>22</a><a>3</a>"
>> I want str_a=["<a>1</a>","<a>22</a>","<a>3</a>"]
>> How can I get str_a from str?
>>
> 1.9.3p125 :002 > str.scan /<a>\d+<\/a>/
> => ["<a>1</a>", "<a>22</a>", "<a>3</a>"]
>
> Sam

What if: str="<a>kl1</a><a>22ik</a><a>3o</a>" ?
on 2012-10-02 00:04
On 10/02/2012 10:58 AM, ajay paswan wrote:
>>
>> Sam
> What if: str="<a>kl1</a><a>22ik</a><a>3o</a>" ?
>

1.9.3p125 :002 > str.scan /<a>[[:alnum:]]+<\/a>/
 => ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]

Ping pong

I think perhaps you should read up on regular expressions in Ruby =]

Sam
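To make the difference between the two patterns in this thread concrete, here is a small sketch that runs both against the second input. The `with_space` input is a hypothetical extra case, not something from the original posts; it is only there to show that [[:alnum:]] is still a narrow match and will miss content containing spaces or punctuation.

```ruby
str = "<a>kl1</a><a>22ik</a><a>3o</a>"

# \d+ accepts digits only, so no element of this input matches.
digits_only = str.scan(/<a>\d+<\/a>/)

# [[:alnum:]]+ accepts letters and digits, so every element matches.
alnum = str.scan(/<a>[[:alnum:]]+<\/a>/)

# Hypothetical input with a space in the content: [[:alnum:]] rejects it.
with_space = "<a>hello world</a>".scan(/<a>[[:alnum:]]+<\/a>/)

p digits_only
p alnum
p with_space
```

Writing the call as str.scan(/.../) with parentheses also avoids the "ambiguous first argument" warning that Ruby can emit for str.scan /.../ outside of irb.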
on 2012-10-02 00:11
Sam Duncan wrote in post #1078254:
> On 10/02/2012 10:58 AM, ajay paswan wrote:
>>>
>>> Sam
>> What if: str="<a>kl1</a><a>22ik</a><a>3o</a>" ?
>>
> 1.9.3p125 :002 > str.scan /<a>[[:alnum:]]+<\/a>/
> => ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]
>
> Ping pong
>
> I think perhaps you should read up on regular expressions in Ruby =]
>
> Sam

I have gone through it previously but could not figure it out. Does '.' denote any character?
on 2012-10-02 00:27
On 10/02/2012 11:11 AM, ajay paswan wrote:
>> I think perhaps you should read up on regular expressions in Ruby =]
>>
>> Sam
> I have gone through it previously but could not figure it out. Does '.'
> denote any character?
>

There are at least three articles on the Internet about regular expressions. Have a hunt for one that makes sense to you specifically =]

Sam
on 2012-10-02 23:04
On Mon, Oct 1, 2012 at 11:32 PM, ajay paswan <lists@ruby-forum.com> wrote:
> How to parse a string like:
>
> suppose input is str="<a>1</a><a>22</a><a>3</a>"
> I want str_a=["<a>1</a>","<a>22</a>","<a>3</a>"]
> How can I get str_a from str?

I'd use a proper XML or HTML processing tool for that. Then you can search with XPath '//a'.

$ irb19 -r nokogiri
irb(main):001:0> str = "<a>1</a><a>22</a><a>3</a>"
=> "<a>1</a><a>22</a><a>3</a>"
irb(main):004:0> dom = Nokogiri.HTML str
=> #<Nokogiri::HTML::Document:0x439c00c name="document" children=[#<Nokogiri::XML::DTD:0x439bdd2 name="html">, #<Nokogiri::XML::Element:0x439b940 name="html" children=[#<Nokogiri::XML::Element:0x439b80a name="body" children=[#<Nokogiri::XML::Element:0x439b6b6 name="a" children=[#<Nokogiri::XML::Text:0x439b53a "1">]>, #<Nokogiri::XML::Element:0x439b3be name="a" children=[#<Nokogiri::XML::Text:0x439b27e "22">]>, #<Nokogiri::XML::Element:0x439b152 name="a" children=[#<Nokogiri::XML::Text:0x439b030 "3">]>]>]>]>
irb(main):005:0> str_a = dom.xpath '//a'
=> [#<Nokogiri::XML::Element:0x439b6b6 name="a" children=[#<Nokogiri::XML::Text:0x439b53a "1">]>, #<Nokogiri::XML::Element:0x439b3be name="a" children=[#<Nokogiri::XML::Text:0x439b27e "22">]>, #<Nokogiri::XML::Element:0x439b152 name="a" children=[#<Nokogiri::XML::Text:0x439b030 "3">]>]
irb(main):006:0> str_a.size
=> 3

Kind regards

robert5
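The session above stops at a list of element objects, while the question asked for an array of strings. As a rough sketch of that last step, the example below swaps Nokogiri for REXML, only because REXML ships with Ruby and needs no extra gem; it is not the approach used in the post. Two assumptions are mine: the fragment is wrapped in a made-up <root> element, because an XML document needs a single root, and each element is serialized back to markup with to_s.

```ruby
require 'rexml/document'

str = "<a>1</a><a>22</a><a>3</a>"

# Assumption: wrap the fragment in a hypothetical <root> element,
# since an XML document must have exactly one root.
doc = REXML::Document.new("<root>#{str}</root>")

# Select every <a> element with XPath, then serialize each one back
# to its markup to get plain strings instead of element objects.
str_a = REXML::XPath.match(doc, '//a').map(&:to_s)

p str_a
```

With Nokogiri, mapping to_s over the result of dom.xpath('//a') should serve the same purpose.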
on 2012-10-03 01:01
On Oct 1, 2012, at 15:11 , ajay paswan <lists@ruby-forum.com> wrote:
>>
>> I think perhaps you should read up on regular expressions in Ruby =]
>>
>> Sam
>
> I have gone through it previously but could not figure it out. Does '.'
> denote any character?

http://www.zenspider.com/Languages/Ruby/QuickRef.h...
on 2012-10-04 08:18
ajay paswan wrote in post #1078256:
> Sam Duncan wrote in post #1078254:
>> On 10/02/2012 10:58 AM, ajay paswan wrote:
>>>>
>>>> Sam
>>> What if: str="<a>kl1</a><a>22ik</a><a>3o</a>" ?
>>>
>> 1.9.3p125 :002 > str.scan /<a>[[:alnum:]]+<\/a>/
>> => ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]
>>
>> Ping pong
>>
>> I think perhaps you should read up on regular expressions in Ruby =]
>>
>> Sam
>
> I have gone through it previously but could not figure it out. Does '.'
> denote any character?

Yes (any character except a newline, by default), and .* means zero or more of any character, so you might think of <a>.*</a> to match an open tag, followed by any text, followed by a closing tag.

However, this won't work the way you expect, because .* will match the largest amount of text it can while still matching the rest of the pattern.

    >> str="<a>kl1</a><a>22ik</a><a>3o</a>"
    => "<a>kl1</a><a>22ik</a><a>3o</a>"
    >> str.scan /<a>.*<\/a>/
    => ["<a>kl1</a><a>22ik</a><a>3o</a>"]

That is: the opening tag is <a>, the content is kl1</a><a>22ik</a><a>3o, and the closing tag is </a>. You probably hadn't thought of it like that :-)

You can fix this using .*?, which will consume the smallest amount of text it can while still matching the rest of the pattern.

    >> str.scan /<a>.*?<\/a>/
    => ["<a>kl1</a>", "<a>22ik</a>", "<a>3o</a>"]

But as has been pointed out, regular expressions are not the right way to parse XML. Use a library specifically designed for XML parsing.
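For anyone who wants to run the greedy and non-greedy patterns side by side, here is a minimal sketch. The third pattern, using the negated character class [^<]*, is an addition of mine rather than something proposed in the thread: it matches "anything up to the next <", which gives the same result here without relying on lazy matching. It assumes the content never contains a '<', so it will not handle nested tags.

```ruby
str = "<a>kl1</a><a>22ik</a><a>3o</a>"

# Greedy: .* runs to the end of the string and backtracks only as far
# as the LAST </a>, so the whole input comes back as one match.
greedy = str.scan(/<a>.*<\/a>/)

# Lazy: .*? stops at the FIRST </a> that lets the pattern succeed.
lazy = str.scan(/<a>.*?<\/a>/)

# Alternative (not from the thread): match any run of characters
# that are not '<'. Assumes element content never contains '<'.
negated = str.scan(/<a>[^<]*<\/a>/)

p greedy
p lazy
p negated
```

All three are still only pattern matching on text; for real XML or HTML, the advice above to use a parser stands.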