I have a nice little regex to pull the information-rich guts from a
table…
%r{</thead.*?>(.*?)}m =~ html
$1 now contains all the rows of the table as one long string.
I’d like to turn that into an array of rows, but I am not exactly sure
how.
Additionally, I’d like to process the rows so that I can get the data from
between the nth <td></td> pair.
Any help?
On Fri, Nov 7, 2008 at 3:08 PM, soldier.coder wrote:
> between the nth <td></td> pair.
> Any help?
If you have a string with a repeating pattern that you want an array
of, String#scan is your man.
irb(main):001:0> html = "<td>foo</td><td>bar</td>"
=> "<td>foo</td><td>bar</td>"
irb(main):002:0> a = html.scan(/<td>(.+?)<\/td>/)
=> [["foo"], ["bar"]]
Hmmm, that’s sort of ugly.
irb(main):003:0> a = html.scan(/<td>(.+?)<\/td>/).flatten
=> ["foo", "bar"]
Much better.
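To tie this back to the original question (an array of rows, then the nth cell of each row), the same scan trick can be applied twice: once for the rows, once for the cells inside each row. This is only a rough sketch, assuming simple markup with no nested tables. The sample html string and the names rows, cells and n are made up for illustration.

```ruby
# Hypothetical sample input; the real table markup may differ.
html = "<table><thead><tr><th>Name</th><th>Qty</th></tr></thead>" \
       "<tbody><tr><td>apple</td><td>3</td></tr>" \
       "<tr><td>pear</td><td>5</td></tr></tbody></table>"

# One string per row: whatever sits between <tr ...> and </tr>.
rows = html.scan(%r{<tr[^>]*>(.*?)</tr>}m).flatten

# One array of cell strings per row. Header rows contain no <td>,
# so they come back empty and are dropped.
cells = rows.map { |row| row.scan(%r{<td[^>]*>(.*?)</td>}m).flatten }
            .reject(&:empty?)

n = 1 # zero-based index of the wanted column
nth = cells.map { |row| row[n] }
```

Here cells is an array of arrays (one inner array per data row), so cells[0][1] is the second cell of the first data row, and nth collects that column across all rows.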
Ad hoc regexes are fine for quick-n-dirty scripting. But if you’re
serious about parsing HTML you might want to look into Hpricot or
Nokogiri.
-Michael L.