Hi.
I am currently scraping a page with scRUBYt and am not getting the
results I expected.
Instead of a correctly structured XML document, I’m getting the
following:
<1>book a</1>
<1>book b</1>
<1>book c</1>
<2>chapter aa</2>
<2>chapter bb</2>
<2>chapter cc</2>
<3>verse aaa</3>
<3>verse bbb</3>
<3>verse ccc</3>
My code looks like this:
listing "//a[@id*='volume']" do
  book "//a[@class='1']"
  chapter "//span[@class='2']"
  verse "//a[@id*='3']"
end
Any ideas?
Sorry for the vague sample data, but hopefully someone has seen this
before and can help.
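In XPath 1.0, an expression that starts with `//` searches from the root of the whole document even when it is evaluated against a context node; only an expression starting with `.//` (or a plain relative path) is confined to what lies beneath that node. Patterns evaluated page-wide like that return all matches for one pattern together, which is the shape of the output above. Whether that is what scRUBYt is doing here is an assumption, since the page is not shown. The sketch below uses Ruby's REXML, not scRUBYt, on hypothetical markup in which each book sits in its own `div class="book"` together with its chapters; that wrapper is invented for illustration.

```ruby
require "rexml/document"

# Hypothetical markup: each book has its own <div class="book"> wrapper
# containing that book's chapters. The real page structure is unknown.
html = <<~XHTML
  <html><body>
    <div class="book">
      <a class="1">book a</a>
      <span class="2">chapter aa</span>
      <span class="2">chapter ab</span>
    </div>
    <div class="book">
      <a class="1">book b</a>
      <span class="2">chapter ba</span>
    </div>
  </body></html>
XHTML

doc = REXML::Document.new(html)
first_book = REXML::XPath.first(doc, "//div[@class='book']")

# "//" starts from the document root, so even with first_book as the
# context node this returns every chapter on the page, ungrouped.
page_wide = REXML::XPath.match(first_book, "//span[@class='2']").map(&:text)

# ".//" searches only beneath each book's <div>, so chapters stay
# grouped under the book they belong to.
nested = REXML::XPath.match(doc, "//div[@class='book']").map do |book|
  title = REXML::XPath.first(book, "a[@class='1']").text
  chapters = REXML::XPath.match(book, ".//span[@class='2']").map(&:text)
  [title, chapters]
end

p page_wide
p nested
```

Carrying the same idea back to the extractor would mean scoping each chapter pattern to its book, for example by nesting it under the book pattern; how scRUBYt scopes child patterns is not shown in this thread, so that part remains an assumption.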
On Wed, Dec 31, 2008 at 08:33:26AM +0900, Cs Webgrl wrote:
> <1>book a</1>
> <1>book b</1>
> <1>book c</1>
> <2>chapter aa</2>
> <2>chapter bb</2>
> <2>chapter cc</2>
> <3>verse aaa</3>
> <3>verse bbb</3>
> <3>verse ccc</3>
This is a correctly formatted XML document. You just have numbers for
tag names.
Have you tried something like this:
book "//2[@id='whatevs']"
That should get you access to the tags.
Hope that helps!
Aaron P. wrote:
> Have you tried something like this:
> book "//2[@id='whatevs']"
> That should get you access to the tags.
This gives me a ton of data, but now I have lost the specific pieces of
information that I’m looking for. Instead, it looks like the output of
all of the source code on that page. Was I supposed to change something
else in the code to get the specific pieces of data that I need?
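Without seeing the page it is hard to say exactly which pattern is over-matching, but the general effect is easy to reproduce: a very broad expression such as the wildcard `//*` matches every element, and serializing a matched element returns its full markup. Combining a tag name with an attribute predicate and reading only the element's text returns just the values of interest. A minimal sketch using Ruby's REXML (not scRUBYt), again on hypothetical markup; the ids, classes and the nav link are invented for illustration:

```ruby
require "rexml/document"

# Hypothetical page: a navigation link plus one book and one chapter.
page = REXML::Document.new(<<~XHTML)
  <html><body>
    <div id="nav"><a href="/">home</a></div>
    <a class="1" id="volume-1">book a</a>
    <span class="2">chapter aa</span>
  </body></html>
XHTML

# Broad: every element in the document. The first match is <html>, so
# printing it dumps the whole page's markup.
everything = REXML::XPath.match(page, "//*")
puts everything.size
puts everything.first.to_s

# Narrow: only <a> elements whose class is '1', reduced to their text.
books = REXML::XPath.match(page, "//a[@class='1']").map(&:text)
p books
```

The narrow query skips the navigation link even though it is also an `<a>`, because the attribute predicate filters it out.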