Problems with scRUBYt


#1

Hi.

I am currently scraping a page with scRUBYt and am not getting the
results I expected.

Instead of a correctly formatted XML document, I'm getting the
following:

<1>book a</1>
<1>book b</1>
<1>book c</1>
<2>chapter aa</2>
<2>chapter bb</2>
<2>chapter cc</2>
<3>verse aaa</3>
<3>verse bbb</3>
<3>verse ccc</3>

My code looks like this:

listing "//a[@id*='volume']" do
  book "//a[@class='1']"
  chapter "//span[@class='2']"
  verse "//a[@id*='3']"
end

Any ideas?

Sorry for the sample data, but hopefully someone has seen this before
and can help.


#2

On Wed, Dec 31, 2008 at 08:33:26AM +0900, Cs Webgrl wrote:

> <1>book b</1>
> <1>book c</1>
> <2>chapter aa</2>
> <2>chapter bb</2>
> <2>chapter cc</2>
> <3>verse aaa</3>
> <3>verse bbb</3>
> <3>verse ccc</3>

This is a correctly formatted XML document. You just have numbers for
tag names.

> My code looks like this:
>
> listing "//a[@id*='volume']" do
>   book "//a[@class='1']"
>   chapter "//span[@class='2']"
>   verse "//a[@id*='3']"
> end
>
> Any ideas?

Have you tried something like this:

book "//2[@id='whatevs']"

That should get you access to the tags.

Hope that helps!


#3

Aaron P. wrote:

> Have you tried something like this:
>
> book "//2[@id='whatevs']"
>
> That should get you access to the tags.

This gives me a ton of data, but now I have lost the specific pieces of
information that I'm looking for. Instead, the output looks like the
entire source of that page. Was I supposed to change something else in
the code to get the specific pieces of data that I need?
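
One thing worth checking in the selectors from the first post: `@id*=` is
CSS attribute-selector syntax, not XPath 1.0. In plain XPath a substring
match on an attribute is written with `contains()`. The sketch below is not
scRUBYt code. It uses REXML from Ruby's standard distribution only to show
the XPath expressions, and it groups each book, chapter, and verse under its
parent "volume" element by evaluating the child expressions relative to that
element (the leading `.`). The sample markup, `SAMPLE`, `extract_listing`,
and `text_at` are all assumptions made for illustration, since the real page
is not shown in this thread.

```ruby
require "rexml/document"

# Hypothetical, well-formed markup standing in for the real page.
SAMPLE = <<~XML
  <div>
    <a id="volume-1">
      <a class="1">book a</a>
      <span class="2">chapter aa</span>
      <a id="v3-1">verse aaa</a>
    </a>
    <a id="volume-2">
      <a class="1">book b</a>
      <span class="2">chapter bb</span>
      <a id="v3-2">verse bbb</a>
    </a>
  </div>
XML

# Return the stripped text of the first node matching xpath under node,
# or nil when nothing matches.
def text_at(node, xpath)
  found = REXML::XPath.first(node, xpath)
  found&.text&.strip
end

# Build one hash per "volume" element. The inner expressions start with
# "." so they only search inside the current volume, not the whole page.
def extract_listing(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//a[contains(@id, 'volume')]").map do |volume|
    {
      book:    text_at(volume, ".//a[@class='1']"),
      chapter: text_at(volume, ".//span[@class='2']"),
      verse:   text_at(volume, ".//a[contains(@id, '3')]")
    }
  end
end

extract_listing(SAMPLE).each { |row| puts row.inspect }
```

If the child expressions are written as absolute paths (`//a[...]`), each
one searches the whole document again, which may be one reason a flat list
of all books, then all chapters, then all verses comes back instead of
nested records.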