Select tr with more than 3 td using Nokogiri

I want to select the table rows that contain more than 3 columns.
How do I write the XPath for this with Nokogiri?

require 'rubygems'
require 'nokogiri'
item='sometext'
doc = Nokogiri::HTML.parse(open(item))
data=doc.xpath('/html/body/table/tr[@td.size>3]')
puts data
It does not run. Help and advice appreciated.

for example,
table1:

kk
1 2
3 4
qq

table2:

kk
1 2
3 4

I want to get table2 from table1, that is, keep only the rows that contain more than
one column. How can I do that with Nokogiri?

Use count(), like:

document.xpath("//*[count(td)=2]")

You can also select children at certain offsets with the CSS selector
td:nth-child(N) or the XPath predicate td[position()=N] (td[N] for short).

HTH,
Ammar

p1
data=doc.xpath('/table/tr/*[count(td)>1]')
puts data
p2
data=doc.xpath('/table/tr/td[count(td)>1]')
puts data
Neither of them works. Why do I get nothing?

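A path that starts with a single "/" is matched from the document root, so
/table only works when the table is the root element. In a parsed HTML
document the root is html, so you need two "/" at the beginning. The count
function applies to the tr, not the td, so you don't need the "*" in p1, or
the td in p2. Try this:

doc.xpath('//table/tr[count(td)>1]')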

Good Luck,
Ammar

document.xpath("//*[count(td)=2]") works, but I would like to know about these:
p1
data=doc.xpath('/table/tr/*[count(td)>1]')
puts data
p2
data=doc.xpath('/table/tr/td[count(td)>1]')
puts data
How do I fix p1 and p2?

Thanks Ammar. One problem is solved, but another has appeared.
Here is the content of /home/pt/mytest:

reportdate 10/31/09 10/31/08 10/31/07 10/31/06 10/31/05
Cash & Equivalents 2,493 1,429 1,826 2,262 2,251
Receivables 595 770 735 692 753
Notes Receivable 0 0 0 0 0
Inventories 552 646 643 627 722

What I want to get is:

p1:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result=doc.xpath('//table/tr[td[@class="ticker"]]')
puts result

I can get what I want with p1.

p2:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result=doc.xpath('//table/tr[td[not(@class="tickerSm")]]')
puts result

Why can't I get what I want with p2?
How do I fix p2?
Thanks for your help.

Receivables 595 770 735 692 753
Notes Receivable 0 0 0 0 0
Inventories 552 646 643 627 722

I found something interesting. If I change my file /home/pt/mytest to:

reportdate 10/31/09 10/31/08 10/31/07 10/31/06 10/31/05
Cash & Equivalents 2,493 1,429 1,826 2,262 2,251
Receivables 595 770 735 692 753
Notes Receivable 0 0 0 0 0
Inventories 552 646 643 627 722

with this code:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result=doc.xpath('//table/tr[*[not(@class="tickerSm")]]')
puts result

what I get is:

The row is not selected by my code,

but how do I delete a row with XPath?

This does not work: xpath('//table/tr[*[not(@class="tickerSm")]]'). Maybe the reason is that some td elements have class "ticker" and others have class "tickerSm". If I don't want to select such a row, how do I express that in XPath?
Cash & Equivalents 2,493 1,429 1,826 2,262 2,251
Receivables 595 770 735 692 753
Notes Receivable 0 0 0 0 0
Inventories 552 646 643 627 722
reportdate 10/31/09 10/31/08 10/31/07 10/31/06 10/31/05
Cash & Equivalents 2,493 1,429 1,826 2,262 2,251

On Fri, 27 Aug 2010 23:26:53 +0900, Pen T. wrote:

I want to select the table rows that contain more than 3 columns. How do I
write the XPath for this with Nokogiri?

require 'rubygems'
require 'nokogiri'
item='sometext'
doc = Nokogiri::HTML.parse(open(item))
data=doc.xpath('/html/body/table/tr[@td.size>3]')
puts data
It does not run. Help and advice appreciated.

doc.xpath('/html/body/table/tr[count(td)>3]')

I found that not(...) and != give the same result in my Nokogiri XPath
expression.
There is still one problem remaining. If my HTML is the following:

reportdate 10/31/09 10/31/08 10/31/07 10/31/06 10/31/05
Cash & Equivalents 2,493 1,429 1,826 2,262 2,251
Receivables 595 770 735 692 753

xpath('//table/tr[td[@class="tickerSm"]]') returns:

reportdate 10/31/09 10/31/08 10/31/07 10/31/06 10/31/05

xpath('//table/tr[td[@class="ticker"]]') returns:

Receivables 595 770 735 692 753

But how can I get the following with an XPath expression?

Cash & Equivalents 2,493 1,429 1,826 2,262 2,251

xpath('//table/tr[*[not(@class="tickerSm")]]') does not work.
Maybe the reason is that some td elements have class "ticker" and others have
class "tickerSm".
If I don't want to select such a row, how do I express that in XPath?

Hi Pen,

I don't know if "not" is valid like that, I have to double check. But
you can use "!=" with attributes.

doc.xpath('//table/tr/*[@class!="tickerSm"]')

I hope it helps,
Ammar

A friend told me that
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
works.

On Sun, Aug 29, 2010 at 9:40 AM, Pen T. wrote:

A friend told me that
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
works.

That's good. Another possible approach is using following-sibling, if
you don't want the first td[@class="tickerSm"]:

//table/tr/td[1][@class="tickerSm"]/following-sibling::td[@class!="tickerSm"]

Ammar