I want to get the rows that contain more than 3 columns.
How do I write the XPath for this with Nokogiri?
require 'rubygems'
require 'nokogiri'
item = 'sometext'
doc = Nokogiri::HTML.parse(open(item))
data = doc.xpath('/html/body/table/tr[@td.size>3]')
puts data
It does not run. Help and advice appreciated.
For example,
table1:
table2:
I want to get table2 from table1, that is, the rows that contain more than
one column. How can I do that with Nokogiri?
Use count(), like:
document.xpath("//*[count(td)=2]")
You can also select children at certain offsets with the CSS selector
td:nth-child(N), or in XPath with td[N] (short for td[position()=N]).
HTH,
Ammar
p1
data = doc.xpath('/table/tr/*[count(td)>1]')
puts data
p2
data = doc.xpath('/table/tr/td[count(td)>1]')
puts data
Neither of them works. Why do I get nothing?
If the table is not the root element or directly inside the root, you need
two slashes ("//") at the beginning. The count function applies to the tr,
not the td, so you don't need the "*" in p1 or the td in p2. Try this:
doc.xpath('//table/tr[count(td)>1]')
Good Luck,
Ammar
document.xpath("//*[count(td)=2]") works, but I would still like to know about these:
p1
data = doc.xpath('/table/tr/*[count(td)>1]')
puts data
p2
data = doc.xpath('/table/tr/td[count(td)>1]')
puts data
How do I fix p1 and p2?
Thanks Ammar. One problem is solved, but another has come up.
Here is the content of /home/pt/mytest:
reportdate | 10/31/09 | 10/31/08 | 10/31/07 | 10/31/06 | 10/31/05
Cash & Equivalents | 2,493 | 1,429 | 1,826 | 2,262 | 2,251
Receivables | 595 | 770 | 735 | 692 | 753
Notes Receivable | 0 | 0 | 0 | 0 | 0
Inventories | 552 | 646 | 643 | 627 | 722
What I want to get is:
Receivables | 595 | 770 | 735 | 692 | 753
Notes Receivable | 0 | 0 | 0 | 0 | 0
Inventories | 552 | 646 | 643 | 627 | 722
p1:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result = doc.xpath('//table/tr[td[@class="ticker"]]')
puts result
I get what I want with p1.
p2:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result = doc.xpath('//table/tr[td[not(@class="tickerSm")]]')
puts result
Why can't I get what I want with p2?
How do I fix p2?
Thanks for your help.
I found something odd. If my file /home/pt/mytest is changed to:
reportdate | 10/31/09 | 10/31/08 | 10/31/07 | 10/31/06 | 10/31/05
Cash & Equivalents | 2,493 | 1,429 | 1,826 | 2,262 | 2,251
Receivables | 595 | 770 | 735 | 692 | 753
Notes Receivable | 0 | 0 | 0 | 0 | 0
Inventories | 552 | 646 | 643 | 627 | 722
with this code:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML.parse(open('/home/pt/mytest'))
result = doc.xpath('//table/tr[*[not(@class="tickerSm")]]')
puts result
what I get is:
Cash & Equivalents | 2,493 | 1,429 | 1,826 | 2,262 | 2,251
Receivables | 595 | 770 | 735 | 692 | 753
Notes Receivable | 0 | 0 | 0 | 0 | 0
Inventories | 552 | 646 | 643 | 627 | 722
This row is not selected by my code:
reportdate | 10/31/09 | 10/31/08 | 10/31/07 | 10/31/06 | 10/31/05
But how do I also exclude this row with XPath?
Cash & Equivalents | 2,493 | 1,429 | 1,826 | 2,262 | 2,251
This does not work:
xpath('//table/tr[*[not(@class="tickerSm")]]')
Maybe the reason is that some td elements have the class "ticker" and others
have "tickerSm".
If I don't want to select that row, how do I express that in XPath?
On Fri, 27 Aug 2010 23:26:53 +0900, Pen T. wrote:
I want to get the rows that contain more than 3 columns. How do I write
the XPath for this with Nokogiri?
require 'rubygems'
require 'nokogiri'
item = 'sometext'
doc = Nokogiri::HTML.parse(open(item))
data = doc.xpath('/html/body/table/tr[@td.size>3]')
puts data
It does not run. Help and advice appreciated.
doc.xpath('/html/body/table/tr[count(td)>3]')
I found that not(...) and != give the same result in my Nokogiri XPath
expressions.
One problem still remains. If my HTML is the following:
reportdate | 10/31/09 | 10/31/08 | 10/31/07 | 10/31/06 | 10/31/05
Cash & Equivalents | 2,493 | 1,429 | 1,826 | 2,262 | 2,251
Receivables | 595 | 770 | 735 | 692 | 753
xpath('//table/tr[td[@class="tickerSm"]]') returns:
reportdate | 10/31/09 | 10/31/08 | 10/31/07 | 10/31/06 | 10/31/05
xpath('//table/tr[td[@class="ticker"]]') returns:
Receivables | 595 | 770 | 735 | 692 | 753
But how can I get the following row with an XPath expression?
Cash & Equivalents | 2,493 | 1,429 | 1,826 | 2,262 | 2,251
This does not work:
xpath('//table/tr[*[not(@class="tickerSm")]]')
Maybe the reason is that some td elements have the class "ticker" and others
have "tickerSm".
If I don't want to select a row, how do I express that in XPath?
Hi Pen,
I don't know if "not" is valid like that, I have to double check. But
you can use "!=" with attributes.
doc.xpath('//table/tr/*[@class!="tickerSm"]')
I hope it helps,
Ammar
A friend told me to use:
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
It works.
On Sun, Aug 29, 2010 at 9:40 AM, Pen T. wrote:
A friend told me to use:
//table/tr[td[1][@class="tickerSm"] and td[2][@class="ticker"]]
It works.
That's good. Another possible approach is using following-sibling, if
you don't want the first td[@class="tickerSm"]:
//table/tr/td[1][@class="tickerSm"]/following-sibling::td[@class!="tickerSm"]
Ammar