Nokogiri does not read HTML file on CentOS 32-bit

Hi all,

I want to read and use the content of an HTML file with Nokogiri on my 32-bit CentOS machine,

but it does not read any of the HTML content.

@test = Nokogiri::HTML("abc.html")
puts "#{@test}"

But it only shows me the JavaScript from the page source, not any of the HTML content.

Please reply if anyone knows about this issue.

Thanks,
Priyank S.

On Nov 12, 2010, at 05:53, Priyank S. wrote:

@test = Nokogiri::HTML("abc.html")

503 % ri Nokogiri.HTML
= Nokogiri.HTML

(from gem nokogiri-1.4.3.1)

HTML(thing, url = nil, encoding = nil, options =
XML::ParseOptions::DEFAULT_HTML, &block)

What Ryan is telling you: you have to pass a filepointer or the actual
HTML as string, not a string containing a filename.

On 15.11.2010 at 06:55, Priyank S. wrote:

Ryan D. wrote in post #961099:

On Nov 12, 2010, at 05:53, Priyank S. wrote:

@test = Nokogiri::HTML("abc.html")

503 % ri Nokogiri.HTML
= Nokogiri.HTML

(from gem nokogiri-1.4.3.1)

HTML(thing, url = nil, encoding = nil, options =
XML::ParseOptions::DEFAULT_HTML, &block)

Hi,

Thanks for the reply, but I still have no solution. I get only

<!DOCTYPE html public \"-//W3C DTD HTML 4.0 Transitional//EN\" .....

as output, not the actual HTML content of the file. I checked Nokogiri, but I think it is some HTML character set encoding issue. Can you give me some idea about this?

Thanks,
Priyank S.

Can’t reproduce your problem. Try this:

require 'rubygems'
require 'nokogiri'

# Make sure the file contains something.
File.open('test.html', 'w') { |f|
  f.write("<html><body><h1>Foo</h1></body></html>")
}

f = File.open('test.html')
data = Nokogiri::HTML(f)
puts data
p data

----- OUTPUT ------

Foo

#<Nokogiri::HTML::Document:0x3ff244a4fb70 name="document" children=[#, #<Nokogiri::XML::Element:0x3ff244adf5b8 name="html" children=[#<Nokogiri::XML::Element:0x3ff244b50e0c name="body" children=[#<Nokogiri::XML::Element:0x3ff244b50b28 name="h1" children=[#]>]>]>]>

Florian G. wrote in post #961470:

What Ryan is telling you: you have to pass a filepointer or the actual
HTML as string, not a string containing a filename.

Hi,

Thanks for the explanation, but I still get the same problem.

I am using the following on CentOS 5.5 32-bit:

$> nokogiri -v

Ruby:
  engine: mri
  version: 1.8.7
  platform: i686-linux

libxml:
  loaded: 2.6.26
  binding: extension
  compiled: 2.6.26

nokogiri: 1.4.3.1


My code is:

f = File.open("test.html")
data = Nokogiri::HTML(f)
puts "#{data}"

p "#{data}"

but both of these give

Output:

"<!DOCTYPE html PUBLIC "-W3C//DTD HTML 4.0 Transitional//EN" …

It shows this type of output and does not give the actual HTML content.

So please help me if you have any more ideas.

Thanks,
Priyank S.

Niklas Cathor wrote in post #961490:

Can’t reproduce your problem. Try this:

require 'rubygems'
require 'nokogiri'

# Make sure the file contains something.
File.open('test.html', 'w') { |f|
  f.write("<html><body><h1>Foo</h1></body></html>")
}

f = File.open('test.html')
data = Nokogiri::HTML(f)
puts data
p data

----- OUTPUT ------

Foo

#<Nokogiri::HTML::Document:0x3ff244a4fb70 name="document" children=[#, #<Nokogiri::XML::Element:0x3ff244adf5b8 name="html" children=[#<Nokogiri::XML::Element:0x3ff244b50e0c name="body" children=[#<Nokogiri::XML::Element:0x3ff244b50b28 name="h1" children=[#]>]>]>]>

Hi,

First, thanks to all for helping me with my problem.

I finally got the solution.

I tried

f = open("test.html").read
data = Nokogiri::HTML(f)
puts data

instead of

f = File.open("test.html")
data = Nokogiri::HTML(f)
puts data

and I get the HTML data.

So basically I don't use the File class.
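
A side note on the two variants, offered as a sketch rather than a diagnosis: for a plain path, `open("test.html")` is `Kernel#open` and also returns a `File` object, so the practical difference is most likely the `.read`, which hands Nokogiri a String instead of an IO handle. Why the IO handle failed on this particular setup is not established in the thread. The snippet below uses only the standard library and assumes Ruby 1.9.3 or later (for `File.write`); the scratch file and its content are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical scratch file, created only for this illustration.
path = File.join(Dir.mktmpdir, 'test.html')
File.write(path, '<html><body><h1>Foo</h1></body></html>')

# Kernel#open on a plain path returns a File object, just like File.open.
handle = open(path)
handle_class = handle.class
via_open_read = handle.read   # the file's contents as a String
handle.close                  # open(...).read alone leaves the handle open

# File.read opens, reads and closes in a single call.
via_file_read = File.read(path)

puts handle_class
puts via_open_read == via_file_read
```

`File.read("test.html")` is therefore a tidier way to get the same String, since it does not leave a file handle open.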

Thanks,
Priyank S.