Forum: Ruby on Rails urgent please: HTTPS HTML parsing

Arun Kumar (arun_nss)
on 2009-03-17 13:05
Attachment: demo.rb (3 KB)
Hi,
Is there any way to extract the HTML of an https:// website with
Hpricot? When I use Hpricot to access an https:// website, I receive the
following error.

/usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:31:in
`gem_original_require': no such file to load -- net/https (LoadError)
  from /usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:31:in
`require'
  from /usr/lib/ruby/1.8/open-uri.rb:230:in `open_http'
  from /usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
  from /usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
  from /usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
  from /usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
  from /usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
  from /usr/lib/ruby/1.8/open-uri.rb:518:in `open'
  from /usr/lib/ruby/1.8/open-uri.rb:30:in `open'
  from demo.rb:15:in `valid'
  from demo.rb:93


I'm also not able to load the HTML of Gmail, YouTube, etc. Is it
because I'm using Hpricot? Is there any other way to extract https
websites? Please help me.

Regards
Arun Kumar
Harm (Guest)
on 2009-03-17 13:20
(Received via mailing list)
Did you require 'net/https'? It seems that the library is just not
loaded or not present.

On Mar 17, 1:05 pm, Arun Kumar <rails-mailing-l...@andreas-s.net>
Arun Kumar (arun_nss)
on 2009-03-17 13:23
Harm wrote:
> Did you require 'net/https'? It seems that that lib is just not loaded/
> present.
>
> On Mar 17, 1:05 pm, Arun Kumar <rails-mailing-l...@andreas-s.net>

Can you please explain how to include 'net/https'?

Thanks a lot
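
(Editorial note for readers of this archive.) "Including" a library in Ruby means calling `require` near the top of the script. The following is only a minimal sketch, not code from the thread: the helper names `https_available?` and `build_https_client` are hypothetical, and it assumes the OpenSSL bindings may or may not be installed. It builds a client object but sends nothing over the network.

```ruby
# Try to load HTTPS support. Returns true when net/https (and the OpenSSL
# bindings it depends on) can be loaded, false when a LoadError is raised.
def https_available?
  require 'net/https'
  true
rescue LoadError
  false
end

# Build (but do not open) a Net::HTTP client configured for SSL.
# `host` is any host name; no connection is made here.
def build_https_client(host, port = 443)
  raise LoadError, 'net/https could not be loaded' unless https_available?
  http = Net::HTTP.new(host, port)
  http.use_ssl = true
  http
end

puts "HTTPS support available: #{https_available?}"
```

If the first helper returns false, the LoadError shown in the original post is the likely result of any https:// request made through open-uri.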
Ar Chron (railsdog)
on 2009-03-17 14:27
-1 for effort on the part of the poster...

Please go read
http://www.ruby-doc.org/stdlib/libdoc/net/http/rdo...

and learn about what you are trying to use
Arun Kumar (arun_nss)
on 2009-03-17 15:11
Ar Chron wrote:
> -1 for effort on the part of the poster...
>
> Please go read
> http://www.ruby-doc.org/stdlib/libdoc/net/http/rdo...
>
> and learn about what you are trying to use

I read about 'net/http' and 'hpricot', but I still get the same
error, even for YouTube. The code snippet I used for URL extraction is:

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'dbi'

class Url
    def valid
        begin
            puts "Enter domain name :"
            domain = gets.chomp
            # prepend 'http://' to the domain to build the URL to open
            url = "http://#{domain}"
            document = open(url, "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; .NET CLR 1.0.3705)")
            # get the final URL of the site (after any redirects)
            realUrl = document.base_uri.to_s
        rescue
            puts "Unable to open the URL. Please check if you have entered a valid URL."
        end
        # return the entered domain and the resolved URL
        parms = [domain, realUrl]
    end
end

I'm able to extract the data from every site except
'http://www.youtube.com', 'gmail.com', and other 'https' sites.
Please help. I'll be really thankful.

Regards
Arun Kumar
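
(Editorial note for readers of this archive.) The snippet above always prefixes the input with 'http://'. One possible way to let the same method accept https:// addresses is to add a scheme only when none was typed. This is a sketch under that assumption, not something proposed in the thread; `normalize_url` is a hypothetical helper name.

```ruby
require 'uri'

# Prefix the input with a scheme only when it does not already have one,
# then parse it into a URI object.
def normalize_url(input, default_scheme = 'http')
  text = input.strip
  unless text =~ %r{\A[a-z][a-z0-9+.-]*://}i
    text = "#{default_scheme}://#{text}"
  end
  URI.parse(text)
end

# A bare domain gets the default scheme; a full https:// URL is kept as typed.
puts normalize_url('example.com')
puts normalize_url('https://mail.google.com')
```

The resulting URI can then be passed to open-uri in place of the hand-built string.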
Ar Chron (railsdog)
on 2009-03-17 15:37
> Please help. I'll be really thankful
>
> Regards
> Arun Kumar

A quick Google search for 'rails https' yielded this on the 5th entry
found. It seems like exactly what you need to do.

http://railsruby.blogspot.com/2006/02/https-open-u...
Frederick Cheung (Guest)
on 2009-03-17 15:54
(Received via mailing list)
On Mar 17, 2:37 pm, Ar Chron <rails-mailing-l...@andreas-s.net> wrote:
> > Please help. I'll be really thankful
>
> > Regards
> > Arun Kumar
>
> A quick Google search for 'rails https' yielded this on the 5th entry
> found. It seems like exactly what you need to do.
>
> http://railsruby.blogspot.com/2006/02/https-open-u......

It also looks completely out of date: that patch doesn't look like it
would apply to current versions of Ruby 1.8.6.
If net/https can't be required, then I would assume this is a Linux
distribution where Ruby is split into multiple packages, one of which
usually contains the SSL support (libopenssl-ruby on Ubuntu).


Fred
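
(Editorial note for readers of this archive.) One way to check whether the SSL part of Ruby is installed is to try loading the libraries directly. This is a rough sketch, not part of Fred's reply; `check_ssl_libraries` is a hypothetical helper name, and the package hint simply repeats the Ubuntu package named above.

```ruby
# Report which of the libraries needed for HTTPS can be loaded.
# Returns a hash mapping each library name to true (loaded) or false.
def check_ssl_libraries(names = ['openssl', 'net/https'])
  names.each_with_object({}) do |name, result|
    begin
      require name
      result[name] = true
    rescue LoadError
      result[name] = false
    end
  end
end

check_ssl_libraries.each do |name, loaded|
  status = loaded ? 'ok' : 'MISSING (on Ubuntu, see the libopenssl-ruby package)'
  puts "#{name}: #{status}"
end
```

If either line reports MISSING, installing the distribution's SSL package for Ruby is the first thing to try before changing any application code.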
Arun Kumar (arun_nss)
on 2009-03-17 16:04
Frederick Cheung wrote:
> On Mar 17, 2:37 pm, Ar Chron <rails-mailing-l...@andreas-s.net> wrote:
>> > Please help. I'll be really thankful
>>
>> > Regards
>> > Arun Kumar
>>
>> A quick Google search for 'rails https' yielded this on the 5th entry
>> found. It seems like exactly what you need to do.
>>
>> http://railsruby.blogspot.com/2006/02/https-open-u......
>
> It also looks completely out of date: that patch doesn't look like it
> would apply to current versions of Ruby 1.8.6.
> If net/https can't be required, then I would assume this is a Linux
> distribution where Ruby is split into multiple packages, one of which
> usually contains the SSL support (libopenssl-ruby on Ubuntu).
>
>
> Fred

Yes, I think so too. As a newcomer to Ruby, I didn't understand the code
at all, and as you said, it looks outdated. Do you have any tricks of the
trade for parsing the HTML content of at least this site?
http://www.youtube.com

I'm receiving an error like this while extracting data from the site:
`open_http': 400 Bad Request (OpenURI::HTTPError)
This is not the same error that is displayed for 'https://'
sites.

Please help

Regards
Arun Kumar
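
(Editorial note for readers of this archive.) The thread does not establish why the server answered 400 Bad Request. As a starting point for debugging, the error raised by open-uri can be rescued on its own and inspected, instead of being swallowed by a bare `rescue`. The sketch below assumes only the standard open-uri library; `describe_http_error` is a hypothetical helper name.

```ruby
require 'open-uri'

# Turn an OpenURI::HTTPError into a short description.
# error.io is the response object that open-uri attaches to the error;
# its `status` is an array such as ["400", "Bad Request"].
def describe_http_error(error)
  io = error.io
  if io.respond_to?(:status) && io.status
    code, text = io.status
    "HTTP #{code} #{text}"
  else
    error.message
  end
end

# Usage (not executed here, because it needs network access):
#   begin
#     document = open(url)   # URI.open(url) on current Ruby versions
#   rescue OpenURI::HTTPError => e
#     puts describe_http_error(e)
#   end
```

Rescuing OpenURI::HTTPError separately makes it possible to tell an HTTP-level refusal such as this 400 apart from the LoadError reported for https:// sites.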