How to extract attribute values from a tag in HTML

Hi all,

I use Hpricot to extract information from web pages, but I can’t find
any examples of extracting attributes from a tag. Any help would be
appreciated.

Here is one example:

  • How can I extract the data_id attribute and return its value,
    “247096”?

  • doc.at("body")['onload']

    The above code will find the body tag and give you back the onload
    attribute. This is the most common reason to use the element directly:
    when reading and writing HTML attributes.

    More importantly, hpricot is deprecated:

    Hpricot is over.
    After years of lack of a proper maintainer for one of why’s jewels, it
    has been decided to finally close the book on hpricot. Most users have
    migrated to alternatives and there is simply no time or energy to
    continue with the current codebase.

    Try Nokogiri instead; see “Installing Nokogiri” in the Nokogiri
    documentation.

    Thanks.
    I will try it later.

    Hi Dansei,

    I am trying to grab information from a website that requires my
    username and password.

    What is the syntax for passing my username and password to Nokogiri?

    Here is the script I use to get access to the website:

    require 'open-uri'
    require 'nokogiri'

    webpage = 'https://xexample.com'

    # Pass the credentials to the same open call that Nokogiri parses
    doc = Nokogiri::HTML(open(webpage,
      :http_basic_authentication => ['my_user_name', 'my_password']))

    I get this error in return:
    C:/Ruby21/lib/ruby/2.1.0/net/http.rb:923:in `connect': SSL_connect
    returned=1 errno=0 state=SSLv3 read server certificate B: certificate
    verify failed (OpenSSL::SSL::SSLError)

    How can I fix it?

    Thanks.
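
One possible approach, sketched under assumptions: Nokogiri itself does not take credentials, so they go to open-uri, and a "certificate verify failed" error often means Ruby cannot find a bundle of trusted CA certificates. open-uri accepts an :ssl_ca_cert option pointing at such a bundle. The helper name fetch_options and the path C:/certs/cacert.pem below are hypothetical; the real location of a CA bundle depends on the machine.

```ruby
require 'open-uri'
require 'openssl'

# Build the option hash for open-uri.
# ca_file is a hypothetical path to a PEM bundle of trusted CA certificates.
def fetch_options(user, password, ca_file = nil)
  options = { http_basic_authentication: [user, password] }
  # :ssl_ca_cert tells open-uri which CA certificates to trust.
  options[:ssl_ca_cert] = ca_file if ca_file
  options
end

# Placeholder credentials and an assumed bundle location.
options = fetch_options('my_user_name', 'my_password', 'C:/certs/cacert.pem')

# Not executed here, since it needs network access and real credentials:
#   html = URI.open('https://xexample.com', options).read  # Ruby 2.5+
#   html = open('https://xexample.com', options).read      # older Rubies such as 2.1
#   doc  = Nokogiri::HTML(html)
```

Setting :ssl_verify_mode to OpenSSL::SSL::VERIFY_NONE would also silence the error, but it turns off certificate checking entirely, so pointing Ruby at a valid CA bundle is usually the safer choice.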