Hpricot search condition

I have some HTML:

Central Kolkata

  • Eden Gardens (one of the most famous cricket stadiums in the world),
  • Akashwani Bhavan, All India Radio building
  • Indoor Stadium
  • Fort William, the massive and impregnable British Citadel built in 1773. The fort is still in use and retains its well-guarded grandeur. Visitors are allowed in with special permission only.
  • Victoria Memorial [10] Along St. George’s Gate Road, on the southern fringe of the Maidan, you will find Kolkata's most famous landmark , a splendid white marble monument (CLOSED MONDAYS).
  • Calcutta Racecourse
  • Chowringee, is the Market place of Kolkata. You will find shops ranging from Computer Periferals to cloth merchants. Even tailors and a few famous Movie theaters too. This place is a favourite pass time for local people.

Northern Kolkata

  • Nakhoda Mosque1 (the largest mosque in Kolkata) and the
  • Shobhabajar Rajbari the ancestral house of Rja Naba Krishna, one of the rich locals to side with Clive during his war with Nabab Siraj-Ud-Daula.
  • Jorasanko Thakur Bari1 (Tagore Family residence).
  • Parashnath Jain Temple, near the Belgachia metro station.
  • Parashnath Jain Temple, at Gouribari, less visited, reachable from the Sovabazar Metro station (take an auto rickshaw).

I want to search the inner_text of the li tags up to "Northern Kolkata",
but my code searches all of them.

My code is

    doc.search("li").each do |y|

and it matches every li in the document. I know why, but I want to stop
at the next heading element using Hpricot.

Regards
Prashant

    Is this your html, or are you scraping someone else’s html?

    If it’s yours, organize your html differently… if you know you want to
    be processing a section at a time, wrap those sections with an
    identifiable container, then scope your searches by the container.

    <div>
      <h3>blah</h3>
      <li>a</li>
      <li>b</li>
    </div>
    <div>
      <h3>blah2</h3>
      <li>c</li>
      <li>d</li>
    </div>

    (doc/"div").each do |dv|
      this_h3 = (dv/"h3")
      if this_h3.inner_html == "blah2"
        (dv/"li").each do |li|
          puts li.inner_html
        end
      end
    end

    emits just c and d.

    If it’s someone else’s html in that format, you’ll probably have to go
    elem by elem for the whole doc with state machine-ish code to track what
    you’ve seen previously since there doesn’t seem to be any real ‘path’ to
    the li’s per h3.
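
    A minimal sketch of that state-machine idea, under stated assumptions: the sample markup is a shortened, hypothetical version of the flat span/li layout, and a plain regexp scan stands in for walking the parsed elements in document order, so the sketch runs without the hpricot gem. The names `sections` and `current` are illustrative only.

```ruby
# Shortened, hypothetical sample in the flat span / li / li / span / li shape.
html = <<~HTML
  <span>Central Kolkata</span>
  <li>Eden Gardens</li>
  <li>Indoor Stadium</li>
  <span>Northern Kolkata</span>
  <li>Nakhoda Mosque</li>
HTML

sections = Hash.new { |hash, key| hash[key] = [] }
current = nil # the state: the heading seen most recently

# Visit span and li elements in document order.
html.scan(/<(span|li)>(.*?)<\/\1>/m) do |tag, text|
  if tag == 'span'
    current = text.strip
    sections[current] # touch the key so a heading with no items is still recorded
  elsif current
    sections[current] << text.strip
  end
end

# Only the items that appear before the next span belong to this heading.
puts sections['Central Kolkata']
```

    The same bookkeeping should carry over to a real element walk: remember the last heading seen, and file each li under it until the next heading appears.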

    Yes, I am scraping someone else's HTML, so I can't change the format.
    Can you help me?

    Regards
    Prashant

    On Tue, Sep 8, 2009 at 10:40 PM, Ar Chron

    Your html is still flat, so you have to work with the patterns that you
    see.
    You have:
    span
    li
    li
    li
    span
    li
    li
    li
    etc…

    An ugly, brute force, one case solution is to:

    read the page with Hpricot
    remove the header
    convert it to a simple string representation
    stick your opening tag at the head
    stick your closing tag and a div end at the tail
    change every span start so that a div end and a new div start come right before it
    doctor up the new head so that it no longer begins with a stray div end

    re-create your Hpricot doc from the modified string

    which takes about 8 lines of code.
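
    The string-rewriting part of those steps might look roughly like the sketch below. It is an illustration only: the sample string is a shortened, hypothetical version of the flat markup, the variable names are made up, and the exact tags used in the original reply are not known.

```ruby
# Shortened, hypothetical flat markup: span, li, li, span, li.
flat = '<span>Central Kolkata</span><li>Eden Gardens</li><li>Indoor Stadium</li>' \
       '<span>Northern Kolkata</span><li>Nakhoda Mosque</li>'

# Put a div end and a new div start in front of every span.
wrapped = flat.gsub('<span>', '</div><div><span>')

# The first span has no earlier div to close, so drop the stray end tag at the head.
wrapped = wrapped.sub(/\A<\/div>/, '')

# Close the last div at the tail.
wrapped << '</div>'

puts wrapped
```

    Each heading and its items now sit inside their own div, so passing the string to `Hpricot(wrapped)` should give a document where searches can be scoped per div.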

    YMMV

    Hi, otherwise I have another option: I can change the HTML into the form
    you described.

    This is the HTML I have:

    Central Kolkata

  • Eden Gardens (one of the most famous cricket stadiums in the world),

  • Akashwani Bhavan, All India Radio building
  • Indoor Stadium
  • Fort William, the massive and impregnable British Citadel built in 1773. The fort is still in use and retains its well-guarded grandeur. Visitors are allowed in with special permission only.
  • Victoria Memorial [10] Along St. George’s Gate Road, on the southern fringe of the Maidan, you will find Kolkata's most famous landmark , a splendid white marble monument (CLOSED MONDAYS).
  • Calcutta Racecourse
  • Chowringee, is the Market place of Kolkata. You will find shops ranging from Computer Periferals to cloth merchants. Even tailors and a few famous Movie theaters too. This place is a favourite pass time for local people.
  • Red Fort
  • Eden Gardens (one of the most famous cricket stadiums in the world),
  • Akashwani Bhavan, All India Radio building
  • Indoor Stadium

    I want to add a new element to this HTML, for example:

    Central Kolkata

  • Eden Gardens (one of the most famous cricket stadiums in the world),

  • Akashwani Bhavan, All India Radio building
  • Indoor Stadium
  • Fort William, the massive and impregnable British Citadel built in 1773. The fort is still in use and retains its well-guarded grandeur. Visitors are allowed in with special permission only.
  • Victoria Memorial [10] Along St. George’s Gate Road, on the southern fringe of the Maidan, you will find Kolkata's most famous landmark , a splendid white marble monument (CLOSED MONDAYS).
  • Calcutta Racecourse
  • Chowringee, is the Market place of Kolkata. You will find shops ranging from Computer Periferals to cloth merchants. Even tailors and a few famous Movie theaters too. This place is a favourite pass time for local people.
  • Red Fort
  • Eden Gardens (one of the most famous cricket stadiums in the world),
  • Akashwani Bhavan, All India Radio building
  • Indoor Stadium

    Please note that the closing tag has to be inserted just before the next
    span tag starts. Is that possible?

    Can anybody help me do this using Hpricot?

    Regards
    Prashanth

    On Wed, Sep 9, 2009 at 9:21 AM, prashanth hiremath <

    I don't know how to re-create the Hpricot doc from the modified string.
    Please help me.

    On Thu, Sep 10, 2009 at 10:20 AM, prashanth hiremath <

    Thank you. I did what you suggested and used gsub to replace the tags to
    get the form you described, but there is a problem. With this code:

        doc = Hpricot(open('Delhi.txt'))
        x = doc.to_s
        doc1 = x.gsub(/<(\/?)li>/, '')

        puts doc1
        doc1.search('span').each do |y|
          puts y.inner_text
        end

    it gives the error

        undefined method `search' for #<String:0xb7d0bc74> (NoMethodError)

    because doc1 is a String. How can I convert it so that Hpricot can read
    it again?

    Regards
    Prashanth Hiremath
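
    A small sketch of the likely fix, with assumptions labeled: `gsub` returns a plain String, so the rewritten text has to go through `Hpricot(...)` again before `search` is available. The sample markup and the `<p>` replacement are hypothetical, and the `rescue` only lets the sketch load on a machine without the hpricot gem.

```ruby
begin
  require 'hpricot' # deprecated gem; the rescue lets the sketch load without it
rescue LoadError
end

# Shortened, hypothetical flat markup.
flat = '<span>Central</span><li>Eden Gardens</li>' \
       '<span>Northern</span><li>Nakhoda Mosque</li>'

# gsub returns a plain String, which has no search method.
rewritten = flat.gsub(/<(\/?)li>/, '<\1p>')

headings = []
if defined?(Hpricot)
  doc1 = Hpricot(rewritten) # parse the String again to get a searchable document
  doc1.search('span').each { |y| headings << y.inner_text }
  puts headings
end
```

    The point is the `Hpricot(rewritten)` call: it accepts a String as well as an open file, so the modified markup never needs to be written back to disk.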

    On Wed, Sep 9, 2009 at 10:38 PM, Ar Chron

    2009/9/10 prashanth hiremath:

        doc1.search('span').each do |y|
          puts y.inner_text
        end

    it gives the error

        undefined method `search' for #<String:0xb7d0bc74> (NoMethodError)

    because doc1 is a String. How can I convert it so that Hpricot can read
    it again?

    Please don’t top post, it annoys readers on this list and makes it
    less likely that you will get help.

    I have not used hpricot but if I were in your situation the first
    thing I would do is carefully look through the documentation for
    hpricot. Have you done that?

    Colin

    OK, I won't top post again, sorry. Can you help me iterate over the
    headings?

    <div>
      <h3>blah</h3>
      <li>a</li>
      <li>b</li>
    </div>
    <div>
      <h3>blah2</h3>
      <li>c</li>
      <li>d</li>
    </div>

    (doc/"div").each do |dv|
      this_h3 = (dv/"h3")
      if this_h3.inner_html == "blah2"
        (dv/"li").each do |li|
          puts li.inner_html
        end
      end
    end

    This is your code. I want to check the inner_text of all the h3 tags. For
    example, you tested for "blah2"; how can I test for "blah1" and all the
    others in a loop?

    Regards
    Prashanth
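
    One possible way to handle every heading rather than only "blah2" is to drop the comparison and collect the items under whatever heading each div contains. This is a sketch under assumptions: the sample markup (including the `ul` wrappers) and the `sections` hash are illustrative, and the `rescue` only lets it load without the hpricot gem.

```ruby
begin
  require 'hpricot' # deprecated gem; the rescue lets the sketch load without it
rescue LoadError
end

# Hypothetical sample: one div per section, heading in an h3.
html = '<div><h3>blah</h3><ul><li>a</li><li>b</li></ul></div>' \
       '<div><h3>blah2</h3><ul><li>c</li><li>d</li></ul></div>'

sections = {}
if defined?(Hpricot)
  doc = Hpricot(html)
  (doc/"div").each do |dv|
    heading = dv.at("h3") # first h3 inside this div, or nil
    next unless heading
    # File this div's items under its own heading text.
    sections[heading.inner_text] = (dv/"li").map { |li| li.inner_text }
  end
  sections.each { |name, items| puts "#{name}: #{items.join(', ')}" }
end
```

    With the gem installed this should print one line per heading, each followed by that section's items.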