Hpricot - best way to parse based on comments

I am trying to parse some files that contain comments like this:

images, text, etc…

Interesting text of site here.

I am wondering how to go about extracting the data within the comments
block using Hpricot. I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.

Thanks for any ideas!

  • Jerome

On 11/20/06, Jerome wrote:

I am trying to parse some files that contain comments like this:

I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.

The XPath comment() selector will select all comments:

For example (xpath after -m flag):
$ xml sel -t -m '//comment()' -v '.' -n simple.xml
one comment
two comment

$ cat simple.xml

HTH,
Keith
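
The same comment() selection can be sketched in Ruby. This is only a rough illustration and it uses REXML rather than Hpricot, since I am not certain which XPath node tests Hpricot accepts. The XML string is a made-up stand-in for simple.xml, chosen to give the two comments shown in the output above.

```ruby
require "rexml/document"

# Hypothetical stand-in for simple.xml; the real file is not shown here.
xml = <<~XML
  <root>
    <!--one comment-->
    <item>
      <!--two comment-->
    </item>
  </root>
XML

doc = REXML::Document.new(xml)

# comment() is the XPath node test for comment nodes; //comment() finds
# them at any depth. Each match is a REXML::Comment, and #string returns
# the text between the comment delimiters.
comments = REXML::XPath.match(doc, "//comment()").map { |c| c.string.strip }

puts comments
```

Note that this only finds the comment nodes themselves. If the interesting text sits between two comments, you would still need to walk the siblings between them.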

Jerome wrote:

Interesting text of site here.

I am wondering how to go about extracting the data within the comments
block using Hpricot.

The best and easiest way to parse this file using Hpricot with your
required specification … is not to use Hpricot.

start_mark = ""
end_mark = ""

data = File.read(page_path)

output = data.scan(%r{#{start_mark}(.*?)#{end_mark}}m)

All done, finished, no poring over documentation, no considering
rewriting the library to get it to do what you actually want, done.
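
For anyone who wants to try the scan approach without a page on disk, here is a minimal self-contained sketch. The marker strings and the sample page are hypothetical, so substitute whatever comments your pages actually use. Regexp.escape is an addition of mine so that markers containing regex metacharacters are matched literally.

```ruby
# Hypothetical marker comments; replace with the ones in your own pages.
START_MARK = "<!-- begin content -->"
END_MARK   = "<!-- end content -->"

# Returns the text of every block found between the two markers.
def extract_between(html, start_mark, end_mark)
  # /m lets "." match newlines; ".*?" is non-greedy so each block stops
  # at the nearest end marker.
  pattern = /#{Regexp.escape(start_mark)}(.*?)#{Regexp.escape(end_mark)}/m
  # scan with one capture group returns an array of one-element arrays.
  html.scan(pattern).flatten.map(&:strip)
end

# Made-up sample page for illustration.
sample = <<~HTML
  <img src="logo.png">
  <!-- begin content -->
  Interesting text of site here.
  <!-- end content -->
  <p>footer</p>
HTML

blocks = extract_between(sample, START_MARK, END_MARK)
puts blocks
```

Because scan returns every match, a page with several marked blocks yields one array entry per block.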

By the way. Did I mention that inserting new data into the same page
structure is about the same level of difficulty?
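
As a rough sketch of that insertion, assuming the same kind of marker comments (the names and sample page below are hypothetical), String#sub with a block can swap out whatever sits between the markers while keeping the markers in place:

```ruby
# Hypothetical marker comments; replace with the ones in your own pages.
BEGIN_MARK  = "<!-- begin content -->"
FINISH_MARK = "<!-- end content -->"

# Replaces the text between the first pair of markers with new_text,
# leaving the markers themselves in the page.
def replace_between(html, start_mark, end_mark, new_text)
  pattern = /(#{Regexp.escape(start_mark)}).*?(#{Regexp.escape(end_mark)})/m
  # The block form avoids backslash sequences in new_text being
  # interpreted as backreferences.
  html.sub(pattern) { "#{$1}\n#{new_text}\n#{$2}" }
end

# Made-up sample page for illustration.
page = "<p>top</p>\n<!-- begin content -->\nOld text.\n<!-- end content -->\n<p>bottom</p>\n"

updated = replace_between(page, BEGIN_MARK, FINISH_MARK, "New text.")
puts updated
```

Use gsub instead of sub if every marked block in the page should receive the same replacement.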
