jerome
November 20, 2006, 11:52pm
1
I am trying to parse some files that contain comments like this:
images, text, etc…
Interesting text of site here.
I am wondering how to go about extracting the data within the comments
block using Hpricot. I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.
Thanks for any ideas!
jerome
November 21, 2006, 12:50am
2
On 11/20/06, Jerome wrote:
I am trying to parse some files that contain comments like this:
…
I am not aware of a way to refer to commented HTML
through CSS or XPath selectors.
The XPath comment() node test selects every comment node in the document.
For example (the XPath expression follows the -m flag):
keith@devel ~ $ xml sel -t -m '//comment()' -v '.' -n simple.xml
one comment
two comment
keith@devel ~ $ cat simple.xml
HTH,
Keith
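For anyone who wants the same comment() query from Ruby rather than the xmlstarlet command line, here is a minimal sketch using REXML. The XML string is a made-up stand-in, since the contents of simple.xml are not shown in this thread, and REXML only accepts well-formed XML, so this is not a drop-in answer for messy HTML.

```ruby
require 'rexml/document'

# Hypothetical stand-in for simple.xml; the real file is not shown in the thread.
xml = <<~XML
  <root>
    <!-- one comment -->
    <item>
      <!-- two comment -->
    </item>
  </root>
XML

doc = REXML::Document.new(xml)

# '//comment()' matches comment nodes at any depth, in document order.
comments = REXML::XPath.match(doc, '//comment()').map { |c| c.string.strip }

puts comments
```

Each match is a REXML::Comment, and its string method returns the text between the comment delimiters.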
jerome
November 24, 2006, 8:54pm
3
Jerome wrote:
Interesting text of site here.
I am wondering how to go about extracting the data within the comments
block using Hpricot.
The best and easiest way to parse this file using Hpricot with your
required specification … is not to use Hpricot.
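start_mark = ""  # opening comment marker
end_mark = ""    # closing comment marker
data = File.read(page_path)
# Escape the markers so they match literally; the m flag lets . span newlines.
output = data.scan(%r{#{Regexp.escape(start_mark)}(.*?)#{Regexp.escape(end_mark)}}m)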
All done, finished, no poring over documentation, no considering
rewriting the library to get it to do what you actually want, done.
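To make the scan approach concrete, here is a small self-contained sketch. The marker strings and the sample page are hypothetical, because the actual comment text did not survive in this thread; substitute whatever markers your files use.

```ruby
# Hypothetical markers; the real ones from the original post are not shown.
START_MARK = "<!-- begin content -->"
END_MARK   = "<!-- end content -->"

# Returns the text found between each start/end marker pair.
def extract_blocks(html, start_mark = START_MARK, end_mark = END_MARK)
  # Regexp.escape makes the markers match literally; /m lets . cross newlines,
  # and the non-greedy .*? stops at the first closing marker.
  pattern = /#{Regexp.escape(start_mark)}(.*?)#{Regexp.escape(end_mark)}/m
  html.scan(pattern).flatten.map(&:strip)
end

sample = <<~HTML
  <html><body>
  <!-- begin content -->
  Interesting text of site here.
  <!-- end content -->
  <p>footer</p>
  </body></html>
HTML

blocks = extract_blocks(sample)
puts blocks
```

Because scan is given a pattern with one capture group, it returns an array of one-element arrays, hence the flatten.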
By the way, did I mention that inserting new data into the same page
structure is about the same level of difficulty?
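A sketch of that insertion step, under the same assumptions: the markers and page text below are invented for illustration, and replace_block is a hypothetical helper name, not something from Hpricot or the original post.

```ruby
# Replaces whatever sits between the first start/end marker pair with new_text,
# keeping the markers themselves in place.
def replace_block(html, new_text, start_mark, end_mark)
  pattern = /(#{Regexp.escape(start_mark)}).*?(#{Regexp.escape(end_mark)})/m
  # The block form of sub avoids backslash interpretation in the replacement.
  html.sub(pattern) { "#{$1}\n#{new_text}\n#{$2}" }
end

# Hypothetical page and markers.
page = "<p>head</p>\n<!-- begin content -->\nold text\n<!-- end content -->\n<p>foot</p>\n"

updated = replace_block(page, "new text",
                        "<!-- begin content -->", "<!-- end content -->")
puts updated
```

Use gsub instead of sub if a page can contain more than one marked block and all of them should be rewritten.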