Does anyone have a Ruby-based script lying around that would transform
updates to a web page into RSS or some other feed format?
We don’t use a CMS for our website but there are items that are updated
often and RSS feeds might be appreciated. Someone must have done this
before I guess.
So is there a script that might do that? The categories are separated by
h2 tags and the items are in li tags.
Hi,
Since this is such a specialized task, it really depends on the website
you are transforming. I would suggest you take a look at Hpricot
(http://code.whytheluckystiff.net/hpricot) and at the RSS class in the
standard library. It shouldn’t be too hard to roll one up, and we can
always help. I am usually in #ruby-lang on freenode after 7 every night.
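For the RSS half of the suggestion above, here is a minimal sketch using RSS::Maker from Ruby's rss library (part of the standard library at the time; on Ruby 3.0 and later it ships as a bundled gem). The build_feed helper, the channel texts, the sample categories, and the example.com URL are all made up for illustration, and it assumes the page has already been parsed into a hash of category name to item texts.

```ruby
require "rss"

# Hypothetical input: category name (from a header) => item texts (from the li tags).
def build_feed(categories, site_url)
  RSS::Maker.make("2.0") do |maker|
    # RSS 2.0 requires a title, link and description on the channel.
    maker.channel.title = "Site updates" # placeholder title
    maker.channel.link = site_url
    maker.channel.description = "Recently updated items"

    categories.each do |category, items|
      items.each do |text|
        maker.items.new_item do |item|
          # Combine category and item text so each entry is identifiable.
          item.title = "#{category}: #{text}"
          item.link = site_url
          item.description = text
          item.date = Time.now
        end
      end
    end
  end
end

feed = build_feed(
  { "News" => ["New opening hours"], "Events" => ["Open day"] },
  "http://example.com/"
)
puts feed
```

The item title joins the category and the item text because, as far as I can tell, neither one alone identifies an entry on a page like this.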
I can cope with setting a date in the RSS, the problem is parsing this
structure. There is no surrounding element for the ul and I need both the
structure and the substructure information because the combination of those
two defines the effective identity of the ul and its items.
There seems to be no method to “give everything between two specific tags and
then go on to the next one”…
I’m not sure I understand exactly, but here’s my impression of what
you’re trying to do.
require 'rubygems'
require 'hpricot'

doc = Hpricot(html_string)
(doc/:h3).each do |h3|
  rss_title = h3 # okay, so you have the 3rd-level header
  rss_contents = Hpricot::Elements[]
  ele = h3
  # walk forward from the header, collecting siblings up to the next ul
  while ele = ele.next_sibling
    rss_contents << ele
    break if ele.respond_to?(:name) && ele.name == "ul"
  end
end
So, basically, you can use next_sibling (or previous_sibling) to walk
back and forth between HTML brothers and sisters. I store it in an
Hpricot::Elements array, since you can then just call
rss_contents.to_html or do other searches on it.
This is available since changeset [49], so you’ll need to either install
from SVN or monkeypatch.
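To show the "everything between one header and the next" idea without depending on any parser, here is a rough sketch in plain Ruby. Node is a hypothetical stand-in for parsed siblings (it is not an Hpricot class), and group_by_header is an illustrative helper: each header opens a group and every ul that follows is attached to it until the next header shows up.

```ruby
# Hypothetical stand-in for a parsed node: a tag name plus its text
# (for headers) or its list of item texts (for a ul).
Node = Struct.new(:name, :content)

# Walk a flat list of siblings and attach each ul's items to the most
# recent header. Other siblings (paragraphs etc.) are skipped.
def group_by_header(nodes, header = "h3")
  groups = []
  current = nil
  nodes.each do |node|
    if node.name == header
      current = { title: node.content, items: [] }
      groups << current
    elsif current && node.name == "ul"
      current[:items].concat(node.content)
    end
  end
  groups
end

siblings = [
  Node.new("h3", "News"),
  Node.new("ul", ["Item A", "Item B"]),
  Node.new("h3", "Events"),
  Node.new("p", "intro"),
  Node.new("ul", ["Item C"])
]

groups = group_by_header(siblings)
groups.each { |g| puts "#{g[:title]}: #{g[:items].join(', ')}" }
```

With a real document, the same loop should work over the siblings you get from next_sibling, using the element name in place of Node#name.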
The next_sibling and previous_sibling methods are just what I needed.
Now for an svn checkout…
Wow, Hpricot seems pretty nice. I noticed the hype, but now I
understand…
One question though: do you see a way of parsing a structure like this
with Hpricot:
I can cope with setting a date in the RSS, the problem is parsing this
structure. There is no surrounding element for the ul and I need both
the structure and the substructure information because the combination of
those two defines the effective identity of the ul and its items.
There seems to be no method to “give everything between two specific tags
and then go on to the next one”…
Thanks for the pointers
Bart