Forum: Ruby on Rails Wikipedia Parser

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
David (Guest)
on 2007-04-12 22:25
(Received via mailing list)
I need to parse Wikipedia articles (formatted in wiki markup) and
redisplay them as HTML. Has anyone encountered such a library in
Ruby? Any libraries that are good at that?

Thanks
Chris T. (Guest)
on 2007-04-12 22:41
(Received via mailing list)
David wrote:
> I need to parse Wikipedia articles (formatted in wiki markup) and
> redisplay them as HTML. Has anyone encountered such a library in
> Ruby? Any libraries that are good at that?
>
> Thanks
Check out
http://shanesbrain.net/articles/2006/10/02/screen-...
Makes it dead easy to roll your own.
Chris
---------------------------------------
http://www.autopendium.co.uk
Stuff about old cars
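
A minimal sketch of the Hpricot-based screen scraping the linked article
describes, for reference; the #bodyContent selector and the User-Agent
header are assumptions about Wikipedia's HTML and request handling, not
something stated in the thread:

  require 'rubygems'
  require 'open-uri'
  require 'hpricot'

  # Fetch the rendered article page; Wikipedia may reject requests
  # without a User-Agent, so set one explicitly (assumption).
  url  = "http://en.wikipedia.org/wiki/Ruby_(programming_language)"
  html = open(url, "User-Agent" => "Ruby/Hpricot example").read

  # The article text is assumed to live inside div#bodyContent.
  doc  = Hpricot(html)
  body = doc.at("#bodyContent")
  puts body.inner_html if body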
Andy T. (Guest)
on 2007-04-12 22:47
(Received via mailing list)
Usually you shouldn't use bots on Wikipedia; you should download the
free database dump instead and use that.
Read about their policy here:
http://en.wikipedia.org/wiki/Wikipedia:Bots

If you have your own MediaWiki install and want to use a bot, you can
check out the pywikipedia bot:
http://sourceforge.net/projects/pywikipediabot/  It's not in Ruby,
but it works great.
Russell N. (Guest)
on 2007-04-13 00:53
(Received via mailing list)
Actually, I'm not entirely sure that you shouldn't use bots at all on
Wikipedia. According to the link you provided:

"*Robots* or *bots* are automatic processes
<http://en.wikipedia.org/wiki/Process_%28computing%29> that interact
with Wikipedia as though they were human editors"

That last bit sounds like they're talking about a very specific kind
of bot and not just a scraper.

RSL
unknown (Guest)
on 2007-04-13 01:13
(Received via mailing list)
"*Robots* or *bots* are automatic
processes<http://en.wikipedia.org/wiki/Process_%28computing%29>that
interact with Wikipedia as though they were human editors." There's
nothing against screen-scraping there. That policy is about bots which
edit
content. Otherwise, Google would be breaking WP policy.
This is taking the discussion a little off topic though.
-Nathan
Shane V. (Guest)
on 2007-04-13 01:13
(Received via mailing list)
I wrote that article a while ago.  It'll be interesting to use
WWW::Mechanize or, better yet, scRUBYt, both of which use Hpricot in
the backend anyway.

Shane

http://shanesbrain.net
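
Along the lines Shane suggests, a rough sketch of the same fetch with
WWW::Mechanize; the user_agent_alias value and the #bodyContent selector
are illustrative assumptions rather than anything from the thread:

  require 'rubygems'
  require 'mechanize'   # older versions expose the WWW::Mechanize class

  agent = WWW::Mechanize.new
  # Wikipedia tends to reject blank user agents, so pick a built-in alias.
  agent.user_agent_alias = 'Mac Safari'

  page = agent.get("http://en.wikipedia.org/wiki/Ruby_(programming_language)")

  # page.search delegates to the HTML parser (Hpricot here), so the same
  # selector works as in the plain Hpricot example.
  body = page.search("#bodyContent").first
  puts body.inner_html if body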
Andy T. (Guest)
on 2007-04-13 01:21
(Received via mailing list)
If you just need to cache some pages for displaying later, screen
scraping Wikipedia is a good choice compared to downloading the db.
If you're going to be parsing and redisplaying the content in real
time, that is against Wikipedia's policy.

See
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Why_not_just_retrieve_data_from_wikipedia.org_at_runtime.3F