Forum: Ruby on Rails Wikipedia Parser

David (Guest)
on 2007-04-12 20:25
(Received via mailing list)
I need to parse Wikipedia articles (formatted in Wikipedia's markup
style) and redisplay them in HTML. Has anyone come across a library
for this in Ruby? Any libraries that are good at that?

Thanks
Chris Taggart (christ)
on 2007-04-12 20:41
(Received via mailing list)
David wrote:
> I need to parse Wikipedia articles (formatted in Wikipedia's markup
> style) and redisplay them in HTML. Has anyone come across a library
> for this in Ruby? Any libraries that are good at that?
>
> Thanks
Check out
http://shanesbrain.net/articles/2006/10/02/screen-...
Makes it dead easy to roll your own.
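The gist of it is something like this (a rough, untested sketch; it
assumes Hpricot plus open-uri, and the "#bodyContent" selector is just
my guess at the div Wikipedia wraps the article text in):

  require 'rubygems'
  require 'open-uri'
  require 'hpricot'

  # Fetch the rendered article page and pull out just the article body.
  url = "http://en.wikipedia.org/wiki/Ruby_(programming_language)"
  doc = Hpricot(open(url).read)

  # Adjust the selector if Wikipedia's page layout differs.
  article_html = doc.at("#bodyContent").inner_html
  puts article_html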
Chris
---------------------------------------
http://www.autopendium.co.uk
Stuff about old cars
Andy Triboletti (Guest)
on 2007-04-12 20:47
(Received via mailing list)
Usually you shouldn't use bots on Wikipedia; you should download the
free database dump instead and use that.
Read about their policy here:
http://en.wikipedia.org/wiki/Wikipedia:Bots

If you have your own MediaWiki install and want to use a bot, you can
check out the pywikipedia bot:
http://sourceforge.net/projects/pywikipediabot/  It's not in Ruby,
but it works great.
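If you do go the dump route, Ruby's standard-library REXML can stream
through the (huge) pages-articles XML file without loading it all into
memory. A rough sketch of pulling out each page's title and raw
wikitext (the filename and element names are my assumptions about the
dump format; rendering the wikitext to HTML still needs a separate
wiki-markup parser):

  require 'rexml/document'
  require 'rexml/streamlistener'

  # Minimal stream listener for a MediaWiki pages-articles XML dump.
  # Yields each page's title and raw wikitext to the given block.
  class DumpListener
    include REXML::StreamListener

    def initialize(&block)
      @block   = block
      @element = nil
      @title   = ''
      @text    = ''
    end

    def tag_start(name, attrs)
      @element = name
      @title = '' if name == 'title'
      @text  = '' if name == 'text'
    end

    def text(data)
      case @element
      when 'title' then @title << data
      when 'text'  then @text  << data
      end
    end

    def tag_end(name)
      @element = nil
      @block.call(@title, @text) if name == 'page'
    end
  end

  File.open('enwiki-latest-pages-articles.xml') do |f|
    REXML::Document.parse_stream(f, DumpListener.new { |title, wikitext|
      puts "#{title}: #{wikitext.length} bytes of markup"
    })
  end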
Russell Norris (Guest)
on 2007-04-12 22:53
(Received via mailing list)
Actually, I'm not entirely sure that you shouldn't use bots at all on
Wikipedia. According to the link you provided:

"*Robots* or *bots* are automatic processes
<http://en.wikipedia.org/wiki/Process_%28computing%29> that interact
with Wikipedia as though they were human editors"

That last bit sounds like they're talking about a very specific kind
of bot and not just a scraper.

RSL
unknown (Guest)
on 2007-04-12 23:13
(Received via mailing list)
"*Robots* or *bots* are automatic
processes<http://en.wikipedia.org/wiki/Process_%28computing%29>that
interact with Wikipedia as though they were human editors." There's
nothing against screen-scraping there. That policy is about bots which
edit
content. Otherwise, Google would be breaking WP policy.
This is taking the discussion a little off topic though.
-Nathan
Shane Vitarana (Guest)
on 2007-04-12 23:13
(Received via mailing list)
I wrote that article a while ago.  It would be interesting to use
WWW::Mechanize, or better yet scRUBYt, both of which use Hpricot in
the backend anyway.
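Something along these lines, perhaps (untested sketch; it assumes the
mechanize gem is installed and that the article body still lives in
div#bodyContent):

  require 'rubygems'
  require 'mechanize'

  # Same idea with WWW::Mechanize: it fetches the page and exposes
  # Hpricot-style searching on the parsed document.
  agent = WWW::Mechanize.new
  agent.user_agent_alias = 'Mac Safari'
  page  = agent.get('http://en.wikipedia.org/wiki/Ruby_on_Rails')
  body  = page.at('div#bodyContent')
  puts body.inner_html if body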

Shane

http://shanesbrain.net
Andy Triboletti (Guest)
on 2007-04-12 23:21
(Received via mailing list)
If you just need to cache some pages for displaying later, screen
scraping Wikipedia is a good choice compared to downloading the db.
But if you're going to be parsing and redisplaying the content in
real time, that is against Wikipedia's policy.

See
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Why_not_just_retrieve_data_from_wikipedia.org_at_runtime.3F
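For the cache-and-redisplay-later case, even something this simple
would do (a hypothetical helper, just to illustrate fetching a page
once and reusing the saved copy instead of hitting Wikipedia on every
request):

  require 'open-uri'
  require 'fileutils'
  require 'digest/md5'

  CACHE_DIR = 'tmp/wikipedia_cache'

  # Fetch the URL the first time it is asked for, then serve the copy
  # saved on disk for all later requests.
  def cached_article(url)
    FileUtils.mkdir_p(CACHE_DIR)
    path = File.join(CACHE_DIR, Digest::MD5.hexdigest(url) + '.html')
    unless File.exist?(path)
      File.open(path, 'w') { |f| f.write(open(url).read) }
    end
    File.read(path)
  end

  html = cached_article('http://en.wikipedia.org/wiki/Ruby_(programming_language)')
  puts html.length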