Hi – I’m not a programmer by trade or inclination, so this might be a
stupid question, but is there any way to add a ruby file to an
applescript studio project?
More detail: I’m writing a program to grab an article from Wikipedia,
format it into a cool-looking poster in Illustrator, and save it.
Right now the program uses BBEdit for regex-ing the text from raw
wikisyntax into something nice, but I’ve coded a script in ruby that
will do the same thing. Now I just need to figure out how to combine
the ruby file with the applescript project and pass it the name of the
article.
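For reference, the hand-off I'm imagining is the AppleScript side calling the Ruby file with `do shell script` and passing the article name as a command-line argument that Ruby picks up from `ARGV` (the file name `wikiclean.rb` is just a placeholder):

```ruby
# wikiclean.rb -- the Ruby side reads the article name from ARGV,
# so the AppleScript side could call something like:
#   do shell script "ruby ~/wikiclean.rb " & quoted form of theArticleName

def article_name(argv)
  # fall back to a default article when no argument is given
  argv.first || "Acoustic_Kitty"
end

theArticleName = article_name(ARGV)
puts theArticleName
```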
You can download the existing program at
http://benyates.info/WikipediaPrint.dmg
The ruby file follows (I’m sure it’s terrible and hackish in many ways,
due to my not being a programmer). You’ll need HTML-entity
support and RubyPants installed (
http://po-ru.com/projects/html-entities/ and
http://chneukirchen.org/repos/rubypants/rubypants.rb ). I’m not sure
whether the script will work on all systems, because it includes unicode
characters (Apple Roman?).
require "net/http"
require "uri"
require "htmlentities"
require "rubypants"

#####################################################
# This is where you specify what article you want!
# The full program takes care of the URL formatting, but for now,
# make sure the capitalization is the same as that of the Wikipedia
# article, and replace all spaces with underscores
theArticleName = "Acoustic_Kitty"
#####################################################

############################################################
# this is just for testing when I'm not connected to the net
def offlineWikipediaArticle
  homeFolder = File::expand_path("~")
  File.read("#{homeFolder}/Desktop/train.xml")
end
############################################################
def onlineWikipediaArticle(theArticleName)
  wikiLanguage = 'en'
  wikiProject = 'wikipedia'
  wikiExportPath =
    URI.parse("http://#{wikiLanguage}.#{wikiProject}.org/wiki/Special:Export/#{theArticleName}")
  theArticleText = Net::HTTP.start(wikiExportPath.host,
                                   wikiExportPath.port) do |http|
    http.get(wikiExportPath.path, {'User-Agent' => 'WikipediaPrint'})
  end
  return theArticleText.body
end
def cleanUp(theArticleText)
  # remove xml, templates, wikitables, and leading carriage returns
  theArticleText = theArticleText
    .gsub(/<mediawiki xmlns.+/m, '')
    .gsub(/\{\{[^{]+?\}\}/, '')
    .gsub(/\{\| class="wikitable".+?\|\}/m, '')
    .gsub(/\A\n+/, '')
  # replace pipelinks, remove images, replace regular wikilinks,
  # remove noincludes, remove categories,
  # remove leading semicolons
  theArticleText = theArticleText
    .gsub(/\[\[[^\]\[|]+\|([^\]\[|]+)\]\]/, '\1')
    .gsub(/\[\[image:[^|\[\]]+\|[^|\[\]]+\|.+?\]\]/i, '')
    .gsub(/\[\[image:.+?\]\]/i, '')
    .gsub(/\[\[([^|]+?)\]\]/, '\1')
    .gsub(/<noinclude>.+?<\/noinclude>/mi, '')
    .gsub(/\[\[category:.+?\]\]/i, '')
    .gsub("\n\n;", "\n\n")
  # format urls
  theArticleText = theArticleText.gsub(/\[(http:\/\/\S+) (.+?)\]/i,
    '\2 (\1)')
  # remove references, remove comments,
  # remove sections: External links, see also, notes and references,
  # etc.
  theArticleText = theArticleText
    .gsub(/<ref.+?<\/ref>/mi, '')
    .gsub(/<!--.+?-->/m, '')
    .gsub(/==\s*?External links\s*?==.+/mi, '')
    .gsub(/==\s*?See also\s*?==.+/mi, '')
    .gsub(/==\s*?Notes and References\s*?==.+/mi, '')
    .gsub(/==\s*?References\s*?==.+/mi, '')
  # Format lists – 2nd-order first, then top-level
  theArticleText = theArticleText
    .gsub(/\.\n\s*\*\*\s*/, '. · ')
    .gsub(/\.?\n\s*(\*|#)\s*/, '. • ')
    .gsub('==. •', '== •')
  # Replace line breaks
  theArticleText = theArticleText
    .gsub(/\n\n:/, "\n\n")
    .gsub(/\n/, '¶')
    .gsub(/(¶\s*)+/, ' ¶ ')
    .gsub(/==\s¶/, '==')
    .gsub(/¶\s==/, ' ==')
    .gsub(/\A\s¶\s/, '')
  # Convert wikisyntax delimiters ('' and ==) to less-common alternates
  theArticleText = theArticleText
    .gsub("'''", '◊◊◊').gsub("''", '◊◊')
    .gsub('===', '√√√').gsub('==', '√√')
  # Convert quotes into smart-quote html entities,
  # decode all html entities into unicode (twice, because ampersands are
  # encoded),
  # fix rubypants' mistakes (Hackish. Sorry.)
  theArticleText = RubyPants::new(theArticleText)
    .to_html.decode_entities.decode_entities
    .gsub(' ”', ' “').gsub('“’', '“‘')
  # Minor cleanup -- non-breaking spaces keep bullets glued to their text
  theArticleText = theArticleText
    .gsub(' • ', "\u00A0•\u00A0")
    .gsub(' · ', "\u00A0·\u00A0")
    .gsub("\n", '')
    .gsub(/¶\s*\Z/, '')
  return theArticleText
end
theArticleText = cleanUp(onlineWikipediaArticle(theArticleName))
puts theArticleText
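For what it’s worth, here’s a sketch of the URL-formatting step the comment near the top alludes to. This is my guess at the rules (spaces to underscores, first letter uppercased, the rest percent-escaped), and the name `wikiExportName` is made up:

```ruby
require "uri"

# Normalize a human-typed title for Special:Export: spaces become
# underscores, the first letter is uppercased (Wikipedia does this
# automatically), and anything URL-unsafe is percent-escaped.
# Capitalization after the first letter is left alone, since
# Wikipedia titles are case-sensitive past that point.
def wikiExportName(title)
  name = title.strip.tr(' ', '_')
  name = name[0, 1].upcase + name[1..-1] unless name.empty?
  URI.encode_www_form_component(name)
end

puts wikiExportName("Acoustic Kitty")   # -> Acoustic_Kitty
```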