Forum: Ruby Mailing List Files (#115)

James Gray (bbazzarrakk)
on 2007-02-23 15:00
(Received via mailing list)
The three rules of Ruby Quiz:

1.  Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2.  Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3.  Enjoy!

Suggestion:  A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion.  Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The Ruby Talk mailing list archives will show files attached to incoming
messages.  However, it's not always easy to get at the data from these files
using the archives alone.  The attachments are sometimes displayed in
not-too-readable formats:

  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

This is tough for those of us who like to play with Ruby Quiz solutions.

This week's quiz is to write a program that takes a message id number as a
command-line argument and "downloads" any attachments from that message.  Assume
message ids are for Ruby Talk posts by default, but you may want to provide an
option to override that so we can support lists like Ruby Core as well.

If no path is given, write the attachments to the working directory.  When there
is a path, your code should place the files there instead.
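
The interface described above (a message id, an optional output path, and an option to pick a different list) could be sketched roughly as follows. This is only an illustration: parse_args, the --list switch, and the ARCHIVES table are hypothetical names, and the ruby-core URL is an assumption made by analogy with the ruby-talk one.

```ruby
require 'optparse'

# Base URLs per list. The ruby-talk entry matches the archive host and path
# used by the solutions in this thread; the ruby-core entry is an assumption.
ARCHIVES = {
  'ruby-talk' => 'http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/',
  'ruby-core' => 'http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/'
}

# Hypothetical helper: turn command-line arguments into the settings a
# downloader would need (list name, message id, output directory, page URL).
def parse_args(argv)
  settings = { :list => 'ruby-talk', :dir => '.' }
  parser = OptionParser.new do |o|
    o.banner = 'Usage: fetch_attachments [options] message-id [path]'
    o.on('-l', '--list NAME', 'mailing list (default: ruby-talk)') do |name|
      settings[:list] = name
    end
  end
  rest = parser.parse(argv)   # leftover non-option arguments
  abort parser.to_s if rest.empty? || !ARCHIVES.key?(settings[:list])
  settings[:id]  = Integer(rest[0])
  settings[:dir] = rest[1] if rest[1]
  settings[:url] = ARCHIVES[settings[:list]] + settings[:id].to_s
  settings
end
```

Calling parse_args(ARGV) from a script would then give the page to fetch in settings[:url] and the directory to write into in settings[:dir].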
Christoffer Lernö (Guest)
on 2007-02-24 12:54
(Received via mailing list)
Does anyone have some good sample messages?

Aside from 190780 and 226884, I've been testing with 66854 and 63060,
which cover some cases not in 190780 and 226884.

Does anyone have some nice tricky attachments to test with?


/Christoffer
Christoffer Lernö (Guest)
on 2007-02-25 21:27
(Received via mailing list)
Much longer than Brian's submission but here goes:

###
#!/usr/bin/env ruby -w

require 'getoptlong'
require "net/http"
require "base64"

opts = GetoptLong.new([ '--help', '-h', GetoptLong::NO_ARGUMENT],
                       [ '--path', '-p', GetoptLong::REQUIRED_ARGUMENT],
                       [ '--url', '-u', GetoptLong::REQUIRED_ARGUMENT],
                       [ '--debug', '-d', GetoptLong::NO_ARGUMENT])

def print_usage_and_exit
   puts "Usage: #{File.basename($PROGRAM_NAME)} [switches] message-id"
   puts "  -p directory     set save directory to directory"
   puts "  -u url           set url to use to url"
   puts "  -d               display all decoded data as it is read"
   exit 0
end

class String

   def strip_html
     dup.strip_html!
   end

   def decode_quoted_printable
     decoded_string = gsub(/=../) { |code| code[1..2].hex.chr }
     strip_last = decoded_string.rstrip
     strip_last[-1] == ?= ? strip_last.chop! : decoded_string
   end

   def strip_html!
     gsub!(/<.*?>/, '')
     gsub!(/&.*?;/) do |match|
       case match
       when "&amp;" then '&'
       when "&quot;" then '"'
       when "&gt;" then '>'
       when "&lt;" then '<'
       when /&#\d+;/ then match[/(\d)+/].to_i.chr
       when /&#x[0-9a-fA-F]+;/ then match[/[0-9a-fA-F]+/].hex.chr
       else match
       end
     end
     self
   end

end

class WaitingState
   def process(line)
     return self unless line =~ /^--/
     HeaderReadingState.new(line.strip)
   end
end

WAITING_STATE = WaitingState.new

# State reading content description
class HeaderReadingState

   def initialize(line)
     @line = line.strip
     @data = {}
     @entry = nil
   end

   def process(line)

     line.strip!

     # Ignore this attachment if we only have content lines.
     return WAITING_STATE if line[@line]

     # Switch to reading attachment-data when we encounter an empty line.
     return AttachmentParsingState.new(@line, @data) if line.empty?

     # If we have an entry-header, handle this.
     if line =~ /.*:/
       @entry = line.slice!(/^.*:/)
       @entry.chop!
     end

     # Invalid attachment
     return WAITING_STATE if @entry.nil? && !line.empty?

     unless line.empty? then
       entry = @entry.downcase
       data = line.strip
       if data[-1] == ?;
         # More data on next line, so just chop the ; and keep the same entry
         data.chop!
       else
         # Last data for this entry, so make sure next line has an entry.
         @entry = nil
       end
       # Data for each entry is stored as an array.
       @data[entry] = (@data[entry] || []) + data.split(/;/).collect { |part| part.strip }
     end

     # Stay in this state
     self

   end

end



# State for reading attachment content.
class AttachmentParsingState

   IDENTITY_DECODING = lambda { |string| string }

   QUOTED_PRINTABLE_DECODING = lambda { |string| string.decode_quoted_printable }

   BASE64_DECODING = lambda { |string| Base64.decode64(string) }

   ENCODINGS = { 'base64' => BASE64_DECODING,
                 'quoted-printable' => QUOTED_PRINTABLE_DECODING }

   def initialize(line, data)
     @line = line

     # Determine the encoding of the content.
     encoding = ((data["content-transfer-encoding"] || []).first || "none").downcase

     # Select a decoding and default to identity decoding.
     @decoding = ENCODINGS[encoding.downcase]

     # Check content-disposition if this is an attachment.
     # If so, extract the filename.
     disposition = data["content-disposition"]
     if disposition
       @filename = parse_filename(disposition)
       puts "Found attachment #{@filename} with encoding '#{encoding}'." if @filename
       puts "No decoder found for '#{encoding}' - decoding turned off." unless @decoding
     else
       @filename = nil
     end

     @data = ""
   end

   # Parse out a possible filename from content-disposition.
   def parse_filename(disposition)
     return nil unless disposition.member?("attachment")
     filename = disposition.find("") { |value| value =~ /filename/ }
     filename.slice!("filename=")
     filename.strip!
     filename = eval(filename) if filename =~ /\".*\"/
     filename.empty? ? nil : filename
   end

   def store_attachment
     if @filename then
       filename = File.join($file_path, @filename)
       if File.exist? filename
         puts "Extraction done: #{filename} already exists - skipping."
       else
         File.open(filename, "w+") { |file| file.print @data }
         puts "Extraction done: Attachment saved as '#{filename}'"
       end
     end
   end

   # Process a line
   def process(line)

     if line[@line]
       # store the data we got this far.
       store_attachment

       # We hit a delimiter, so go back to header reading state.
       return HeaderReadingState.new(line)
     end

     # Decode and store data.
     decoded = @decoding ? @decoding.call(line) : line
     print decoded if $debug
     @data << decoded

     # stay in this state
     self

   end

end


def save_attachment(host, path, index)
   state = WAITING_STATE
   Net::HTTP.get(host, path + index.to_s).strip_html!.each_line do |str|
     state = state.process(str)
   end
end

$file_path = "."
$debug = false
host = "blade.nagaokaut.ac.jp"
path = "/cgi-bin/scat.rb/ruby/ruby-talk/"

opts.each do |opt, arg|
   case opt
   when '--help'
     print_usage_and_exit
   when '--path'
     if File.directory?(arg)
       $file_path = arg
     else
       puts "Illegal path '#{arg}' - Aborting."
       exit 0
     end
   when '--debug'
     $debug = true
   when '--url'
     url = arg.gsub(/.*:\/\//, '')
     path = url[/\/.*$/]
     path += "/" unless path[-1] == ?/
     url.slice!(path)
     host = url
   end
end

print_usage_and_exit if ARGV.length != 1

message_id = ARGV.first.to_i

save_attachment(host, path, message_id)


###
John Browning (Guest)
on 2007-02-26 20:48
(Received via mailing list)
#!/usr/bin/env ruby
#

require 'net/http'
require 'strscan'
require 'cgi'

class GetAttachments
   def initialize(id)
     @id = id
     @url = "blade.nagaokaut.ac.jp"
     @params = "/cgi-bin/scat.rb/ruby/ruby-talk/" + @id
     @attachments = Array.new
   end

   def store_attachments
     # get the attachment, then store it.
     self.fetch_attachments
     self.save_attachments
   end

   def fetch_attachments
     # get the page and extract email from pre tags
     @page = Net::HTTP.get(@url, @params)
     @page =~ /\<pre\>(.+)\<\/pre\>/im
     @email = $1
     # get rid of everything before the first part separator
     # NB boundary separators assumed to start with --. No RFC
     # guarantee this is always right.
     @email.sub!(/\A([^-]|-[^-])+/m, '')
     # create a scanner and grab header / body pairs
     @mime_scanner = StringScanner.new(@email)
     # this regex looks for a boundary line beginning -- then a line
     # beginning Content then other header stuff then a blank line then body
     # stuff, then either another of the same or a boundary then an empty
     # line. Lookahead ?= prevents using part of next token.
     while @mime_scanner.scan(/(^--.+?\nContent.*?^\s*$)(.*?)(?=^--.+?\n(Content|^\s*$))/im) do
       attachment = Hash.new
       # translate html escapes and get rid of html mark-up that
       # seems to creep into body, plus starting and trailing spaces
       attachment[:header] = CGI.unescapeHTML( @mime_scanner[1] )
       attachment[:body] = CGI.unescapeHTML( @mime_scanner[2].gsub(/\A\s+/,'').chomp.gsub(/\<[^\>]*\>/, '') )
       @attachments = @attachments << attachment
     end
   end

   def save_attachments
     @attachments.each do |a|
       # skip parts that aren't attachments
       next if !(a[:header] =~ /Content-Disposition:\s*attachment/i)
       # grab file name and encoding.
       # quit with error if no filename.
       if ( a[:header] =~ /filename\s*\=\s*\"?([a-z\-\_\ 0-9\.\%\$\@\!]+)\"?\s*(\n|\;)/i ||
            a[:header] =~ /name\s*\=\s*\"?([a-z\-\_\ 0-9\.\%\$]+)\"?\s*(\n|\;)/i )
         # do above as || to favor filename over name, which may be
         # unnecessary
         # NB hasty assumptions about file name characters
         filename = $1
       else
         puts "Could not parse filename for attachment from #{a[:header]}"
         exit 1
       end
       if ( a[:header] =~ /Content-Transfer-Encoding:\s*\"?([a-z\-\_0-9]+)\"?\s*?(\n|\;)/i )
         encoding = $1
       end
       # if the filename specifies a directory and it exists, use it.
       # Otherwise just put in pwd.
       # NB clobbers any files with same name as attachment.
       if ( File.exist?(File.dirname(filename)) )
         file = File.new(filename, "w+")
       else
         file = File.new(filename = File.basename(filename), "w+")
       end
       # decode if necessary
       case encoding
       when /base64/i
         file << a[:body].unpack("m").first
       when /quoted-printable/i
         file << a[:body].unpack("M").first
       else
         file << a[:body]
       end
       # notify what's been done, clean up and go home
       file.close
       puts "Stored attachment from message #{@id} at #{@url} in #{File.expand_path(filename)}"
       exit 0
     end
   end
end

ARGV.each do |arg|
   @ga = GetAttachments.new(arg)
   @ga.store_attachments
end



John Browning
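
The decoding branch in the script above leans on String#unpack: the 'm' directive decodes Base64 and 'M' decodes quoted-printable. A minimal sketch of just that step, where decode_body is a hypothetical helper name and the sample payload is made up for illustration:

```ruby
# Hypothetical helper mirroring the case statement in the script above.
# +encoding+ is the Content-Transfer-Encoding value, or nil if none was found.
def decode_body(body, encoding)
  case encoding
  when /base64/i           then body.unpack('m').first   # Base64
  when /quoted-printable/i then body.unpack('M').first   # quoted-printable
  else body                                              # leave as-is
  end
end

encoded = ['puts "hello"'].pack('m')   # Base64-encode a sample payload
puts decode_body(encoded, 'base64')    # prints: puts "hello"
```

An unrecognised or missing encoding falls through to the else branch, so plain-text attachments are written unchanged.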
Lou Scoras (ljscoras)
on 2007-02-28 14:26
(Received via mailing list)
#!/usr/bin/env ruby
#
# q115.rb - solution to rubyquiz #115 (Mailing List Files)
# Lou Scoras <louis.j.scoras@gmail.com>
# February 28, 2007
#
# = Dependencies
#
# It felt like I was cheating a lot in this quiz since I made use of several
# great libraries to do everything for me =)  If you want to play with the
# script, you'll need to get a hold of:
#
# ActionMailer::  This was used for access to TMail.  You might be able to use
#                 TMail by itself, but I haven't tested it and rails might
#                 have made some modifications.
#
# Elif::          This handy little library reads files backwards.  This was
#                 actually a solution from a previous quiz ({64 - Port a
#                 Library}[http://www.rubyquiz.com/quiz64.html]). Plus it's
#                 from James so you know it's good stuff ;)
#
# Hpricot::       Used this little gem (no not the kind of package) to do the
#                 scraping to get all the solutions for a quiz.  Awesome, just
#                 awesome!
#
# = The Script
#
# The messages in the archive are pretty close to being readable by TMail.
# Each page is just missing the correct mime header to let the mail parser
# know it's actually got attachments.
#
# After pulling out all the html artifacts, we still need to find the mime
# boundary.  An easy way to do this is just look for the content-disposition
# headers for the attachments and then look above them to find the boundary.
#
# 1. Look for 'Content-Disposition: attachment'
# 2. Look for the first line above that which is not a mail header -- that's
#    what elif is helping with.
# 3. That line is the mime boundary.  Add the header into the TMail object and
#    then you can read the attachments as normal
#
# = Running
#
# The script implements the command line interface mentioned in the quiz
# description.  You just give it the name of a ruby-talk message id and it
# will fetch the attachments into the current directory.  If you follow the
# number by a path you can change the output directory.
#
#     $ q115 190780 outdir
#
# As an additional feature, you can also provide the number of the quiz
# prefixed with a 'q' character.  In this case, all of the solutions will be
# downloaded and put in a subdirectory by solver.  If the solution didn't have
# any attachments it puts the message body into a file called solution.txt.

require 'action_mailer'
require 'cgi'
require 'delegate'
require 'elif'
require 'fileutils'
require 'hpricot'
require 'open-uri'
require 'tempfile'

module Quiz115
 class QuizMail < DelegateClass(TMail::Mail)
   class << self
     attr_reader :archive_base_url

     def archive_base_url
       @archive_base_url ||
"http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
     end

     def solutions(quiz_number)
       doc = Hpricot(open("http://www.rubyquiz.com/quiz#{quiz_number}.html"))
       (doc/'#links'/'li/a').collect do |link|
         [CGI.unescapeHTML(link.inner_text), link['href']]
       end
     end
   end

   def initialize(mail)
     temp_path = to_temp_file(mail)
     boundary  = MIME::BoundaryFinder.new(temp_path).find_boundary

     @tmail = TMail::Mail.load(temp_path)
     @tmail.set_content_type 'multipart', 'mixed',
       'boundary' => boundary if boundary

     super(@tmail)
   end

   private

   def to_temp_file(mail)
     temp = Tempfile.new('qmail')

     temp.write(if (Integer(mail) rescue nil)
       url = self.class.archive_base_url + mail
        open(url) { |f| cleanse_html f.read }
     else
       web = URI.parse(mail).scheme == 'http'
       open(mail) { |m| web ? cleanse_html(m.read) : m.read }
     end)

     temp.close
     temp.path
   end

   def cleanse_html(str)
      CGI.unescapeHTML(str.gsub(/\A.*?<div id="header">/mi,'').gsub(/<[^>]*>/m, ''))
   end
 end

 module MIME
   class BoundaryFinder

     ##
     # Create a parser to find the mime boundary
     #
     def initialize(file)
       @elif = ::Elif.new(file)
       @in_attachment_headers = false
     end

     ##
     # Find the mime boundary marker.  Only returns the marker if it can find an
     # attachment, otherwise for quiz purposes there's no reason to find it: id
     # est we don't care about multipart/alternative messages, et cetera.
     #
     def find_boundary
       while line = @elif.gets
         if @in_attachment_headers
           if boundary = look_for_mime_boundary(line)
             return boundary
           end
         else
           look_for_attachment(line)
         end
       end
       nil
     end

     private

     def look_for_attachment line
       if line =~ /^content-disposition\s*:\s*attachment/i
         puts "Found an attachment" if $DEBUG
         @in_attachment_headers = true
       end
     end

     def look_for_mime_boundary line
       unless line =~ /^\S+\s*:\s*/ || # Not a mail header
              line =~ /^\s+/           # Continuation line?
         puts "I think I found it...#{line}" if $DEBUG
         line.strip.gsub(/^--/, '')
       else
         nil
       end
     end
   end
 end
end

include Quiz115
include FileUtils

def process_mail(mailh, outdir)
 begin
   t = QuizMail.new(mailh)
   if t.has_attachments?
     t.attachments.each do |attachment|
       outpath = File.join(outdir, attachment.original_filename)
       puts "\tWriting: #{outpath}"
       File.open(outpath, 'w') do |out|
         out.puts attachment.read
       end
     end
   else
     outfile = File.join(outdir, 'solution.txt')
     File.open(outfile, 'w') {|f| f.write t.body}
   end
 rescue => e
   puts "Couldn't parse mail correctly. Sorry! (E: #{e})"
 end
end

def to_dirname(solver)
 solver.downcase.delete('!#$&*?(){}').gsub(/\s+/, '_')
end

query  = ARGV[0]
outdir = ARGV[1] || '.'

unless query
 $stderr.puts "You must specify either a ruby-talk message id, or a quiz number (prefixed by 'q')"
 exit 1
end

if query =~ /\Aq/i
 quiz_number = query.sub(/\Aq/i, '')
 puts "Fetching all solutions for quiz \##{quiz_number}"

 QuizMail.solutions(quiz_number).each do |solver, url|
   puts "Fetching solution from #{solver}."

   dirname    = to_dirname(solver)
   solver_dir = File.join(outdir, dirname)

   mkdir_p solver_dir
   process_mail(url, solver_dir)
 end
else
 process_mail(query, outdir)
end

exit 0
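
The boundary search described in the script's header comments (find the attachment's Content-Disposition line, then walk upward past headers and continuation lines) can be sketched without Elif by reversing the lines in memory. This is only an illustration under that assumption: find_boundary and the sample message are hypothetical, not part of the script.

```ruby
# Hypothetical helper: return the MIME boundary that precedes the last
# attachment in +text+, or nil when no attachment header is present.
def find_boundary(text)
  in_attachment_headers = false
  text.lines.reverse_each do |line|
    if in_attachment_headers
      next if line =~ /^\S+\s*:\s*/   # still a mail header
      next if line =~ /^\s+/          # continuation (or blank) line
      return line.strip.sub(/^--/, '')
    elsif line =~ /^content-disposition\s*:\s*attachment/i
      in_attachment_headers = true
    end
  end
  nil
end

sample = <<MAIL
Subject: [QUIZ] example

--Apple-Mail-1
Content-Type: text/plain;
  name="a.rb"
Content-Disposition: attachment;
  filename=a.rb

puts 1
--Apple-Mail-1--
MAIL

puts find_boundary(sample)   # prints: Apple-Mail-1
```

Reading the whole page into memory is fine for archive-sized messages; Elif does the same walk without loading the file first.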