On Jul 7, 2012, at 11:11 AM, David M. wrote:
written no more changes can be sent to the database (because all the
file handling is done in after_save).
Where or how do I sanely get the contents of those TXT files into the
database?
I built this feature in my first commercial Rails app. I used Paperclip
for my file storage, which offers its own callback called
‘after_post_process’ that worked out perfectly for me.
First, I created a Paperclip processor to extract the text version of
the uploaded file (mine were all PDF).
/lib/paperclip_processors/text.rb
module Paperclip
  # Handles extracting plain text from PDF file attachments
  class Text < Processor
    attr_accessor :whiny

    # Creates a Text extract from PDF
    def make
      src = @file
      dst = Tempfile.new([@basename, 'txt'].compact.join("."))
      command = <<-end_command
        "#{ File.expand_path(src.path) }"
        "#{ File.expand_path(dst.path) }"
      end_command

      begin
        success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
        Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
      rescue PaperclipCommandLineError
        raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
      end

      dst
    end
  end
end
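For anyone who wants to try the conversion step outside of Paperclip, here is a minimal sketch of the same pdftotext call in plain Ruby. The method names pdftotext_args and extract_pdf_text are hypothetical, made up for this illustration, and it assumes pdftotext lives at /usr/bin/pdftotext as it did on my machine.

```ruby
require "tempfile"

# Hypothetical helper: builds the argument list for pdftotext.
# -nopgbrk is the same flag used in the processor above.
def pdftotext_args(src_path, dst_path)
  ["/usr/bin/pdftotext", "-nopgbrk",
   File.expand_path(src_path), File.expand_path(dst_path)]
end

# Hypothetical helper: converts a PDF and returns its plain text.
# Assumes pdftotext is installed at the path above.
def extract_pdf_text(src_path)
  dst = Tempfile.new(["extract", ".txt"])
  dst.close
  # Passing the arguments as separate strings bypasses the shell,
  # so paths containing spaces need no quoting.
  ok = system(*pdftotext_args(src_path, dst.path))
  raise "pdftotext failed for #{src_path}" unless ok
  File.read(dst.path)
ensure
  dst.unlink if dst
end
```

Calling extract_pdf_text with the path to a PDF should hand back the extracted text as a string, which is roughly what ends up in the :text style of the attachment.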
Then in my document.rb (model for the file attachment), I added the
following bits:
has_attached_file :pdf,
                  :styles => { :text => { :fake => 'variable' } },
                  :processors => [:text]
after_post_process :extract_text

private

def extract_text
  plain_text = ""
  # The block form closes the file handle once reading is done.
  File.open(pdf.queued_for_write[:text].path, "r") do |file|
    while (line = file.gets)
      # Drop any characters that cannot be represented in ASCII.
      plain_text << Iconv.conv('ASCII//IGNORE', 'UTF8', line)
    end
  end
  self.plain_text = plain_text
end
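One caveat: Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in 2.0, so on a newer Ruby the conversion line needs a replacement. A minimal sketch, assuming String#encode with empty replacements is close enough to what ASCII//IGNORE did for your files (ascii_only is a hypothetical helper name, not part of Paperclip or Rails):

```ruby
# Hypothetical helper: strips everything that is not plain ASCII.
# Invalid byte sequences and characters with no ASCII equivalent
# are both replaced with an empty string, i.e. dropped.
def ascii_only(text)
  text.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => "")
end

puts ascii_only("caf\u00e9 na\u00efve")
```

Inside extract_text, the line appended to plain_text would then be ascii_only(line) instead of the Iconv.conv call.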
And that was that.
Walter