Proper Factoring for a Parser Class

Hello Railers!

I’m developing an application where I need to extract text from a number
of different file formats. To this end I would like to create a Parser
class that will be reusable and handle all of the heavy lifting.
Ideally I would like to do something like this (sorry for any code
mistakes):

my_rtf_parser = Parser.new('path/to/rtf_file.rtf')
my_text = my_rtf_parser.to_text

my_pdf_parser = Parser.new('path/to/pdf_file.pdf')
my_text = my_pdf_parser.to_text

The Parser class should be able to determine the correct type for the
file being passed to it and then load up the “to_text” method
appropriate to that file type.

What is the best way to separate out the classes that will perform these
actions, and where should I locate the files in the Rails directory
structure?

Thanks in advance for the assistance.

Mike

Mike E. wrote:

The Parser class should be able to determine the correct type for the
file being passed to it and then load up the “to_text” method
appropriate to that file type.

What is the best way to separate out the classes that will perform these
actions, and where should I locate the files in the Rails directory
structure?

How to separate them is really a matter of taste… I’d probably make them
Parser::Pdf, Parser::Rtf, etc. I don’t know where your files are coming
from, but from a security/bulletproofing perspective I would not trust the
extension; I’d use MIME magic to determine the type instead.
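
To make that concrete, here is a rough sketch of the namespaced layout with a content-based check. Everything in it is an assumption for illustration: the Base class, the MAGIC constants, the BACKENDS list, and the UnknownFormat error are hypothetical names, and the hand-rolled signature comparison only stands in for a real MIME magic library. The to_text bodies are left as stubs, since the actual extraction depends on whatever PDF/RTF tooling you pick.

```ruby
# Sketch only: class names, constants, and the registry are illustrative.
class Parser
  class UnknownFormat < StandardError; end

  # Common parent for the format-specific backends.
  class Base
    def initialize(path)
      @path = path
    end

    # Each backend is expected to override this with real extraction.
    def to_text
      raise NotImplementedError, "#{self.class} must implement to_text"
    end
  end

  class Pdf < Base
    MAGIC = "%PDF-".b   # PDF files start with this signature
  end

  class Rtf < Base
    MAGIC = "{\\rtf".b  # RTF files start with this signature
  end

  BACKENDS = [Pdf, Rtf].freeze

  # Pick a backend from the file's leading bytes, ignoring its extension.
  def self.backend_for(path)
    longest = BACKENDS.map { |backend| backend::MAGIC.bytesize }.max
    header  = File.binread(path, longest).to_s # to_s guards against an empty file
    BACKENDS.find { |backend| header.start_with?(backend::MAGIC) } ||
      raise(UnknownFormat, "unrecognized file format: #{path}")
  end

  def initialize(path)
    @backend = self.class.backend_for(path).new(path)
  end

  # Delegate to whichever backend matched the file contents.
  def to_text
    @backend.to_text
  end
end
```

With this shape, Parser.new('path/to/rtf_file.rtf').to_text keeps the calling code from the original post unchanged, and a file that is misnamed still goes to the backend matching its contents. Until the stubs are filled in, to_text raises NotImplementedError.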

As for where to locate them in the Rails structure… this operation is
pretty low level and doesn’t appear to involve ActiveRecord, etc. at all.
Depending on how big this class ends up, I’d suggest making a gem out of it
instead. If you are sure you’re just going to use it in one project and
never distribute it elsewhere, then I’d shove it into lib/.

Cheers,
Tyler

Tyler MacDonald wrote:

extension, I’d use mime magic to determine the type instead.

Tyler, I’m unfamiliar with MIME magic. What steps would I take to
integrate it into my app?

As for where to locate them in the rails structure… this operation is
pretty low level and doesnt appear to involve activerecord, etc at all…
Depending on how big this class ends up, I’d suggest making a gem out of it
instead. If you are sure you’re just going to use it in one project and
never distribute it elsewhere, then I’d shove it into lib/.

Good advice and much appreciated. Thanks.

Mike

Mike E. wrote:

extension, I’d use mime magic to determine the type instead.

Tyler, I’m unfamiliar with MIME magic. What steps would I take to
integrate it into my app?

http://shared-mime.rubyforge.org/ – check_magics is what you’re looking for.

Cheers,
Tyler