Hi,
Is there a RubyGem available for converting PDF files to XML
files?
Arup:
I installed the PDF to HTML gem and have to say it's pretty impressive!
It's all based on the pdf2htmlEX project:
(it's basically just a nice Ruby wrapper, so you have to have pdf2htmlEX
installed). But this gem actually opens up a whole new world of
possibilities.
In combination with something like Nokogiri, you should be able to parse
almost all the data you want. However, this means you'll need to brush up
on your CSS and/or XPath to parse the generated HTML with Nokogiri.
On Mac OS X, it was pretty easy to install the pdf2htmlEX toolset. For
Windows, somebody has already done the compiling for you here:
Good luck!
FYI, there is a Google Group for the pdf2htmlEX toolset, and you're going
to be better off asking questions there rather than on this list if you
need any additional help with those tools, since this list is strictly
for Ruby-related things.
Wayne
Wayne B. wrote in post #1139024:
Arup:
I installed the PDF to HTML gem and have to say it's pretty impressive!
It's all based on the pdf2htmlEX project:
Wayne
Thanks for your reply. I was also looking for
But the issue is, if the PDF has any blank column values, it does not
generate any corresponding tag for those entries. Thus I couldn’t track
which data is actually under which column.
I will surely give the gem you linked above a try.
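One possible way around the blank-column problem, sketched under the
assumption that each extracted text fragment comes with its horizontal
position: assign fragments to columns by x-coordinate, so a missing cell
simply stays empty instead of shifting the rest of the row. The column
names, the left-edge values in `COLUMN_STARTS`, and the fragment hashes
are all hypothetical.

```ruby
# Hypothetical left edges of each table column, in page units.
COLUMN_STARTS = { name: 0, qty: 200, price: 300 }.freeze

# Return the key of the right-most column whose left edge is at or
# before x; fall back to the first column if x is left of all of them.
def column_for(x, starts = COLUMN_STARTS)
  match = starts.select { |_, left| left <= x }.max_by { |_, left| left }
  match ? match.first : starts.keys.first
end

# Build one row hash with every column present; blank cells stay nil.
def build_row(fragments, starts = COLUMN_STARTS)
  row = starts.keys.to_h { |key| [key, nil] }
  fragments.each { |frag| row[column_for(frag[:x], starts)] = frag[:text] }
  row
end

# A row where the "qty" cell was blank in the PDF.
fragments = [{ x: 5, text: 'Widget' }, { x: 305, text: '9.99' }]
p build_row(fragments)
```

Here the `qty` key is still present with a `nil` value, so the price is
not mistaken for the quantity.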