Forum: Ruby
How to get the contents of a Word file page by page

Talib Hussain (talib)
on 2008-12-12 11:10
Hi,

I have a 3-page document and I want to read the contents of each page.
How can I do that?

TIA,
Talib Hussain
Talib Hussain (talib)
on 2008-12-12 12:52
Anyone, please?
David Mullet (mully)
on 2008-12-12 14:50
Assuming...

  -- You are working with a Microsoft Word document.
  -- You have actual (manual) page breaks between pages.

...you can create an array of the text on each page by getting the
document contents' text and splitting it on the page break. So, where
doc is your Word document object, you can do this:

  pages = doc.content.text.split("\f")
  pages.each do |page|
      # do something with this page's text
  end

Hope that helps.

David

http://rubyonwindows.blogspot.com
http://rubyonwindows.blogspot.com/search/label/word
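
The split on the form feed character suggested above can be tried on a plain string, without Word. This is a minimal sketch, assuming the text uses "\f" for manual page breaks and "\r" for paragraph marks, as text read from Word typically does. The helper name split_pages and the sample string are made up for illustration.

```ruby
# Split a document's plain text into per-page strings on the form feed
# character ("\f", ASCII 12), which marks a manual page break.
# `split_pages` is a hypothetical helper, not part of any library.
def split_pages(text)
  # strip removes the paragraph marks ("\r") left around each page
  text.split("\f").map(&:strip)
end

# Made-up sample standing in for doc.content.text
sample = "First page\r\fSecond page\r\fThird page\r"

split_pages(sample).each_with_index do |page, i|
  puts "PAGE #{i + 1}: #{page}"
end
```

Pages that break only because the text overflowed contain no "\f", so this approach would return them as part of a single page.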
Heesob Park (phasis)
on 2008-12-12 14:54
(Received via mailing list)
If you only want the text contents, try this:

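require 'win32ole'

# Start Word through OLE automation (needs Windows with Word installed).
word = WIN32OLE.new('word.application')
file = 'c:/work/test.doc'
doc = word.documents.open(file, 'ReadOnly' => true)
page_count = doc.ComputeStatistics(2)    # wdStatisticPages = 2
(1..page_count).each do |i|
  # wdGoToPage = 1, wdGoToAbsolute = 1: move the selection to page i
  word.selection.goto(1, 1, i)
  # wdGoToBookmark = -1: select the predefined \page bookmark (the current page)
  word.selection.goto(-1, 0, 0, '\page')
  puts "PAGE #{i}"
  puts word.selection.text
end
doc.close(false)                         # close without saving changes
word.quit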

Regards,
Park Heesob
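
The page-by-page loop above could also be wrapped in a method that returns the pages as an array instead of printing them. This is only a sketch: page_texts is a hypothetical helper name, and the method takes the Word application and document objects as arguments so that it can be exercised with stand-in objects on a machine without Word. The numeric constants are the same values used in the post above.

```ruby
WD_STATISTIC_PAGES = 2   # wdStatisticPages
WD_GO_TO_PAGE      = 1   # wdGoToPage
WD_GO_TO_ABSOLUTE  = 1   # wdGoToAbsolute
WD_GO_TO_BOOKMARK  = -1  # wdGoToBookmark

# Hypothetical helper: returns an array with the text of each page.
# `word` is the Word application object and `doc` an open document.
def page_texts(word, doc)
  page_count = doc.ComputeStatistics(WD_STATISTIC_PAGES)
  (1..page_count).map do |n|
    word.selection.goto(WD_GO_TO_PAGE, WD_GO_TO_ABSOLUTE, n)
    word.selection.goto(WD_GO_TO_BOOKMARK, 0, 0, '\page')
    word.selection.text
  end
end

# Usage with a real Word instance; this part only runs on Windows.
if Gem.win_platform?
  require 'win32ole'
  word = WIN32OLE.new('word.application')
  begin
    doc = word.documents.open('c:/work/test.doc', 'ReadOnly' => true)
    page_texts(word, doc).each_with_index do |text, i|
      puts "PAGE #{i + 1}"
      puts text
    end
    doc.close(false)
  ensure
    word.quit   # quit Word even if something above raised
  end
end
```

The ensure clause is a design choice of this sketch: it avoids leaving an invisible Word process running if opening or reading the document fails.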
Talib Hussain (talib)
on 2008-12-15 05:53
Thanks a lot, Park, you are a genius.

My requirement is that I have a document (Word file) of, say, 3 pages
with formatted text.

I need to extract the contents of each page with formatting and save
each page as a separate PDF document.

Is this possible? If yes, how can I do that?

Also, do I need to install Office 2007 in order to save files as PDF
documents?

Kindly let me know.
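
One possible approach to the per-page PDF question, sketched under assumptions: Word 2007 and later expose Document.ExportAsFixedFormat, which can export a range of pages directly to PDF, so no intermediate .doc files would be needed. As far as I know, Word 2007 before Service Pack 2 also needs Microsoft's "Save as PDF or XPS" add-in. The helper name export_pages_as_pdf, the output directory, and the pageN.pdf naming are hypothetical.

```ruby
WD_STATISTIC_PAGES           = 2   # wdStatisticPages
WD_EXPORT_FORMAT_PDF         = 17  # wdExportFormatPDF
WD_EXPORT_OPTIMIZE_FOR_PRINT = 0   # wdExportOptimizeForPrint
WD_EXPORT_FROM_TO            = 3   # wdExportFromTo

# Hypothetical helper: exports every page of `doc` to its own PDF file
# in `out_dir` and returns the list of file paths it asked Word to write.
def export_pages_as_pdf(doc, out_dir)
  page_count = doc.ComputeStatistics(WD_STATISTIC_PAGES)
  (1..page_count).map do |n|
    path = File.join(out_dir, "page#{n}.pdf")
    # Arguments: OutputFileName, ExportFormat, OpenAfterExport,
    # OptimizeFor, Range, From, To
    doc.ExportAsFixedFormat(path, WD_EXPORT_FORMAT_PDF, false,
                            WD_EXPORT_OPTIMIZE_FOR_PRINT,
                            WD_EXPORT_FROM_TO, n, n)
    path
  end
end

# Usage with a real Word instance; this part only runs on Windows.
# The paths are examples; the output directory is assumed to exist.
if Gem.win_platform?
  require 'win32ole'
  word = WIN32OLE.new('word.application')
  begin
    doc = word.documents.open('c:/work/test.doc', 'ReadOnly' => true)
    puts export_pages_as_pdf(doc, 'c:/work/pages')
    doc.close(false)
  ensure
    word.quit
  end
end
```

Because Word does the export itself, the formatting of each page should carry over to the PDF, which plain text extraction cannot do.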
Marius Žilėnas (kronidas)
on 2008-12-15 07:27
You seem to be trying to solve a problem (converting a Word document to
PDF) with the wrong tool :). You don't need Ruby to convert a Word file
to PDF. There are tools like Word2pdf for this.

Talib Hussain (talib)
on 2008-12-15 07:50
Agreed, but I have to create 3 separate .doc files out of one document
(one for each page of the document) and send these files as input to the
PDF converter.
Marius Žilėnas (kronidas)
on 2008-12-15 08:03
If you have a Word2pdf-like program, check whether it lets you specify
which page to convert. You could then call it several times, specifying
a different page number each time (the -n option here is hypothetical):

Word2pdf -n 1 infile.doc out1.pdf
Word2pdf -n 2 infile.doc out2.pdf
Word2pdf -n 3 infile.doc out3.pdf

:D
The only catch is finding a Word2pdf program that supports this :).


Saji N. Hameed (Guest)
on 2008-12-15 09:07
(Received via mailing list)
Surely OpenOffice must have something: it can export Word documents
as PDFs, and there may be a corresponding command-line utility...


saji

--
Saji N. Hameed

APEC Climate Center                  +82 51 668 7470
National Pension Corporation Busan Building 12F
Yeonsan 2-dong, Yeonje-gu, BUSAN 611705      saji@apcc21.net
KOREA
Anandh Kumar (anandhkumar)
on 2009-06-12 16:53

Thanks, Park, that was good. Now, say my Word document contains some
student details such as name, marks, and register number. These are the
entries I'll have. Please tell me how to parse these strings and upload
them to the database.

Thanks