How to get contents of word file page by page


#1

Hi,

I have a three-page document and I want to read the contents of each page. How can
I do that?

TIA,
Talib H.


#2

Anyone please


#3

Assuming…

– You are working with a Microsoft Word document.
– You have actual page breaks between pages

…you can create an array of the text on each page by getting the
document contents’ text and splitting it on the page break. So, where
doc is your Word document object, you can do this:

pages = doc.content.text.split("\f")
pages.each do |page|
  # do something with this page's text
end
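
To see what the split does without Word running, here is a minimal sketch on a plain string. The sample text and the `split_pages` helper name are made up for illustration. A manual page break shows up in the document text as the form feed character "\f"; pages that simply flow over onto the next sheet contain no such character, so this approach will not separate them.

```ruby
# Split document text into pages on the form feed character ("\f"),
# which is how a manual page break appears in the text.
def split_pages(text)
  text.split("\f")
end

# Hypothetical sample standing in for doc.content.text
sample = "First page\rmore text\fSecond page\fThird page"

pages = split_pages(sample)
pages.each_with_index do |page, index|
  puts "PAGE #{index + 1}: #{page.inspect}"
end
```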

Hope that helps.

David

http://rubyonwindows.blogspot.com
http://rubyonwindows.blogspot.com/search/label/word


#4

Thanks a lot Park, you are a genius.

My requirement is that I have a document (Word file) of, say, 3 pages
with formatted text.

I need to extract the contents of each page with formatting and save
each one as a separate .PDF document.

Is this possible? If yes, how can I do that?

Also, do I need to install Office 2007 in order to save files as .PDF
documents?

Kindly let me know.


#5

If you want only the text contents, try this:

require 'win32ole'

word = WIN32OLE.new('word.application')
file = 'c:/work/test.doc'
doc = word.documents.open(file, 'ReadOnly' => true)
page_count = doc.ComputeStatistics(2)      # wdStatisticPages = 2
for i in 1..page_count
  word.selection.goto(1, 1, i)             # wdGoToPage = 1, wdGoToAbsolute = 1
  word.selection.goto(-1, 0, 0, '\page')   # wdGoToBookmark = -1; "\page" selects the current page
  puts "PAGE #{i}"
  puts word.selection.text
end
word.activedocument.close(false)           # close without saving changes
word.quit
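
For the follow-up about saving each page as its own PDF with formatting intact, a possible sketch is to let Word export one page at a time through `ExportAsFixedFormat`. This is untested here and rests on assumptions: it needs Windows with a Word version that offers this method (as far as I know it first appeared in Word 2007, where PDF output originally required Microsoft's Save as PDF add-in), and the helper names `pdf_path_for` and `export_pages_as_pdf` are made up for illustration.

```ruby
# Word constants (values from the Word object model)
WD_STATISTIC_PAGES       = 2
WD_EXPORT_FORMAT_PDF     = 17
WD_EXPORT_OPTIMIZE_PRINT = 0
WD_EXPORT_FROM_TO        = 3

# Hypothetical helper: turns ("c:/work/test.doc", 2) into "c:/work/test_page2.pdf"
def pdf_path_for(doc_path, page_number)
  base = doc_path.sub(/\.[^.\/\\]+\z/, '')
  "#{base}_page#{page_number}.pdf"
end

# Exports every page of doc_path to its own PDF and returns the file names.
# Assumes Windows with a Word version that supports ExportAsFixedFormat.
def export_pages_as_pdf(doc_path)
  require 'win32ole'
  word = WIN32OLE.new('word.application')
  doc = word.documents.open(doc_path, 'ReadOnly' => true)
  page_count = doc.ComputeStatistics(WD_STATISTIC_PAGES)
  (1..page_count).map do |i|
    out = pdf_path_for(doc_path, i)
    # Arguments: file, format, open after export, optimize for, range, from, to
    doc.ExportAsFixedFormat(out, WD_EXPORT_FORMAT_PDF, false,
                            WD_EXPORT_OPTIMIZE_PRINT, WD_EXPORT_FROM_TO, i, i)
    out
  end
ensure
  doc.close(false) if doc
  word.quit if word
end
```

Calling `export_pages_as_pdf('c:/work/test.doc')` would then be expected to write one PDF per page next to the source file.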

Regards,
Park H.


#6

You may be trying to solve this problem (converting a Word document to
PDF) with the wrong tool :). You don't need Ruby to convert a Word file to
PDF. There are tools like Word2pdf for this.



#7

Agreed, but I have to create 3 separate doc files out of one document
(one for each page of the document) and send these files as input to the PDF
converter.


#8

Surely OpenOffice must have something: you can export Word documents
as PDFs, so there may be a corresponding command-line utility…
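
As one concrete possibility, LibreOffice (the successor to OpenOffice) can convert from the command line with `soffice --headless --convert-to pdf`. The sketch below only wraps that call from Ruby; it assumes `soffice` is on the PATH, the helper names are made up, and it converts the whole document rather than a single page.

```ruby
# Builds the argument list for a headless LibreOffice conversion to PDF.
# Assumes the "soffice" executable is on the PATH.
def soffice_pdf_command(input_path, output_dir)
  ['soffice', '--headless', '--convert-to', 'pdf', '--outdir', output_dir, input_path]
end

# Runs the conversion; returns true on success, false or nil otherwise.
def convert_to_pdf(input_path, output_dir)
  system(*soffice_pdf_command(input_path, output_dir))
end
```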

saji



Posted via http://www.ruby-forum.com/.


Saji N. Hameed

APEC Climate Center +82 51 668 7470
National Pension Corporation Busan Building 12F
Yeonsan 2-dong, Yeonje-gu, BUSAN 611705 removed_email_address@domain.invalid
KOREA


#9

If you have a Word2pdf-like program, check whether you can specify which
page to convert. You could then call it several times, specifying
a different page number each time:

Word2pdf -n 1 infile.doc out1.pdf
Word2pdf -n 2 infile.doc out2.pdf
Word2pdf -n 3 infile.doc out3.pdf

The only catch is finding a Word2pdf program that supports
that :).
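
If such a converter turns up, the repeated calls could be generated from Ruby. This is only a sketch: the `-n PAGE INPUT OUTPUT` argument layout is the hypothetical one from the lines above, not the interface of any known tool, and `per_page_commands` is a made-up helper name.

```ruby
# Builds one command per page for a hypothetical converter that accepts
# "-n PAGE INPUT OUTPUT". The flag is an assumption, not a real tool's option.
def per_page_commands(tool, input_path, page_count)
  (1..page_count).map do |page|
    [tool, '-n', page.to_s, input_path, "out#{page}.pdf"]
  end
end

commands = per_page_commands('Word2pdf', 'infile.doc', 3)
commands.each do |cmd|
  puts cmd.join(' ')
  # system(*cmd)  # enable once a converter with such an option is available
end
```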



#10

Thanks Park, that was good. Now say my Word document has
some student details such as name, marks and register number. These
are the entries I'll have. Please tell me how to parse these strings and
upload them to the database.

Thanks
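
A minimal sketch of the parsing step, under a loud assumption: the thread never shows the document's layout, so this guesses one "Label: value" entry per paragraph with the labels Name, Marks and Register No. Word separates paragraphs in its text with "\r", which is why the sample uses it. The `parse_student` name is made up, and the database upload is left out because the thread does not say which database is involved.

```ruby
# Matches "Label: value" lines. The labels are assumptions about the layout.
FIELD_PATTERN = /\A\s*(Name|Marks|Register No)\s*:\s*(.+?)\s*\z/i

# Parses the text of one student entry into a hash keyed by field name.
def parse_student(text)
  record = {}
  text.split(/[\r\n]+/).each do |line|
    match = FIELD_PATTERN.match(line)
    next unless match
    key = match[1].downcase.tr(' ', '_').to_sym
    record[key] = match[2]
  end
  record
end

# Hypothetical sample standing in for the text read from Word
sample = "Name: A. Student\rMarks: 87\rRegister No: 12345"
student = parse_student(sample)
puts student.inspect
```

The resulting hash could then be passed to a parameterized INSERT in whatever database library is in use.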