How do I get an integer from an array?

bodikp · June 24, 2009, 6:48pm

Hi,
I need to process individual pages of PDFs. To do so, I need to get the
page count of the PDF, then do some image magic with each page of that
PDF. So, the first thing I do is use a utility that gives me that page
count. I get the page count, but it’s an array. And Ruby doesn’t let me
treat that “array” as a number, so I can’t do what I want. Here’s a
snippet of my script and what I get with it. Thanks.

Dir.chdir("N:/infoconpdf")
file = "ehs-X7917735.pdf"
pages = `pdfinfo #{file}`
pages = pages.scan(/^Pages:[ ]{2,99}([0-9]+)/)
puts pages
1.upto(pages) do |n|
  puts n
end

I get this:
78
================ ArgumentError =====================
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:12:in `>'
1.upto(pages) do |n|
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:12:in `upto'
1.upto(pages) do |n|
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:12:in `<main>'
1.upto(pages) do |n|
Exception: comparison of Fixnum with Array failed
Program exited with code 0

bodikp · June 24, 2009, 6:57pm

1.upto(pages.length) do |n|
puts n
end

?

bodikp · June 24, 2009, 7:03pm

On Wed, Jun 24, 2009 at 12:57 PM, Roger P. wrote:

1.upto(pages.length) do |n|
puts n
end

?


I think his 'pages' var is a single-element array, not an array of
length 78, so this may give the desired result:

1.upto(pages[0].to_i) do |n|
puts n
end

Alex

bodikp · June 24, 2009, 7:08pm

Hi –

On Thu, 25 Jun 2009, Peter B. wrote:

pages = `pdfinfo #{file}`
pages = pages.scan(/^Pages:[ ]{2,99}([0-9]+)/)
puts pages
1.upto(pages) do |n|
puts n
end

I get this:
78
================ ArgumentError =====================

When you do a scan where the regex has parentheses, you get an array
for each scan through the string, with the captures as elements of
that array. So you end up with an array of arrays:

string = "Hello. I am a string."
=> "Hello. I am a string."

string.scan(/( \w )/)
=> [[" I "], [" a "]]

So you got back [["78"]], I believe. You have to dig the number out.

Another way to do it is:

pages[/Pages:\D+(\d+)/,1]

(plus .to_i to convert it to an integer).

David
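
To make the array-of-arrays point concrete, here is a small sketch. The `info` string is a made-up stand-in for pdfinfo output (an assumption, not real output from the poster's files), so it runs without the utility installed.

```ruby
# Hypothetical stand-in for what pdfinfo prints; only the Pages line matters here.
info = "Tagged:         no\nPages:          78\nEncrypted:      no\n"

# With a capture group, scan returns one inner array per match.
matches = info.scan(/^Pages:[ ]{2,99}([0-9]+)/)
p matches               # [["78"]]

# Dig out the first capture of the first match and convert it.
count = matches[0][0].to_i
p count                 # 78

# Without a capture group, scan returns the matched text as plain strings.
digits = info.scan(/[0-9]+/)
p digits                # ["78"]
```

With `count` as an Integer, `1.upto(count)` works as the original script intended.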

bodikp · June 24, 2009, 7:58pm

Roger P. wrote:

1.upto(pages.length) do |n|
puts n
end

?

Yes, I want to count up to the value of pages and do fancy image stuff
with the PDF pages in that sequence. That’s all.

bodikp · June 24, 2009, 8:10pm

Peter B. wrote:

Yes, if I do pages.to_s, I get [["78"]].

Looks like you want pages[0][0].to_i or what not then.
=r

bodikp · June 24, 2009, 8:26pm

On Wed, Jun 24, 2009 at 2:18 PM, Peter B. wrote:

1
2
3
4
5
6
7…
78


The double notation works just like chaining any other methods together,
i.e.

foo = bar.method.method

The index method of the Array class is a method just like any other, so
you could just as well write it like:

foo.[](0).[](0)

which calls the `[]' method on the result of the previous call of the `[]'
method, which is an array.

foo[0] is just a shortcut Ruby gives you for calling foo.[](0)

Alex
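
A quick sketch of the point about `[]` being an ordinary method. The value of `foo` is a stand-in for the nested array that scan returned earlier in the thread; the variable names are only for illustration.

```ruby
foo = [["78"]]

a = foo[0][0]          # usual index syntax
b = foo.[](0).[](0)    # the same two calls written as explicit method calls
c = foo.first.first    # equivalent here, since both arrays have an element at index 0

p a, b, c              # each line prints "78"
p a.to_i + 1           # 79
```

All three forms return the same string, so which one to use is a matter of taste.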

bodikp · June 24, 2009, 8:18pm

Roger P. wrote:

Peter B. wrote:

Yes, if I do pages.to_s, I get [["78"]].

Looks like you want pages[0][0].to_i or what not then.
=r

Yup. That did it. Thank you very much, Roger. Now, I have to admit, I’ve
never, ever seen that double array counter notation. [0][0]. That’s
totally weird to me. But, it works!

Dir.chdir("N:/infoconpdf")
file = "ehs-X7917735.pdf"
pages = `pdfinfo #{file}`
pages = pages.scan(/^Pages:[ ]{2,99}([0-9]+)/)
pages = pages[0][0].to_i
1.upto(pages) do |n|
  puts n
end

I got:
1
2
3
4
5
6
7…
78

bodikp · June 25, 2009, 1:19pm

Alex wrote:

The index method of the Array class is a method just like any other, so
you could just as well write it like:

foo.[](0).[](0)

which calls the `[]' method on the result of the previous call of the `[]'
method, which is an array.

foo[0] is just a shortcut Ruby gives you for calling foo.[](0)

Or in this particular case, you can do

foo.first.first

bodikp · June 24, 2009, 7:59pm

Yes, if I do pages.to_s, I get [["78"]].

bodikp · June 25, 2009, 10:03pm

Hi –

On Fri, 26 Jun 2009, Peter B. wrote:

method, which is an array.
it does a bunch, then dies.
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:8:in `block in <main>'
pages = pages[0][0].to_i
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:5:in `each'
Dir.glob("*.pdf").each do |pdffile|
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:5:in `<main>'
Dir.glob("*.pdf").each do |pdffile|
=============================================
Exception: undefined method `[]' for nil:NilClass

That means that somewhere along the line, the scan operation isn’t
finding what you expect it to. Is it possible that you have a document
with more than 99 occurrences of [] in a row?

I’d still recommend trying the technique I suggested in my earlier
answer. Getting a nested array of one element and unnesting it seems
like the long way around.

David
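
The error can be reproduced without pdfinfo: when the regex matches nothing, scan returns an empty array, so the first `[0]` is nil and the second `[0]` raises. Below is a minimal sketch of one way to guard against that. `page_count_from` is a hypothetical helper name, and the sample strings are made up rather than taken from the failing file.

```ruby
# Hypothetical helper: returns the page count as an Integer,
# or nil when the text has no matching "Pages:" line.
def page_count_from(info)
  first_match = info.scan(/^Pages:[ ]{2,99}([0-9]+)/).first  # nil if scan found nothing
  first_match && first_match[0].to_i
end

good = "Pages:          3\n"
bad  = ""   # e.g. the utility printed nothing usable for this file

empty = bad.scan(/^Pages:[ ]{2,99}([0-9]+)/)
p empty                    # [] -- so empty[0] is nil, and nil[0] would raise
p page_count_from(good)    # 3
p page_count_from(bad)     # nil
```

Inside the Dir.glob loop, printing the file name whenever the helper returns nil would show which PDF is producing output the regex does not match.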

bodikp · June 26, 2009, 2:48pm

David A. Black wrote:

Hi –

On Fri, 26 Jun 2009, Peter B. wrote:

method, which is an array.
it does a bunch, then dies.
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:8:in `block in <main>'
pages = pages[0][0].to_i
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:5:in `each'
Dir.glob("*.pdf").each do |pdffile|
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:5:in `<main>'
Dir.glob("*.pdf").each do |pdffile|
=============================================
Exception: undefined method `[]' for nil:NilClass

That means that somewhere along the line, the scan operation isn’t
finding what you expect it to. Is it possible that you have a document
with more than 99 occurrences of [] in a row?

I’d still recommend trying the technique I suggested in my earlier
answer. Getting a nested array of one element and unnesting it seems
like the long way around.

David

But it does hundreds of files just fine. Then it dies. So, you’re
saying that in one file in particular it can’t find what’s in the scan?
I’m sorry, but I don’t understand the technique you described earlier,
David. You say to do this:
pages[/Pages:\D+(\d+),1/]
pages = pages.to_i
I get “0” as output with this.

The output of pdfinfo is simple. Here’s an example:
Author: pb4072
Creator: Microsoft® Office Word 2007
Producer: Microsoft® Office Word 2007
CreationDate: 09/27/07 13:36:28
ModDate: 02/19/09 14:13:47
Tagged: no
Pages: 1
Encrypted: no
Page size: 612 x 792 pts (letter)
File size: 55418 bytes
Optimized: yes
PDF version: 1.6
As you can see, there are no [] characters in here.
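
One possible explanation for the "0", sketched under assumptions: in the two-line snippet as typed, the `,1` sits inside the regex literal, and the result of the subscript is never assigned, so `to_i` runs on the full pdfinfo text, which starts with a non-digit. The `info` string below is a shortened copy of the sample output, and its exact spacing is assumed.

```ruby
# Shortened copy of the sample output; the exact spacing is an assumption.
info = "Author:         pb4072\nTagged:         no\nPages:          1\nEncrypted:      no\n"

# ",1" inside the slashes is part of the pattern, so nothing matches.
inside = info[/Pages:\D+(\d+),1/]
p inside                      # nil

# The subscript returns a new string and leaves info unchanged,
# so to_i on info still sees text that begins with "Author:".
ignored = info[/Pages:\D+(\d+)/, 1]
p ignored                     # "1"
p info.to_i                   # 0

# Capture index outside the regex, result assigned, then converted.
pages = info[/Pages:\D+(\d+)/, 1].to_i
p pages                       # 1
```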

bodikp · June 25, 2009, 8:37pm

Brian C. wrote:

Alex wrote:

The index method of the Array class is a method just like any other, so
you could just as well write it like:

foo.[](0).[](0)

which calls the `[]' method on the result of the previous call of the `[]'
method, which is an array.

foo[0] is just a shortcut Ruby gives you for calling foo.[](0)

Or in this particular case, you can do

foo.first.first

Well, interestingly, I’ve succeeded in some of my scripts. But, in this
one, it fails. It displays a few hundred filenames with page counts,
but, in this directory, there are literally thousands of PDF files. So,
it does a bunch, then dies.

Dir.glob("*.pdf").each do |pdffile|
  pages = `pdfinfo #{pdffile}`
  pages = pages.scan(/^Pages:[ ]{2,99}([0-9]+)/)
  pages = pages[0][0].to_i
  puts "#{pdffile} #{pages}"
end

I get:
================ NoMethodError =====================
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:8:in `block in <main>'
pages = pages[0][0].to_i
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:5:in `each'
Dir.glob("*.pdf").each do |pdffile|
C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:5:in `<main>'
Dir.glob("*.pdf").each do |pdffile|

=============================================
Exception: undefined method `[]' for nil:NilClass

bodikp · June 26, 2009, 3:45pm

Hi –

On Fri, 26 Jun 2009, Peter B. wrote:

C:\Users\pb4072\Documents\scripts\RUBY\multitiffs.rb:5:in `each’

David. You say to do this:
Tagged: no
Pages: 1
Encrypted: no
Page size: 612 x 792 pts (letter)
File size: 55418 bytes
Optimized: yes
PDF version: 1.6
As you can see, there are no [] characters in here.

Sorry, I was spacing out and remembering (wrongly) that you were
looking for [] characters. I think my brain was leading me astray by
images of dvips output and such.

Anyway… here’s the pages[] technique in action:

pages = "Pages: 1234"
=> "Pages: 1234"

pages[/\D+(\d+)/,1]
=> "1234"

When you subscript a string with a regex like that, it matches it
against the string, and if you provide a number, it returns only the
corresponding parenthetical match. Another example:

"David A. Black"[/\S+ (\S+) (\S+)/,2] # "Black"

David
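
Here is a sketch of how the subscript technique might be applied to the multi-file loop, so that a file with no page count is reported instead of raising. The `outputs` hash is a hypothetical stand-in for running pdfinfo on each file; the file names and text are made up.

```ruby
# Hypothetical stand-ins for pdfinfo output per file; names and text are made up.
outputs = {
  "a.pdf" => "Tagged:         no\nPages:          12\n",
  "b.pdf" => ""   # no Pages line at all
}

counts = {}
outputs.each do |pdffile, info|
  pages = info[/^Pages:\s+(\d+)/, 1]   # a String, or nil when nothing matched
  if pages
    counts[pdffile] = pages.to_i
    puts "#{pdffile} #{counts[pdffile]}"
  else
    puts "#{pdffile}: no page count found"
  end
end
```

In the real script, `info` would come from the backtick call to pdfinfo rather than from a hash.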
