Problem with ".scan"

bodikp · September 27, 2006, 4:52pm

Ruby’s complaining about the following 3 lines of code. I’ve got it in a
new program, but, I copied it directly from an older, working program.
Can someone help me understand what’s the problem with the “scan” line,
or, apparently, the “each” line?

Thanks,
Peter

10 Dir.glob("*.ps").each do |psfile|
11 file_contents = File.read(psfile)
12 file_contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do

Error message:

E:/PageCounts/test1.rb:12:in `scan': string modified (RuntimeError)
from E:/PageCounts/test1.rb:12
from E:/PageCounts/test1.rb:10:in `each'
from E:/PageCounts/test1.rb:10

bodikp · September 27, 2006, 5:01pm

On Wed, 27 Sep 2006, Peter B. wrote:

11 file_contents = File.read(psfile)
12 file_contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do

the modification is probably here. can’t you show us everything up through
the matching end?

-a

bodikp · September 27, 2006, 5:27pm

Peter B. wrote:

11 file_contents = File.read(psfile)
12 file_contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do

Whoa. The error message suggests that the source string is being modified
while being read, but your listing stops before the part where that might
happen.

Error message:

E:/PageCounts/test1.rb:12:in `scan': string modified (RuntimeError)

It’s always a good idea to post a short working example in a case like this,
not just the part where you think the problem is. That would have provided
us with the true problem area.

bodikp · September 27, 2006, 5:37pm

On Thu, 28 Sep 2006, Peter B. wrote:

Sorry. It’s a bit much. That’s why I was holding back. Here’s the whole
script.

Dir.glob("*.ps").each do |psfile|
  file_contents = File.read(psfile)
  file_contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do
    totalpages = $1
    if (totalpages.to_i % 2) != 0 then
      newtotalpages = totalpages.to_i + 1
      file_contents << "%%Blank page for Asura.\n%%Page: #{newtotalpages.to_i}\nshowpage\n"
      #             ^^
      #             ^^ the modification in question
      File.open(psfile, "w") { |f| f.print file_contents }
      FileUtils.touch(psfile)
    end

so, ruby is correct, you are modifying a string while in an in-progress scan
block. easy-cheasy.
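
One way to restructure this, as a rough sketch: read the page count with a single match first, and only build the padded contents once no scan block is running. The helper name `pad_to_even` is hypothetical, and the sketch works on plain strings rather than on real PostScript files.

```ruby
# Hypothetical helper: returns the contents with a blank page appended
# when the %%Pages count is odd, otherwise returns them unchanged.
def pad_to_even(contents)
  # Take the first %%Pages comment. No block is involved, so nothing is
  # being scanned when the new string is built below.
  match = contents.match(/%%Pages: (\d{1,5})[ ]+\n/)
  return contents if match.nil?

  total_pages = match[1].to_i
  return contents if total_pages.even?

  contents + "%%Blank page for Asura.\n%%Page: #{total_pages + 1}\nshowpage\n"
end

odd  = "%!PS\n%%Pages: 3 \n"
even = "%!PS\n%%Pages: 4 \n"
padded    = pad_to_even(odd)
unchanged = pad_to_even(even)
```

Writing the result back to disk would then happen after the helper returns, outside of any scan.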

kind regards.

-a

bodikp · September 27, 2006, 8:01pm

On Thu, 28 Sep 2006, Peter B. wrote:

Thanks. I ended the scan block before doing any file writing. That
seemed to do the trick. It still confuses me, though, because, this code
was borrowed from an existing script that I’ve been using for 6 months,
and, that part of it is just as you see it above.

probably because in the old script totalpages has always been even, so the
append never ran. in your new script the number of pages is odd, i’m
guessing, and so the bug is triggered. if i were you i’d update the other
script - it’s a bug in waiting.
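
A small experiment can show how a bug like this stays hidden. In the script as posted, the append sits inside `if (totalpages.to_i % 2) != 0`, so it only runs for odd page counts; an even count never touches the string during the scan. The method name `append_during_scan` is made up for this sketch.

```ruby
# Hypothetical reproduction: appends inside the scan block only when the
# page count is odd, mirroring the structure of the posted script.
def append_during_scan(contents)
  contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do
    total_pages = $1.to_i
    contents << "showpage\n" if total_pages.odd?
  end
  :ok
rescue RuntimeError => e
  # Ruby checks the string after each block call and raises if it changed.
  e.message
end

even_result = append_during_scan("%%Pages: 4 \n".dup)
odd_result  = append_during_scan("%%Pages: 3 \n".dup)
```

With the even count the method should return `:ok`; with the odd count it should return the error message instead.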

regards.

-a

bodikp · September 27, 2006, 7:45pm

Thanks. I ended the scan block before doing any file writing. That
seemed to do the trick. It still confuses me, though, because, this code
was borrowed from an existing script that I’ve been using for 6 months,
and, that part of it is just as you see it above.

bodikp · September 27, 2006, 5:21pm

unknown wrote:

On Wed, 27 Sep 2006, Peter B. wrote:

11 file_contents = File.read(psfile)
12 file_contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do

the modification is probably here. can’t you show us everything up through
the matching end?

-a

Sorry. It’s a bit much. That’s why I was holding back. Here’s the whole
script.

require 'kirbybase'
require 'fileutils'
Dir.chdir("E:/pagecounts")
# First, create the database table.
db = KirbyBase.new

# If table exists, delete it.
db.drop_table(:pageinfo) if db.table_exists?(:pageinfo)
pageinfo_tbl = db.create_table(:pageinfo,
  :filename, {:DataType=>:String, :Index=>1},
  :lconstant, :String,
  :compcode, :String,
  :primecode, :Integer,
  :costcenter, :String,
  :acctgroup, :Integer,
  :blank, :String,
  :description, :String,
  :pagecount, :Float,
  :sjccode, :String,
  :fullname, {:DataType=>:String, :Index=>2}
)

# Import the csv file.
pageinfo_tbl.import_csv('McArdle_indexes.csv')

=begin
Parse each postscript print file in the polled directory. Create variables for:
the number of pages in each file; the number of blank pages in each file; and,
what exact pages are blank.
=end
Dir.glob("*.ps").each do |psfile|
  file_contents = File.read(psfile)
  file_contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do
    totalpages = $1
    if (totalpages.to_i % 2) != 0 then
      newtotalpages = totalpages.to_i + 1
      file_contents << "%%Blank page for Asura.\n%%Page: #{newtotalpages.to_i}\nshowpage\n"
      File.open(psfile, "w") { |f| f.print file_contents }
      FileUtils.touch(psfile)
    end

=begin
Find blank pages in the postscript file. Look for the regular expression that
sees a page callout followed by postscript data that does not include
data in parentheses. Any type on a postscript page is enclosed in parentheses,
so, that’s why this is a legitimate search. Blank pages have no parenthesized
data.
=end
    blanks = []
    file_contents.scan(/%%Page: [()0-9{1,5}] ([0-9]{1,5})\n[^(.*)]%%Page/) do |match|
      blanks.push($1)
    end
    file_contents.scan(/%%Blank page for Asura.\n/) do |match|
      blanks.push(totalpages.to_i + 1)
    end

=begin
Open a “pageinfo” file. Put page information about the file into it.
Notice that the variable for the total number of pages differs depending
on whether a “newtotalpages” variable exists. And, that variable only
exists if the original page count was odd and a blank had to be added.
=end
    filename = File.basename("#{psfile}", '.ps')
    pageinfofile = File.basename("#{psfile}", '.ps') + ".pageinfo"
    File.open("E:/pagecounts/#{pageinfofile}", "a") do |fileinfo|
      if newtotalpages then
        fileinfo << "#{filename}\n" <<
          "Total number of pages in this PDF: #{newtotalpages}\n" <<
          "Number of blank pages in this PDF: #{blanks.size}\n" <<
          "Specific pages that are blank in this PDF: " <<
          "#{blanks.join(', ')}\n"
      else
        fileinfo << "#{filename}\n" <<
          "Total number of pages in this PDF: #{totalpages}\n" <<
          "Number of blank pages in this PDF: #{blanks.size}\n" <<
          "Specific pages that are blank in this PDF: " <<
          "#{blanks.join(', ')}\n"
      end
    end
  end
end

=begin
Back to the database table. . . .
Query against the table and match the filename in the directory with whichever
entry in the “filename” column of the table matches. Then, if there’s a match,
populate the “pagecount” field in that row of the table with the variable for
the page count, as found above. That variable name is “newtotalpages.”
=end

Dir.glob("*.ps").each do |dirfile|
  result = pageinfo_tbl.select(:filename) { |r| dirfile =~ Regexp.new(r.filename) }
pageinfo_tbl.update { |r| r.name ==
{filename}.set(:pagecount=>#{newtotalpages}) } unless result.nil?
end

bodikp · September 27, 2006, 8:27pm

On Thu, 28 Sep 2006, Peter B. wrote:

Well, I know that they’re not always odd or even. They’ve been a mix of
both. But, I understand what you’re saying. I will change my original
script. Basically, and, please tell me if I understand this correctly:
if I’m going to do a scan of a file, open the file, scan it, and then
close it. Right?

yup. just remember to avoid this

string = 'foobar'

string.scan(%r/foo/) do |word|
  string << 'foo' # can't modify while scanning
end
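
If the block really does need to change the string, one option (a minimal sketch, not the only way) is to scan a copy, so the string being iterated never changes:

```ruby
string = 'foobar'.dup

# Iterate over a snapshot made with dup; appending to the original
# is then safe because the scanned copy stays untouched.
string.dup.scan(/foo/) do |word|
  string << word
end
```

The copy costs extra memory, so for large files it may be simpler to collect what is needed during the scan and modify the string afterwards.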

regards.

-a

bodikp · September 27, 2006, 9:38pm

Thanks a lot, -a! I’ve cleaned up my code. But, if you notice way above,
I’ve got a File.read in the line before the file scan. If I do an “end”
for the file scan, my “read” is still open, right? Meaning, I can still
do stuff to the open file.

bodikp · September 27, 2006, 10:04pm

Peter B. wrote:

Thanks a lot, -a! I’ve cleaned up my code. But, if you notice way
above, I’ve got a File.read in the line before the file scan. If I do
an “end” for the file scan, my “read” is still open, right? Meaning,
I can still do stuff to the open file.

If you’re referring to your original code, then no. You use File.read(name)
which returns the whole file in a single string. No open connection is
returned.
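
A short sketch of that difference, using a temporary file as a stand-in for a .ps file (the file contents here are only illustrative):

```ruby
require 'tempfile'

# Temporary file standing in for a .ps file.
tmp = Tempfile.new(['sample', '.ps'])
tmp.write("%%Pages: 3 \n")
tmp.close

# File.read opens the file, reads everything, and closes it again.
contents = File.read(tmp.path)
contents_class = contents.class # a String, not an IO object

# Changing the string does not touch the file on disk...
contents << "showpage\n"
on_disk_before = File.read(tmp.path)

# ...until the string is written back explicitly. The block form of
# File.open closes the handle when the block ends.
File.open(tmp.path, 'w') { |f| f.print contents }
on_disk_after = File.read(tmp.path)
tmp.unlink
```

So after `File.read` there is nothing left to close, and any later write needs its own `File.open`.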

Btw, for efficiency reasons if your files are large you might consider using

File.foreach(file_name) do |line|
  …
end

Or use File.readlines instead of File.read - that way you get an array with
lines and not the whole file in one piece.
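
As an illustration of the line-by-line approach, here is a sketch that looks for the %%Pages comment with `File.foreach` and stops at the first hit. The temporary file and its contents are assumptions made for the example.

```ruby
require 'tempfile'

# Temporary file standing in for a .ps file.
tmp = Tempfile.new(['sample', '.ps'])
tmp.write("%!PS-Adobe-3.0\n%%Pages: 12 \n%%EndComments\n")
tmp.close

# Read one line at a time and stop at the first %%Pages comment.
total_pages = nil
File.foreach(tmp.path) do |line|
  if line =~ /%%Pages: (\d{1,5})[ ]+\n/
    total_pages = $1.to_i
    break
  end
end
tmp.unlink
```

`File.foreach` hands over one line at a time, whereas `File.readlines` builds the full array of lines before returning.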

Kind regards

robert

bodikp · September 27, 2006, 10:34pm

Thanks, Robert. I’ll look into that line-by-line technique. The reason I
probably haven’t used it is that I often need to search for or
accommodate data that spans over multiple lines.

bodikp · September 28, 2006, 8:51am

Peter B. wrote:

Thanks, Robert. I’ll look into that line-by-line technique. The
reason I probably haven’t used it is that I often need to search for
or accommodate data that spans over multiple lines.

Yeah, in that case File.read is clearly superior (if the file fits into
memory that is). For me line by line is the default because it scales better
and I switch only to slurp in at once if I need line spanning. But then
again my typical problem might be different from yours so your different
default might actually be the better solution for you.
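
To make the line-spanning point concrete, here is a small sketch with a made-up fragment in which the page number sits on the line after its marker:

```ruby
# Illustrative fragment only: the number follows the marker on the next line.
contents = "%%Page:\n7\nshowpage\n"

# On the whole string, the pattern can cross the line break.
whole_match = contents[/%%Page:\n(\d+)/, 1]

# Line by line, no single line contains both parts of the pattern.
line_matches = contents.each_line.select { |line| line =~ /%%Page:\n(\d+)/ }
```

The slurped string yields the number, while the per-line search finds nothing, which is the trade-off described above.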

Kind regards

robert

bodikp · September 27, 2006, 8:18pm

Well, I know that they’re not always odd or even. They’ve been a mix of
both. But, I understand what you’re saying. I will change my original
script. Basically, and, please tell me if I understand this correctly:
if I’m going to do a scan of a file, open the file, scan it, and then
close it. Right?