Efficiency of string parsing


#1

I have written a loop that parses a string and, at every 50th
character, checks whether it is a space. If it is not, the loop works
back until it finds one and then inserts a newline. I am turning masses
of text (copy) from a DB into images and wanted to automate it. Are
there better ways of achieving what I am trying to do?

    characterCount = 0
    positionCount = 0
    insertPoint = MAX_LINE_LENGTH

    while characterCount != copy.length
      characterCount += 1
      positionCount += 1
      if positionCount == MAX_LINE_LENGTH
        begin
          characterCount -= 1
          insertPoint -= 1
        end until copy[characterCount].eql?(ASCII_SPACE)
        # Double quotes are needed here: '\n' in single quotes is a literal backslash and n
        copy.insert(characterCount+=1, "\n")
        imageHeight += LINE_HEIGHT
        positionCount = 0
      end

    end

Cheers,
Kev


#2

On 12.03.2007 16:23, Kev wrote:

        positionCount = 0
      end

    end

There are quite a lot of posts about word wrapping, which seems to be
what you are trying to do. You should be able to find them via the
archives (Google G., ruby-talk archive).

A simplistic approach would probably do something like this:

    str.gsub(/(.{1,50})\s+/, "\\1\n")
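
A quick way to try the `gsub` approach is sketched below. It is only an illustration: the sample text and the `LINE_HEIGHT` value are made up, and the pattern also accepts the end of the string (`\z`) as a line end, so the last line is not broken early at its final space.

```ruby
MAX_LINE_LENGTH = 50   # width used in the original question
LINE_HEIGHT = 12       # assumed pixel height of one rendered line

# Made-up sample copy for illustration.
copy = "The quick brown fox jumps over the lazy dog and keeps running through the forest."

# "\\1" is a backreference to the captured line. A bare "\1" inside double
# quotes would be read as an octal character escape instead.
wrapped = copy.gsub(/(.{1,#{MAX_LINE_LENGTH}})(?:\s+|\z)/, "\\1\n")

# One image row per wrapped line.
image_height = wrapped.lines.count * LINE_HEIGHT

puts wrapped
puts "image height: #{image_height}"
```

Counting the lines after wrapping replaces the `imageHeight += LINE_HEIGHT` bookkeeping inside the loop.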

Kind regards

robert


#3

On 3/12/07, Robert K. removed_email_address@domain.invalid wrote:

There are quite a lot of posts about word wrapping, which seems to be
what you are trying to do. You should be able to find them via the
archives (Google G., ruby-talk archive).

A simplistic approach would probably do something like this:

    str.gsub(/(.{1,50})\s+/, "\\1\n")

And here’s the start of a more sophisticated approach I just whipped up.

It uses split on a word boundary to split the string. It has some
option keywords which allow preserving all whitespace, or only at the
beginning of a line. If you don’t preserve all whitespace, it
collapses whitespace within a line to a single space. If you don’t
preserve whitespace at the beginning of a line, it eliminates it,
otherwise it keeps it as is. The default is to only preserve
whitespace at the beginning of a line.

It does have a few bugs, which I didn’t bother addressing and leave as
an exercise to the reader.

  1. It ignores existing new lines in the input string, which means that
    the next line will be short.

  2. It keeps whitespace at the end of a line, as opposed to putting the
    newline after the last ‘word’.

    class String
      # Wraps the string at word boundaries so each line stays under linelength.
      # Options: :keep_all preserves all whitespace; :keep_initial (default
      # true) preserves whitespace only at the beginning of a line.
      def wordwrap(linelength, kw_args = {})
        keep_all = kw_args[:keep_all]
        keep_initial = keep_all || kw_args[:keep_initial]
        keep_initial = true if keep_initial.nil?
        current_len = 0
        split(/\b/).inject("") do |result, chunk|
          if current_len + chunk.length >= linelength
            result << "\n"
            current_len = 0
            chunk = "" if chunk.strip.empty? unless keep_initial
          else
            chunk = " " if chunk.strip.empty? unless keep_all
          end
          current_len += chunk.length
          result << chunk
        end
      end
    end
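
The two limitations listed above could be handled roughly as follows. This is a separate sketch, not a patch to `wordwrap`: the helper names `wrap_paragraphs` and `wrap_line` are hypothetical, and unlike the code above it collapses every run of whitespace to one space and drops leading whitespace.

```ruby
# Hypothetical helper: wraps each existing line on its own, so newlines
# already present in the input are kept and never produce a short line.
def wrap_paragraphs(text, line_length)
  text.split("\n", -1).map { |line| wrap_line(line, line_length) }.join("\n")
end

# Hypothetical helper: greedy word fill for a single line. Rows are built
# as arrays of words, so no row can end in trailing whitespace.
def wrap_line(line, line_length)
  rows = [[]]
  row_len = 0
  line.split.each do |word|
    needed = rows.last.empty? ? word.length : row_len + 1 + word.length
    if needed > line_length && !rows.last.empty?
      rows << []            # start a new row with this word
      needed = word.length
    end
    rows.last << word
    row_len = needed
  end
  rows.map { |words| words.join(" ") }.join("\n")
end

puts wrap_paragraphs("aaa bbb ccc\nddd", 7)
```

A word longer than `line_length` is left on a row of its own rather than being split.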

Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/


#4

Excellent solution for coding efficiency. (Though I always think
regular expressions should be commented well (broken into parts) due
to the terseness of the syntax, especially for those who don’t use
RegEx regularly. No pun, really.)

But would a Ruby iterator be faster?

Clearly this is a tool to wrap text to 50 characters per line without
breaking words. Curious to see more ideas/approaches on that.
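
As a sketch of what a commented, broken-into-parts regular expression could look like in Ruby, the `x` (extended) flag lets the pattern span several lines with inline comments. The names `WRAP_PATTERN` and `wrap_text` are made up for this example, and the pattern is a variation on the `gsub` one-liner earlier in the thread that also accepts the end of the string as a line end.

```ruby
# Hypothetical constant: the wrap pattern, one commented part per line.
WRAP_PATTERN = /
  (.{1,50})     # capture 1 to 50 characters, as many as possible
  (?:\s+|\z)    # followed by a run of whitespace or the end of the string
/x

# Hypothetical helper: replaces each match with the captured line plus a newline.
def wrap_text(str)
  str.gsub(WRAP_PATTERN, "\\1\n")
end

puts wrap_text("Clearly this is a tool to wrap text to 50 characters per line without breaking words.")
```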


#5

On Mar 12, 2007, at 12:35 PM, John J. wrote:

Excellent solution for coding efficiency. (Though I always think
regular expressions should be commented well (broken into parts)
due to the terseness of the syntax, especially for those who don’t
use RegEx regularly. No pun, really.)

But would a Ruby iterator be faster?

I’m just curious what it is about Ruby iterators (I assume you mean
methods like ‘each’) that you’d expect them to be more efficient than
the gsub?

Tom


#6

On Mar 13, 2007, at 2:02 AM, Tom P. wrote:

But would a Ruby iterator be faster?

I’m just curious what it is about Ruby iterators (I assume you mean
methods like ‘each’) that you’d expect them to be more efficient
than the gsub?

Iterators, callbacks using Ruby code blocks, whatever.
I never said I expect them to be faster.
I was asking.
I don’t know how much text is being parsed. I do assume it is
unstructured and not indexed in any manner.
I’m just wondering if there isn’t more to know about why and what for
in order to reach the best solution for the situation.
Like they say in Perl… there’s more than 1 way right? Some ways are
just interesting, some are fast, some are useful, etc…
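
One way to settle the speed question is to measure it with the standard `benchmark` library. The sketch below is only an assumption about how such a comparison might be set up: the sample text, the method names and the iteration count are all made up, and the timings will depend on the Ruby version and the real input.

```ruby
require "benchmark"

WIDTH = 50
WRAP_RE = /(.{1,50})(?:\s+|\z)/

# Made-up sample input: 12,000 short words separated by single spaces.
SAMPLE = (%w[lorem ipsum dolor sit amet consectetur] * 2_000).join(" ")

# Regex-based wrap, a variation on the gsub one-liner from the thread.
def wrap_with_gsub(str)
  str.gsub(WRAP_RE, "\\1\n")
end

# Iterator-based wrap: walk the words with each and fill lines greedily.
def wrap_with_each(str)
  lines = []
  current = []
  length = 0
  str.split.each do |word|
    if !current.empty? && length + 1 + word.length > WIDTH
      lines << current.join(" ")
      current = []
      length = 0
    end
    length += word.length + (current.empty? ? 0 : 1)
    current << word
  end
  lines << current.join(" ") unless current.empty?
  lines.join("\n") + "\n"
end

Benchmark.bm(6) do |bm|
  bm.report("gsub") { 20.times { wrap_with_gsub(SAMPLE) } }
  bm.report("each") { 20.times { wrap_with_each(SAMPLE) } }
end
```

Both methods produce the same output for this sample, so the comparison is like for like. They can differ on input with repeated or leading whitespace.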


#7

Had to take a swipe 9^)

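    class String
      # Returns a copy in which the last space of each wrap_col-wide window
      # is replaced by a newline. Stops early if a window contains no space.
      def wrap(wrap_col)
        retStr = self.dup
        start = 0
        while retStr[start, wrap_col].length >= wrap_col
          ws_pos = retStr[start, wrap_col].rindex(" ")
          break if ws_pos.nil?
          retStr[ws_pos + start] = "\n"
          start += ws_pos + 1
        end
        retStr
      end
    end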

Cheers
Chris


#8

On 12 Mar, 16:25, “Rick DeNatale” removed_email_address@domain.invalid wrote:

And here’s the start of a more sophisticated approach I just whipped up.

Being new to Ruby, that's a great piece of code to get my head around.
Thanks all for the suggestions, thoughts and ideas :)