Forum: Ruby Efficiency of string parsing

Kev (Guest)
on 2007-03-12 16:25
(Received via mailing list)
I have written a loop that parses a string and, at every 50th
character, checks whether it is a space; if not, it works back until it
finds one, then inserts a newline. I am turning masses of text (copy)
from a DB into images and wanted to automate it. I was just
wondering if there are better ways of achieving what I am trying to
do.

        characterCount = 0
        positionCount = 0
        insertPoint = MAX_LINE_LENGTH

        while characterCount != copy.length
          characterCount += 1
          positionCount += 1
          if positionCount == MAX_LINE_LENGTH
            begin
              characterCount -= 1
              insertPoint -= 1
            end until copy[characterCount].eql?(ASCII_SPACE)
            # Double quotes so that \n is a real newline; '\n' is a literal backslash and n.
            copy.insert(characterCount += 1, "\n")
            imageHeight += LINE_HEIGHT
            positionCount = 0
          end

        end

Cheers,
Kev
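
The loop described above can also be written without the manual counters. What follows is a minimal sketch, not the poster's code: `wrap_copy` is a hypothetical name, the `LINE_HEIGHT` value is an assumed placeholder (the post does not give one), and the space is replaced by the newline rather than having the newline inserted after it.

```ruby
MAX_LINE_LENGTH = 50
LINE_HEIGHT = 12 # assumed placeholder; the post does not state the real value

# Hypothetical helper: returns a wrapped copy and leaves the input untouched.
def wrap_copy(copy, max = MAX_LINE_LENGTH)
  wrapped = copy.dup
  line_start = 0
  while wrapped.length - line_start > max
    # Last space that keeps the current line at most `max` characters long.
    break_at = wrapped.rindex(" ", line_start + max)
    # Stop if there is no space inside the current line to break on.
    break if break_at.nil? || break_at < line_start
    wrapped[break_at] = "\n"
    line_start = break_at + 1
  end
  wrapped
end

wrapped = wrap_copy("the quick brown fox jumps", 10)
puts wrapped
# One line of text per newline, plus the final line.
image_height = (wrapped.count("\n") + 1) * LINE_HEIGHT
puts image_height
```

Deriving the image height from the number of newlines afterwards keeps the wrapping step free of side effects. A word longer than `max` stops the loop early in this sketch, which may or may not be the desired behaviour.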
Robert Klemme (Guest)
on 2007-03-12 16:30
(Received via mailing list)
On 12.03.2007 16:23, Kev wrote:
>
>             positionCount = 0
>           end
>
>         end

There are quite a lot of posts about word wrapping, which seems to be what you
are trying to do.  You should be able to find them via the archives
(Google Groups, ruby-talk archive).

A simplistic approach would probably do something like this:

str.gsub(/(.{1,50})\s+/, "\\1\n")

Kind regards

  robert
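
To try the one-liner above, it can be wrapped in a small method. This is a sketch under assumptions: `wrap_text` is a hypothetical name, and the `\z` alternative is an addition that is not in the post. Without it, a final line that has no trailing whitespace is never matched as a whole, so the regex can break it at its last space even when it would fit.

```ruby
# Hypothetical helper around the gsub approach; `width` replaces the literal 50.
def wrap_text(str, width = 50)
  # Group 1: up to `width` characters (greedy, no newlines).
  # Group 2: the whitespace that ends the line, or the end of the string.
  str.gsub(/(.{1,#{width}})(\s+|\z)/, "\\1\n")
end

puts wrap_text("the quick brown fox jumps", 10)
```

Every line, including the last, ends with a newline in this variant. Words longer than `width` are not split.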
Rick Denatale (rdenatale)
on 2007-03-12 17:27
(Received via mailing list)
On 3/12/07, Robert Klemme <shortcutter@googlemail.com> wrote:

> There are quite a lot of posts about word wrapping, which seems to be what you
> are trying to do.  You should be able to find them via the archives
> (Google Groups, ruby-talk archive).
>
> A simplistic approach would probably do something like this:
>
> str.gsub(/(.{1,50})\s+/, "\\1\n")
>

And here's the start of a more sophisticated approach I just whipped up.

It uses split on a word boundary to split the string. It has some
option keywords which allow preserving all whitespace, or only at the
beginning of a line.  If you don't preserve all whitespace, it
collapses whitespace within a line to a single space.  If you don't
preserve whitespace at the beginning of a line, it eliminates it,
otherwise it keeps it as is.  The default is to only preserve
whitespace at the beginning of a line.

It does have a few bugs, which I didn't bother addressing and leave as
an exercise to the reader.

1) It ignores existing new lines in the input string, which means that
the next line will be short.

2) It keeps whitespace at the end of a line, as opposed to putting the
newline after the last 'word'.

class String
  def wordwrap(linelength, kw_args={})
    keep_all = kw_args[:keep_all]
    keep_initial = keep_all || kw_args[:keep_initial]
    keep_initial = true if keep_initial.nil?
    current_len = 0
    split(/\b/).inject("") do |result, chunk|
      if current_len + chunk.length >= linelength
        result << "\n"
        current_len = 0
        chunk = "" if chunk.strip.empty? unless keep_initial
      else
        chunk = " " if chunk.strip.empty? unless keep_all
      end
      end
      current_len += chunk.length
      result << chunk
    end
  end
end
--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
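
The two caveats listed above (existing newlines and trailing whitespace) can be sidestepped by wrapping each existing line separately and joining words with single spaces. This is a simplified sketch, not a fix to `String#wordwrap`: `wrap_lines` is a hypothetical name, and unlike the method above it drops leading whitespace and has no keyword options.

```ruby
# Hypothetical helper: wraps each existing line on its own, so newlines
# already in the input restart the line-length count.
def wrap_lines(text, width)
  text.split("\n").map do |line|
    wrapped = []
    current = ""
    # split with no arguments breaks on runs of whitespace and discards them.
    line.split.each do |word|
      if current.empty?
        current = word
      elsif current.length + 1 + word.length <= width
        current = "#{current} #{word}"
      else
        wrapped << current # line is full; no trailing whitespace is kept
        current = word
      end
    end
    wrapped << current
    wrapped.join("\n")
  end.join("\n")
end

puts wrap_lines("one two three\nfour five", 7)
```

A word longer than `width` ends up on a line of its own and is not split.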
John Joyce (Guest)
on 2007-03-12 17:36
(Received via mailing list)
Excellent solution for coding efficiency. (Though I always think
regular expressions should be commented well, broken into parts, due
to the terseness of the syntax, especially for those who don't use
RegEx regularly. No pun, really.)

But would a Ruby iterator be faster?

Clearly this is a tool to wrap text to 50 characters per line without
breaking words. Curious to see more ideas/approaches on that.
Tom Pollard (tomp)
on 2007-03-12 18:03
(Received via mailing list)
On Mar 12, 2007, at 12:35 PM, John Joyce wrote:

> Excellent solution for coding efficiency. (Though I always think
> regular expressions should be commented well, broken into parts,
> due to the terseness of the syntax, especially for those who don't
> use RegEx regularly. No pun, really.)
>
> But would a Ruby iterator be faster?

I'm just curious what it is about Ruby iterators (I assume you mean
methods like 'each') that you'd expect them to be more efficient than
the gsub?

Tom
John Joyce (Guest)
on 2007-03-12 18:37
(Received via mailing list)
On Mar 13, 2007, at 2:02 AM, Tom Pollard wrote:

>> But would a Ruby iterator be faster?
>
> I'm just curious what it is about Ruby iterators (I assume you mean
> methods like 'each') that you'd expect them to be more efficient
> than the gsub?

Iterators, callbacks using Ruby code blocks, whatever.
I never said I expected them to be faster.
I was asking.
I don't know how much text is being parsed. I do assume it is
unstructured and not indexed in any manner.
I'm just wondering if there isn't more to know about the why and what for,
in order to reach the best solution for the situation.
Like they say in Perl... there's more than one way, right? Some ways are
just interesting, some are fast, some are useful, etc.
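
The speed question is easiest to settle by measuring. The sketch below uses Ruby's standard `Benchmark` module to time a gsub-based wrap against a block-based one. Everything in it is an assumption made for illustration: the method names, the sample text, and the iteration count are made up, and the results will depend on the Ruby version and on the real copy being wrapped.

```ruby
require 'benchmark'

WIDTH = 50
# Made-up sample text; substitute real copy for meaningful numbers.
SAMPLE = ("lorem ipsum dolor sit amet " * 2_000).strip

# Regex approach (the gsub idea from earlier in the thread, with \z added
# so the last line is handled as well).
def wrap_with_gsub(str, width)
  str.gsub(/(.{1,#{width}})(\s+|\z)/, "\\1\n")
end

# Block approach: build lines word by word with each.
def wrap_with_each(str, width)
  lines = [""]
  str.split.each do |word|
    if lines.last.empty?
      lines[-1] = word
    elsif lines.last.length + 1 + word.length <= width
      lines[-1] = "#{lines.last} #{word}"
    else
      lines << word
    end
  end
  lines.join("\n") + "\n"
end

Benchmark.bm(6) do |bm|
  bm.report("gsub:") { 20.times { wrap_with_gsub(SAMPLE, WIDTH) } }
  bm.report("each:") { 20.times { wrap_with_each(SAMPLE, WIDTH) } }
end
```

For single-spaced input with no word longer than the width, both methods produce the same output, so the comparison is like for like.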
ChrisH (Guest)
on 2007-03-12 19:05
(Received via mailing list)
Had to take a swipe 9^)

class String
   def wrap(wrap_col)
      retStr = self.dup
      start = 0
      while retStr[start,wrap_col].length >= wrap_col
         ws_pos = retStr[start,wrap_col].rindex(" ")
         break if ws_pos.nil?
         retStr[ws_pos+start] = "\n"
         start += ws_pos+1
      end
      retStr
   end
end


Cheers
Chris
Kev (Guest)
on 2007-03-13 09:40
(Received via mailing list)
On 12 Mar, 16:25, "Rick DeNatale" <rick.denat...@gmail.com> wrote:
> And here's the start of a more sophisticated approach I just whipped up.
> an exercise to the reader.
>     keep_initial = keep_all ||kw_args[:keep_initial]
>       current_len += chunk.length
>       result << chunk
>     end
>   end
> end
> --
> Rick DeNatale
>
> My blog on Ruby
> http://talklikeaduck.denhaven2.com/

Being new to Ruby, that's a great piece of code to get my head around.
Thanks, all, for the suggestions, thoughts and ideas :)