String#chop chops last byte, not char

This is happening in Ruby 1.8.6:

% ruby --version
ruby 1.8.6 (2007-09-23 patchlevel 110) [i686-darwin9.1.0]

Documentation says:

% ri String#chop
…Returns a new +String+ with the last character removed. If…

Here are irb and Rails script/console sessions:

% ./script/console
Loading development environment (Rails 2.0.2)

"абвгд".chop
=> "абвг\320"

"абвгд".chop.chop
=> "абвг"

% irb

"абвгд".chop
=> "\320\260\320\261\320\262\320\263\320"

"д".chop
=> "\320"

"д".chop.chop
=> ""

As you can see, chop removes the last byte rather than the last
character. I ran into this while stuffing strings into a varchar(255)
column of a legacy MS SQL Server database. I came up with the
following method. Maybe there is an easier alternative?

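# Truncate text so that the result, suffix included, stays under size bytes.
# Note: each_char is not a core String method in 1.8.6; here it presumably
# comes from jcode (with $KCODE = 'u') or from ActiveSupport under Rails.
def truncate(text, size = 254, suffix = "…")
  return if text.nil?
  return text unless text.size > size

  limit = size - suffix.size  # bytes left for the text itself
  truncated_text = ""
  text.each_char do |c|
    # In 1.8, size counts bytes, so c.size is the byte length of one character.
    break unless truncated_text.size + c.size < limit
    truncated_text << c
  end
  truncated_text + suffix
end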
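
For readers on a newer Ruby, a shorter variant is possible. The following is only a sketch, assuming Ruby 2.1 or later (for String#byteslice and String#scrub) and UTF-8 input. The name truncate_bytes and the plain "..." suffix are illustrative choices, not part of the method above.

```ruby
# encoding: utf-8
# Hypothetical helper: cut text to at most `limit` bytes, suffix included
# (provided the suffix itself fits), without splitting a multi-byte character.
# Assumes Ruby 2.1+ and UTF-8 input.
def truncate_bytes(text, limit = 254, suffix = "...")
  return if text.nil?
  return text if text.bytesize <= limit

  budget = [limit - suffix.bytesize, 0].max
  # byteslice may cut through a character; scrub("") drops the
  # incomplete trailing bytes that the cut leaves behind.
  text.byteslice(0, budget).scrub("") + suffix
end
```

For example, truncate_bytes("абвгд", 8) returns "аб...": the 5-byte budget would split the third letter, so that letter is dropped entirely.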

I have not reported this as a bug, since I am not sure which is
supposed to be right: Ruby or the documentation.

On Wed, 23 Apr 2008 15:35:11 -0500, Evgeni Belin wrote:

This is happening in ruby 1.8.6:

% ruby --version
ruby 1.8.6 (2007-09-23 patchlevel 110) [i686-darwin9.1.0]

"д".chop
=> "\320"

"д".chop.chop
=> ""

As you can see chop removes last byte vs last char. Btw, problem

Ruby 1.8 has no built-in multi-byte character support, so it treats
each byte as one character. If you would like Unicode support, put
this at the top of your scripts:

$KCODE = 'u'
require 'jcode'

Then chop will remove the last character rather than the last byte.
(Though I’m not sure every other String method will behave perfectly.)

Ruby 1.9 has Unicode support built in and handles this properly out of
the box.
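
To illustrate the difference, here is a small sketch for Ruby 1.9.3 or later, assuming a UTF-8 source file. It imitates the 1.8 result by dropping a single byte with byteslice, which is only a stand-in for the old behaviour, not what 1.8 literally runs.

```ruby
# encoding: utf-8
s = "абвгд"

s.size       # 5 characters
s.bytesize   # 10 bytes (each of these Cyrillic letters is 2 bytes in UTF-8)

by_char = s.chop                          # removes the last character
by_byte = s.byteslice(0, s.bytesize - 1)  # removes only the last byte

puts by_char                  # абвг
puts by_byte.valid_encoding?  # false: the first byte of "д" is left dangling
```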

Evgeni Belin wrote:

This is happening in ruby 1.8.6:

In Ruby 1.8.6, character == byte basically everywhere. You’ll need to
use a regex to remove the last character safely (UTF-8).

  • Charlie
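
A regex along those lines might look like the sketch below. The helper name chop_char is made up for illustration. The /u flag asks the regex engine to treat the pattern and string as UTF-8, and /m lets the dot also match a trailing newline.

```ruby
# encoding: utf-8
# Hypothetical helper: remove the last UTF-8 character with a regex
# instead of String#chop.
def chop_char(str)
  str.sub(/.\z/mu, "")
end

puts chop_char("абвгд")  # абвг
```

Unlike chop, this sketch removes only one character from a trailing "\r\n" pair, so it is not a drop-in replacement in every case.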