UTF-8 aware chop for 1.8?

Hello,

Is there an easy way to chop (as in String#chop) a string that can
potentially contain UTF-8 in ruby 1.8? Or should I roll my own?

Thanks,
Ammar

Ended up making my own. Posting it here for the benefit of others, and
maybe some feedback.

https://gist.github.com/661217

Regards,
Ammar

On Nov 3, 2010, at 9:08 AM, Ammar A. wrote:

Is there an easy way to chop (as in String#chop) a string that can
potentially contain UTF-8 in ruby 1.8? Or should I roll my own?

Well, it should be this simple:

str.gsub(/.\z/mu, "")

James Edward G. II

On Wed, Nov 3, 2010 at 5:57 PM, James Edward G. II wrote:

Well, it should be this simple:

str.gsub(/.\z/mu, "")

On Wed, Nov 3, 2010 at 6:04 PM, Adam P. wrote:

s.gsub(/^(.+)./u) { $1 }
=> "one two thre"

Beautiful. Thank you both.

It was a good exercise for me, so I don't necessarily feel that I
wasted 30 minutes of my life :)

By the way, the m option seems superfluous in James' version. I get
the same results without it.

Thanks again,
Ammar

On Nov 3, 2010, at 11:33 AM, Ammar A. wrote:

Beautiful. Thank you both.

It was a good exercise for me, so I don't necessarily feel that I
wasted 30 minutes of my life :)

By the way, the m option seems superfluous in James' version. I get
the same results without it.

It's not:

"\n".sub(/.\z/u, "")
=> "\n"

"\n".sub(/.\z/mu, "")
=> ""

Using gsub() over sub() was a dumb mistake on my part though. sub() is
all you need, since it can only match once.

James Edward G. II
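
To make the effect of the m option concrete, here is a small sketch. The helper names chop_without_m and chop_with_m are made up for illustration and do not come from the thread. Without m, "." refuses to match a newline, so a string ending in "\n" comes back unchanged.

```ruby
# Hypothetical helpers, named only for this illustration.

# Without /m, "." does not match "\n", so a trailing newline survives.
def chop_without_m(str)
  str.sub(/.\z/u, "")
end

# With /m, "." matches any character, including "\n".
def chop_with_m(str)
  str.sub(/.\z/mu, "")
end

p chop_without_m("abc\n")  # => "abc\n"
p chop_with_m("abc\n")     # => "abc"
p chop_with_m("café")      # one whole character removed, leaving "caf"
```

One difference from String#chop remains: chop removes a trailing "\r\n" as a unit, while this regex removes only the final "\n".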

I was going to say

$KCODE = "U"
=> "U"

s = "one two three"
=> "one two three"

s.gsub(/^(.+)./u) { $1 }
=> "one two thre"

I guess I overthought it, huh!
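
The two approaches agree on a simple one-line string but differ on edge cases. A rough comparison follows; the method names chop_with_caret and chop_with_anchor are invented here for illustration. In Ruby, ^ matches at the start of every line and gsub replaces every match, so the first form trims each line, and it also needs at least two characters on a line to match at all.

```ruby
# Hypothetical names, used only to compare the two regexes from the thread.

def chop_with_caret(s)
  s.gsub(/^(.+)./u) { $1 }
end

def chop_with_anchor(s)
  s.sub(/.\z/mu, "")
end

p chop_with_caret("one two three")  # => "one two thre"
p chop_with_caret("abc\ndef")       # => "ab\nde"   (every line is trimmed)
p chop_with_anchor("abc\ndef")      # => "abc\nde"  (only the last character)
p chop_with_caret("a")              # => "a"        (no match, nothing removed)
p chop_with_anchor("a")             # => ""
```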

On Wed, Nov 3, 2010 at 6:38 PM, James Edward G. II wrote:

Using gsub() over sub() was a dumb mistake on my part though. sub() is all you
need, since it can only match once.

Thanks for the clarification.

My method now looks like:

def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")
  last = s.scan(/.\z/mu).first
  last = '' unless last

  [lead, last]
end

Short and sweet.

Cheers,
Ammar

On Nov 3, 2010, at 11:56 AM, Ammar A. wrote:

My method now looks like:

def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")
  last = s.scan(/.\z/mu).first
  last = '' unless last

The two lines above can be replaced with the more efficient:

  last = s[/.\z/mu] || ''

  [lead, last]
end

James Edward G. II
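
Putting the suggestion together with the earlier method, the whole thing might look like the sketch below. This is assembled from the pieces posted in the thread, not copied from the gist, so treat it as an approximation of the final version.

```ruby
# Returns [everything but the last character, the last character].
# Returns nil for nil input, and ["", ""] for an empty string.
def chop_utf8(s)
  return unless s

  lead = s.sub(/.\z/mu, "")     # /u: treat the string as UTF-8; /m: let "." match "\n"
  last = s[/.\z/mu] || ""       # String#[] with a regex returns the match, or nil

  [lead, last]
end

lead, last = chop_utf8("café")
# lead is "caf" and last is "é", a single two-byte character
```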

On Wed, Nov 3, 2010 at 7:00 PM, James Edward G. II wrote:

The two lines above can be replaced with the more efficient:

last = s[/.\z/mu] || ''

At this rate the method is going to disappear. :)

I updated the gist accordingly:

https://gist.github.com/661257

Thanks again,
Ammar

On Thu, Nov 4, 2010 at 1:25 AM, Ammar A. wrote:

On Wed, Nov 3, 2010 at 7:00 PM, James Edward G. II

last = s[/.\z/mu] || ''
I updated the gist accordingly:
https://gist.github.com/661257

Can we do that in one pass?

str =~ /.\z/mu
[$`,$&]

best regards -botp
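
Wrapped in a method, the one-pass idea could look like this sketch. The name chop_utf8_one_pass and the handling of the no-match case are assumptions added here: when the string is empty the match fails, and both $` and $& are nil, so that case needs its own branch to keep returning strings.

```ruby
# Hypothetical wrapper around the single-match approach.
def chop_utf8_one_pass(str)
  return unless str

  if str =~ /.\z/mu
    [$`, $&]          # $` is the text before the match, $& is the match itself
  else
    [str.dup, ""]     # nothing to chop (empty string)
  end
end

lead, last = chop_utf8_one_pass("one two three")
p lead  # => "one two thre"
p last  # => "e"
```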

Ammar A. wrote in post #959047:

By the way, the m option seems superfluous in James' version. I get
the same results without it.

foo = "abc\n"
=> "abc\n"

foo.sub(/.\z/mu, '')
=> "abc"

foo.sub(/.\z/u, '')
=> "abc\n"

On Thu, Nov 4, 2010 at 4:37 PM, Brian C. wrote:

Ammar A. wrote in post #959047:

By the way, the m option seems superfluous in James' version. I get
the same results without it.

foo = "abc\n"
=> "abc\n"

foo.sub(/.\z/mu, '')
=> "abc"

foo.sub(/.\z/u, '')
=> "abc\n"

James clarified this earlier. But thanks for chiming in nonetheless.

Cheers,
Ammar
