First and last char

botp · August 11, 2007, 7:31pm

I always get tripped up when working with arrays and strings together,
especially on,

string.first and string.last

of course, they err

wish there were #first and #last in String

just a thought
kind regards -botp

botp · August 11, 2007, 7:44pm

of course, they err

wish there were #first and #last in String

just a thought
kind regards -botp

irb(main):001:0> class String
irb(main):002:1>   def first
irb(main):003:2>     self.split('').first
irb(main):004:2>   end
irb(main):005:1>   def last
irb(main):006:2>     self.split('').last
irb(main):007:2>   end
irb(main):008:1> end
=> nil
irb(main):009:0> "testing".first
=> "t"
irb(main):010:0> "testing".last
=> "g"

botp · August 11, 2007, 9:35pm

Felix W. wrote:

of course, they err

wish there were #first and #last in String

just a thought
kind regards -botp

irb(main):001:0> class String
irb(main):002:1>   def first
irb(main):003:2>     self.split('').first
irb(main):004:2>   end
irb(main):005:1>   def last
irb(main):006:2>     self.split('').last
irb(main):007:2>   end
irb(main):008:1> end
=> nil
irb(main):009:0> "testing".first
=> "t"
irb(main):010:0> "testing".last
=> "g"

Ew, that’s awfully complex. You create n new objects from which you
throw n-1 away again…
Think of the memory!
def first; self[0,1]; end; def last; self[-1,1]; end

Regards
Stefan
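
The allocation point above can be seen directly. What follows is only a minimal sketch, assuming Ruby 1.9 or later (where indexing is character-based); the module and method names are invented for illustration, and it deliberately avoids reopening String.

```ruby
# Hypothetical helpers (names invented for this sketch), assuming Ruby 1.9+.
module CharEnds
  module_function

  # Split-based approach: builds an array with one string per character,
  # then keeps a single element of it.
  def first_via_split(str)
    str.split('').first
  end

  # Slice-based approach: one slice, no intermediate array.
  def first_via_slice(str)
    str[0, 1]
  end

  def last_via_slice(str)
    str[-1, 1]
  end
end

word = "testing"
puts word.split('').size              # 7 temporary strings for a 7-character word
puts CharEnds.first_via_split(word)   # t
puts CharEnds.first_via_slice(word)   # t
puts CharEnds.last_via_slice(word)    # g
```

The two variants differ on the empty string: the split version returns nil, while `""[0, 1]` returns an empty string.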

botp · August 12, 2007, 12:07pm

Daniel DeLorme wrote:

Stefan R. wrote:

Ew, that’s awfully complex. You create n new objects from which you
throw n-1 away again…
Think of the memory!
def first; self[0,1]; end; def last; self[-1,1]; end

wrong, that’s the first and last bytes, not characters.

def first; self[/\A./m]; end
def last; self[/.\z/m]; end

$KCODE='u'
=> "u"

"日本語".first
=> "日"

"日本語".last
=> "語"

Daniel

You are right, your solution is better.

Regards
Stefan

botp · August 12, 2007, 3:28pm

Stefan R. wrote:

Daniel

You are right, your solution is better.

Only partly. Unfortunately, end-anchored regular expressions have pretty
abysmal performance.

require "benchmark"
str = "日本語"*1000
Benchmark.measure{10000.times{str.first}}.real
=> 0.0704410076141357

Benchmark.measure{10000.times{str.last}}.real
=> 5.35788202285767

Daniel
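
For anyone who wants to repeat the measurement, here is a rough, self-contained sketch. The method names `first_char` and `last_char` are stand-ins invented here so that String does not have to be reopened, and the iteration count is lowered to keep the run short. Timings depend heavily on the Ruby version and its regex engine, so no numbers are claimed.

```ruby
require "benchmark"

# Hypothetical stand-ins for the String#first / String#last discussed here.
def first_char(str)
  str[/\A./m]
end

def last_char(str)
  str[/.\z/m]
end

str  = "日本語" * 1000
runs = 1_000 # fewer iterations than in the post, to keep the run short

first_time = Benchmark.realtime { runs.times { first_char(str) } }
last_time  = Benchmark.realtime { runs.times { last_char(str) } }

puts format("first: %.4fs", first_time)
puts format("last:  %.4fs", last_time)
```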

botp · August 12, 2007, 3:53am

Stefan R. wrote:

Ew, that’s awfully complex. You create n new objects from which you
throw n-1 away again…
Think of the memory!
def first; self[0,1]; end; def last; self[-1,1]; end

wrong, that’s the first and last bytes, not characters.

def first; self[/\A./m]; end
def last; self[/.\z/m]; end

$KCODE='u'
=> "u"

"日本語".first
=> "日"

"日本語".last
=> "語"

Daniel
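
The byte versus character distinction is easy to check on a current Ruby. This is only a sketch, assuming Ruby 1.9.3 or later and a UTF-8 source file, where strings carry their own encoding and $KCODE is no longer needed.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9.3+ and a UTF-8 source file; $KCODE is not used here.
str = "日本語"

puts str.bytesize        # 9 (three bytes per character in UTF-8)
puts str.length          # 3

first_char = str[/\A./m] # first character via the start-anchored regex
last_char  = str[/.\z/m] # last character via the end-anchored regex
puts first_char          # 日
puts last_char           # 語

# Taking a single byte instead yields an incomplete character.
first_byte = str.byteslice(0, 1)
puts first_byte.valid_encoding? # false
```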

botp · August 12, 2007, 4:32pm

On Aug 11, 6:52 pm, Daniel DeLorme wrote:

Stefan R. wrote:

Ew, that’s awfully complex. You create n new objects from which you
throw n-1 away again…
Think of the memory!
def first; self[0,1]; end; def last; self[-1,1]; end

wrong, that’s the first and last bytes, not characters.

For Ruby 1.9+ it will just be:

def first; self[0]; end; def last; self[-1]; end

my own version is (basically):

def first(pattern=//)
  split(pattern).at(0)
end

which is a little more versatile. but I see the point about the
memory, and I’ll add an optimization clause come 1.9.

T.
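
To show what the pattern argument buys, here is a small sketch of the same idea written as module functions instead of String methods. The module name `SplitEnds` and the `last` counterpart are assumptions added for illustration; only `first(pattern=//)` appears in the post above.

```ruby
# Hypothetical helper module; a sketch of the pattern-based idea.
module SplitEnds
  module_function

  # With the default empty pattern this returns the first character;
  # with any other pattern it returns the first piece.
  def first(str, pattern = //)
    str.split(pattern).at(0)
  end

  # Assumed counterpart, not shown in the thread.
  def last(str, pattern = //)
    str.split(pattern).at(-1)
  end
end

puts SplitEnds.first("testing")                 # t
puts SplitEnds.last("testing")                  # g
puts SplitEnds.first("hello big world", /\s+/)  # hello
puts SplitEnds.last("hello big world", /\s+/)   # world
```

The cost is the one already pointed out: every call still splits the whole string.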

botp · August 13, 2007, 8:05am

Hi,

Am Sonntag, 12. Aug 2007, 02:43:00 +0900 schrieb Felix W.:

irb(main):005:1>   def last
irb(main):006:2>     self.split('').last
irb(main):007:2>   end
irb(main):008:1> end
=> nil
irb(main):009:0> "testing".first
=> "t"
irb(main):010:0> "testing".last
=> "g"

Sometimes I wish every young programmer was forced to do a
month in Assembler and another one in C just to see what
cost in time and space some constructions cause.

Sorry, Felix!

Bertram

botp · August 13, 2007, 12:10am

Daniel DeLorme wrote:

Stefan R. wrote:

Daniel

You are right, your solution is better.

Only partly. Unfortunately, end-anchored regular expressions have pretty
abysmal performance.

require "benchmark"
str = "日本語"*1000
Benchmark.measure{10000.times{str.first}}.real
=> 0.0704410076141357

Benchmark.measure{10000.times{str.last}}.real
=> 5.35788202285767

Daniel

That can be helped. Assuming that there is no encoding with 1 char > 8
bytes:

def first; self[/\A./m]; end
def last; self[/.\z/m]; end
# self[-8,8] is nil for strings shorter than 8, so fall back to self
def last2; (self[-8,8] || self)[/.\z/m]; end

Benchmark.measure{10000.times{str.first}}.real
=> 0.0643939971923828
Benchmark.measure{10000.times{str.last}}.real
=> 7.3151650428772
Benchmark.measure{10000.times{str.last2}}.real
=> 0.167464017868042

That’s a roughly 40x improvement for that string (7.3 s down to 0.17 s). For short strings it will
probably be slightly slower, but I’d say it’s worth it.

Regards
Stefan
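
The tail trick can be sketched as a standalone helper. The name `last_char_from_tail` and the `tail_length` parameter are invented for this example. Note that on Ruby 1.9 and later `str[-8, 8]` counts characters rather than bytes, but the idea is unchanged: run the end-anchored regex on a short tail instead of the whole string.

```ruby
# encoding: utf-8
# Hypothetical helper; a sketch assuming Ruby 1.9+.
def last_char_from_tail(str, tail_length = 8)
  # str[-n, n] is nil when the string is shorter than n, so fall back to str.
  tail = str[-tail_length, tail_length] || str
  tail[/.\z/m]
end

puts last_char_from_tail("日本語" * 1000)  # 語
puts last_char_from_tail("ab")             # b
p    last_char_from_tail("")               # nil
```

The fallback matters: without it, any string shorter than the tail length would raise NoMethodError on nil.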

botp · August 13, 2007, 8:17am

Hi,

Am 13.08.2007 um 08:04 schrieb Bertram S.:

irb(main):002:1> def first
=> "g"

Sometimes I wish every young programmer was forced to do a
month in Assembler and another one in C just to see what
cost in time and space some constructions cause.

Sorry, Felix!

Well, here’s something that should be a little bit less cycle
intensive (depending on how String#[] is implemented):

class String
  def last
    self[-1].chr
  end
end

Cheers

Stephan
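
What `self[-1].chr` does depends on the Ruby version. The sketch below runs on Ruby 1.9 or later; the 1.8 behaviour is described in comments only, since it cannot be shown on a current interpreter.

```ruby
# Assumes Ruby 1.9+; the 1.8 behaviour is described in comments only.
word = "testing"

# On 1.8, word[-1] was the byte value 103 and Integer#chr turned it into "g".
# On 1.9+, word[-1] is already the one-character string "g",
# and String#chr simply returns its first character.
p word[-1]             # "g"
p word[-1].chr         # "g"
p word.bytes.to_a.last # 103, the byte value that 1.8 returned from word[-1]
p 103.chr              # "g"

# An empty string has no last character: ""[-1] is nil,
# so calling .chr on it would raise NoMethodError.
p ""[-1]               # nil
```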

botp · August 13, 2007, 9:07am

Hi,

Am Montag, 13. Aug 2007, 15:15:54 +0900 schrieb Stephan Kämper:

Am 13.08.2007 um 08:04 schrieb Bertram S.:

Well, here’s something that should be a little bit less cycle intensive
(depending on how String#[] is implemented):

class String
  def last
    self[-1].chr
  end
end

This is what I would have implemented, too.

The difficult point is that it raises some questions:

Should it return a Fixnum or a String of length 1?
Should it be able to return UTF-8 characters?
Should I define String#shift and String#pop now?

I don’t recommend discussing such questions in an open forum
since I saw what happened to my String#notempty? proposal.

Bertram

botp · August 13, 2007, 12:28pm

From: botp
irb(main):007:2> end

Sorry, Felix!

Bertram

–
Bertram S.
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

No problem :o)

When posting here, I tend to forget that I usually either write
throwaway scripts (one-time processing of a problem), or scripts that
get run occasionally on fairly large servers. So far, considering
processing time and memory has simply stepped into the background
behind finding a solution quickly - if a script takes 2 minutes longer
to run but took 5 minutes fewer to write while solving something
immediate, that’s a net win in most situations I use Ruby in.

You’re completely right, though, it’s far from best practices.

Felix

botp · August 13, 2007, 2:26pm

On Aug 13, 12:06 am, Bertram S. wrote:

I don’t recommend discussing such questions in an open forum
since I saw what happened to my String#notempty? proposal.

Sad for Ruby.

T.

botp · August 13, 2007, 2:40pm

Hi –

On Mon, 13 Aug 2007, Bertram S. wrote:

    self[-1].chr
  end
end

This is what I would have implemented, too.

The difficult point is that it raises some questions:

Should it return a Fixnum or a String of length 1?

If it ever gets added to Ruby, it will presumably be in 1.9/2.0, where
str[x] gives you a character anyway. If it doesn’t get added, then
everyone will write their own, hopefully in a safe way, and can do
whatever they like.

Should it be able to return UTF-8 characters?

Should I define String#shift and String#pop now?

There’s already #chop. I don’t know whether there are plans for
#lchop or equivalent.

David
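
For the shift/pop question, existing String methods already get close. A small sketch, assuming Ruby 1.9 or later, where `slice!` with an index removes and returns one character:

```ruby
# Assumes Ruby 1.9+, where slice!(index) removes and returns one character.
s = "testing".dup # dup so the string is mutable even with frozen literals

p s.slice!(0)     # "t"  (acts like a shift)
p s               # "esting"
p s.slice!(-1)    # "g"  (acts like a pop)
p s               # "estin"

# chop is non-destructive and returns the string without its last character.
p "testing".chop  # "testin"
```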

botp · August 13, 2007, 2:43pm

Hi –

On Mon, 13 Aug 2007, Felix W. wrote:

From: botp
irb(main):007:2> end

You’re completely right, though, it’s far from best practices.

If it results in a net win, then it sounds like it is a best practice.
Don’t worry; there will be plenty of opportunity for performance
examination and critique, where it matters.

Assembler is really cool, though. Definitely worth a look.

David