How do you do this

georgeuoa · October 1, 2009, 2:23pm

Given an array of strings of different lengths, e.g.
x = ["abc", "abcde", "def", "xyzwj"]
how can you efficiently create new arrays, each holding the strings of one
length? For example, the above array can be transformed into

x1 = ["abc", "def"]
x2 = ["abcde", "xyzwj"]

Thank you.

georgeuoa · October 1, 2009, 2:34pm

George G. wrote:

Given an array of strings e.g.
x = ["abc", "abcde", "def", "xyzwj"] and of different lengths,
how can you efficiently create new arrays of strings which are of the
same length. for example the above array can be transformed into

x1 = [“abc”,“def”]
x2 = [“abcde”,“xyzwj”]

Thank you.

y = {}
x.each do |v|
y[v.length] || = []
y[v.length] << v
end
y.values

or if you prefer less lines…

x.inject({}) do |h, v|
(y[v.length] || = []) << v
h
end.values
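
As posted, neither snippet parses (`|| =` has to be written `||=`), and the inject version indexes `y` where the block's accumulator is named `h`. A corrected sketch of both, assuming `x` is the array from the question:

```ruby
x = ["abc", "abcde", "def", "xyzwj"]

# each version: build a hash keyed by string length
y = {}
x.each do |v|
  y[v.length] ||= []   # create the bucket on first use
  y[v.length] << v
end
groups_each = y.values

# inject version: the accumulator is h, and the block must return it
groups_inject = x.inject({}) do |h, v|
  (h[v.length] ||= []) << v
  h
end.values

p groups_each
p groups_inject
```

On Ruby 1.9 or later, where hashes keep insertion order, both print the groups in the order their lengths first appear in `x`.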

georgeuoa · October 1, 2009, 2:36pm

On Thu, Oct 1, 2009 at 1:23 PM, George G.
[email protected] wrote:

Given an array of strings e.g.
x = ["abc", "abcde", "def", "xyzwj"] and of different lengths,
how can you efficiently create new arrays of strings which are of the
same length. for example the above array can be transformed into

Well, here’s something close:

h = {}
x.each do |i|
  h[i.length] ||= []
  h[i.length] << i
end

h is now a hash: {3=>["abc", "def"], 5=>["abcde", "xyzwj"]}

That’s close enough to what you want that I’m sure you can run with
it. Look in the “Group by unique entries of a hash” thread for more
ideas.

x1 = [“abc”,“def”]
x2 = [“abcde”,“xyzwj”]

Thank you.

Posted via http://www.ruby-forum.com/.

–
Paul S.
http://www.nomadicfun.co.uk


georgeuoa · October 1, 2009, 2:37pm

On Thu, Oct 1, 2009 at 2:23 PM, George G.
[email protected] wrote:

Given an array of strings e.g.
x = ["abc", "abcde", "def", "xyzwj"] and of different lengths,
how can you efficiently create new arrays of strings which are of the
same length. for example the above array can be transformed into

x1 = [“abc”,“def”]
x2 = [“abcde”,“xyzwj”]

You might want to look at group_by:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

Jesus.
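
Applied to the array from the question, group_by returns a hash keyed by whatever the block returns, here the string length. A small sketch (the variable name `by_length` is only for illustration):

```ruby
x = ["abc", "abcde", "def", "xyzwj"]

# group_by collects the elements under the value the block returns
by_length = x.group_by { |s| s.length }

p by_length      # hash: length => strings of that length
p by_length[3]   # the strings of length 3
```

Enumerable#group_by is available from Ruby 1.8.7 onward.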

georgeuoa · October 1, 2009, 2:39pm

On Thu, Oct 1, 2009 at 1:34 PM, Ilan B. [email protected] wrote:

y = {}
x.each do |v|
y[v.length] || = []
y[v.length] << v
end
y.values

LOL I love Ruby and Rubytalk

or if you prefer less lines…

x.inject({}) do |h, v|
(y[v.length] || = []) << v
h
end.values

Must… master… inject…

–
Paul S.
http://www.nomadicfun.co.uk


georgeuoa · October 1, 2009, 2:40pm

2009/10/1 Jesús Gabriel y Galán [email protected]:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

I really can’t believe Ruby sometimes. This is so freaking awesome!

Must get back to real work…

Paul S.
http://www.nomadicfun.co.uk


georgeuoa · October 1, 2009, 5:05pm

On Thu, Oct 1, 2009 at 9:23 PM, George G.
[email protected] wrote:

Given an array of strings e.g.
x = ["abc", "abcde", "def", "xyzwj"] and of different lengths,
how can you efficiently create new arrays of strings which are of the
same length. for example the above array can be transformed into

x1 = [“abc”,“def”]
x2 = [“abcde”,“xyzwj”]

Thank you.

p x.map{|a| a.length}.uniq.map{|b| x.select{|c| c.length == b}}

#> [["abc", "def"], ["abcde", "xyzwj"]]

Harry

georgeuoa · October 1, 2009, 2:41pm

Hi,

Am Donnerstag, 01. Okt 2009, 21:23:30 +0900 schrieb George G.:

Given an array of strings e.g.
x = ["abc", "abcde", "def", "xyzwj"] and of different lengths,
how can you efficiently create new arrays of strings which are of the
same length. for example the above array can be transformed into

x1 = [“abc”,“def”]
x2 = [“abcde”,“xyzwj”]

x = %w(abc abcde def xyzwj)
x.inject( Hash.new { |h,k| h[k] = [] }) { |h,e| h[e.length].push e ; h }

Bertram

georgeuoa · October 1, 2009, 5:23pm

Hi –

On Thu, 1 Oct 2009, Jesús Gabriel y Galán wrote:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

It’s interesting how often the need for group_by without the keys
comes up. Meaning, in this case, to get the new arrays you’d
ultimately do:

arr.group_by(&:length).values

and I believe there was at least one similar case mentioned here
recently. I wonder whether it would be cool to have a method that did
this – in effect:

module Enumerable
  def group_by_without_keys(&block)
    group_by(&block).values
  end
end

I’m not sure what it should be called, though.

David

georgeuoa · October 1, 2009, 5:42pm

On Thu, Oct 1, 2009 at 5:20 PM, David A. Black [email protected]
wrote:

Hi –

On Thu, 1 Oct 2009, Jesús Gabriel y Galán wrote:

%w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

It’s interesting how often the need for group_by without the keys
comes up. Meaning, in this case, to get the new arrays you’d
ultimately do:

arr.group_by(&:length).values

Yup, although in this case, I’m going to guess that he will either

Access the list of words of a specific number

desired_length = something
a = %w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}

a[desired_length]

Sort the groups by length

a = %w{abc ads adfdf adfdw fefm mfekmw fmdms}.group_by {|x| x.length}
a.sort.map {|x| x[1]} # or something

I’m not sure what it should be called, though.
values_grouped_by
?

Jesus.
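
A concrete run of the two cases above, as a sketch. The value chosen for `desired_length` is hypothetical, standing in for the `something` placeholder:

```ruby
words = %w{abc ads adfdf adfdw fefm mfekmw fmdms}
a = words.group_by { |x| x.length }

# Case 1: look up the words of one specific length
desired_length = 5            # hypothetical value for illustration
five_letter = a[desired_length]
p five_letter                 # nil would mean no word has that length

# Case 2: Hash#sort yields [length, group] pairs ordered by length;
# keep only the group from each pair
sorted_groups = a.sort.map { |pair| pair[1] }
p sorted_groups
```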

georgeuoa · October 1, 2009, 9:22pm

On Oct 1, 2009, at 05:34 , Ilan B. wrote:

(y[v.length] || = []) << v
h
end.values

Syntax error in both cases. It needs to be “||=”, not “|| =”.

Well… inject ALWAYS loses, but fanboys sure seem to like it for no
good reason.

By using better names and the right tool for the job, this becomes a
LOT more readable, maintainable, and faster all in one fell swoop:

by_length = Hash.new { |h,k| h[k] = [] }
strings.each do |string|
  by_length[string.length] << string
end
by_length.values # I think this part is a mistake, but I wanted to match

I think the readability is more important than speed by a long shot…
But just in case you’re not convinced, check out the benchmarks:

% ./blah.rb 10000

# of iterations = 10000

                   user     system      total        real
null_time      0.000000   0.000000   0.000000 (  0.001370)
mine           7.790000   0.050000   7.840000 (  7.869737)
yours-inject  15.170000   0.050000  15.220000 ( 15.554334)
yours-each    11.850000   0.100000  11.950000 ( 12.013553)

inject is twice as slow as mine. stop using it.
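
The blah.rb script itself was not posted. Below is a minimal sketch of how such a comparison could be set up with the standard Benchmark library. The report labels mirror the table above, but the method names, the input array and the iteration count are assumptions, and timings will vary with the machine and the Ruby version.

```ruby
require 'benchmark'

strings = ["abc", "abcde", "def", "xyzwj"]  # assumed input, taken from the question
n = 10_000                                   # assumed iteration count

# Hash with a default block, filled with a plain each
def group_default_hash(strings)
  by_length = Hash.new { |h, k| h[k] = [] }
  strings.each { |string| by_length[string.length] << string }
  by_length.values
end

# inject with a literal hash and ||=
def group_inject(strings)
  strings.inject({}) do |h, v|
    (h[v.length] ||= []) << v
    h
  end.values
end

# each with a literal hash and ||=
def group_each(strings)
  y = {}
  strings.each do |v|
    y[v.length] ||= []
    y[v.length] << v
  end
  y.values
end

Benchmark.bm(14) do |b|
  b.report("mine")         { n.times { group_default_hash(strings) } }
  b.report("yours-inject") { n.times { group_inject(strings) } }
  b.report("yours-each")   { n.times { group_each(strings) } }
end
```

All three methods return the same groups, so any difference the report shows comes from how the hash is built, not from what is computed.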

georgeuoa · October 1, 2009, 11:12pm

On Thu, Oct 1, 2009 at 2:21 PM, Ryan D. [email protected]
wrote:

by_length.values # I think this part is a mistake, but I wanted to match
yours-each 11.850000 0.100000 11.950000 ( 12.013553)

inject is twice as slow as mine. stop using it.

I generalized yours, and made the returned groups sorted by the results
from the call. In this more comparable situation, inject is about 11%
slower, not twice as slow.

Inject Test
Rehearsal --------------------------------------------------
Without Inject  14.160000   0.100000  14.260000 ( 14.364824)
With Inject     15.950000   0.120000  16.070000 ( 16.258609)
---------------------------------------- total: 30.330000sec

                     user     system      total        real
Without Inject  14.200000   0.110000  14.310000 ( 14.553592)
With Inject     16.000000   0.120000  16.120000 ( 16.422186)

Inject is about 11.38% slower

Here is the code:

#!/usr/bin/env ruby
require 'benchmark'

class Symbol
  def to_proc
    Proc.new{|obj| obj.send self } # give 1.9ish syntax
  end
end

module Enumerable

  def group_by_without_inject( &get_key )
    groups = Hash.new { |h,k| h[k] = Array.new }
    each do |obj|
      groups[ get_key[obj] ] << obj
    end
    groups.keys.sort!.map!{|key| groups[key] }
  end

  def group_by_with_inject( &get_key )
    groups = inject Hash.new{ |h,k| h[k] = Array.new } do |groups,obj|
      groups[ get_key[obj] ] << obj
      groups
    end
    groups.keys.sort!.map!{|key| groups[key] }
  end

end

puts "Inject Test"
benchmarks = Benchmark.bmbm do|b|
  x = ["abc","abcde","def","xyzwj"]

  b.report("Without Inject") do
    500_000.times{ x.group_by_without_inject &:length }
  end

  b.report("With Inject") do
    500_000.times{ x.group_by_with_inject &:length }
  end
end

benchmarks.map!{|b| b.real }
percent_slower = sprintf( "%.2f" , 100 - 100 * benchmarks.first / benchmarks.last )
puts '' , "Inject is about #{ percent_slower }% slower"

georgeuoa · October 1, 2009, 7:05pm

On Fri, 2 Oct 2009, Jesús Gabriel y Galán wrote:

arr.group_by(&:length).values

Yup, although in this case, I’m going to guess that he will either

Access the list of words of a specific number

Sort the groups by length

Yes, that’s pretty likely. I guess to make the non-keyed version
useful it would have to do some kind of automatic sorting, like you
did in your example:

module Enumerable
  def my_group_by(&block)
    g = group_by(&block)
    g.sort.map(&:last)
  end
end

or something. (I’m trying to be a good 1.9 citizen and use
Symbol#to_proc, even though I still find it a bit line-noisy.)

And that wouldn’t handle the specific number case, of course. Maybe
it’s not all that useful.

David

georgeuoa · October 1, 2009, 11:27pm

On Thu, Oct 1, 2009 at 6:38 AM, Paul S. [email protected]
wrote:

Must… master… inject…

I think it’s much more readable to build hashes with something like:

h = {}
blah.each do |v|
h[…] = …
end

than:

blah.inject({}) do |h, v|
h[…] = …
h
end

Gratuitous use of inject FTL. Ruby isn’t an immutable state functional
language.

georgeuoa · October 2, 2009, 1:43am

On Oct 1, 2009, at 14:11 , Josh C. wrote:

I generalized yours, and made the returned groups sorted by the
results from
the call. In this more comparable situation, inject is about 11%
slower, not
twice as slow.

This “more comparable” situation is full of bugs and isn’t comparable.

Yes, I should have said “your [ilan’s] inject version is twice as slow
as mine” instead of “inject is twice as slow as mine” but my numbers
still stand. Use the right tool for the job and it’ll pay off
in both maintainability and speed.

Your version isn’t maintainable, has bugs(*) and obfuscates a ton,
missing my point entirely. Simpler code wins HANDS DOWN. As I said the
first time: “I think the readability is more important than speed by a
long shot”. FWIW, my results running your code as-is were exactly 2x
yours (22% slower, not 11% slower).

*) calling sort! within a law of demeter violation is ALWAYS a bug.
*) calling (almost) any bang method on a temporary value is usually a
bug.

georgeuoa · October 2, 2009, 2:38am

Hi –

On Fri, 2 Oct 2009, Tony A. wrote:

h[…] = …
end

than:

blah.inject({}) do |h, v|
h[…] = …
h
end

I agree, and I think it’s nice that 1.9 provides
Enumerator#with_object, which lets you avoid that explicit feeding of
the accumulator back into the loop.

David
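
A short sketch of the with_object form applied to the grouping problem from this thread. The array and variable names are assumptions for illustration:

```ruby
x = ["abc", "abcde", "def", "xyzwj"]

# with_object passes the same hash to every iteration and returns it at the
# end, so the block does not have to hand the accumulator back
by_length = x.each.with_object({}) do |s, h|
  (h[s.length] ||= []) << s
end

# Enumerable#each_with_object does the same in a single call
same = x.each_with_object({}) { |s, h| (h[s.length] ||= []) << s }

p by_length
p same
```

Note that the block arguments come in the opposite order from inject: the element first, then the accumulator.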

georgeuoa · October 2, 2009, 3:28am

On Thu, Oct 1, 2009 at 6:38 PM, Ryan D. [email protected]
wrote:

Your version isn’t maintainable, has bugs(*) and obfuscates a ton, missing
my point entirely. Simpler code wins HANDS DOWN. As I said the first time:
“I think the readability is more important than speed by a long shot”.

Perhaps, but it is simpler because it is too specific to the given
problem. Once it must be rewritten in several different places, it is no
longer more maintainable. It will also clutter the code, making it less
readable.

*) calling sort! within a law of demeter violation is ALWAYS a bug.

That is fair.

*) calling (almost) any bang method on a temporary value is usually a bug.

Why is that?

georgeuoa · October 2, 2009, 4:35am

On Oct 1, 8:27 pm, Josh C. [email protected] wrote:

On Thu, Oct 1, 2009 at 6:38 PM, Ryan D. [email protected] wrote:

*) calling (almost) any bang method on a temporary value is usually a bug.

Why is that?

What’s the point of calling a bang method there? You don’t care at all
about the object you’re mutating, and it seems like a blatant case of
premature optimization.

Now, consider the specific cases where the bang method doesn’t return
the same value as the non-bang (viz. uniq!).
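
The uniq! case can be seen in a few lines. This is a sketch with a made-up array: uniq always returns an array, while uniq! returns nil when it found nothing to remove.

```ruby
words = ["abc", "def"]   # no duplicates

plain = words.uniq       # a new array with the same elements
bang  = words.dup.uniq!  # nil, because nothing was removed

p plain
p bang

# Chaining on the bang version therefore fails when there are no duplicates
begin
  words.dup.uniq!.map { |w| w.upcase }
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```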

georgeuoa · October 2, 2009, 4:44am

On Thu, Oct 1, 2009 at 9:34 PM, Yossef M.
[email protected] wrote:

premature optimization.

Now, consider the specific cases where the bang method doesn’t return
the same value as the non-bang (viz. uniq!).

–
-yossef

I see, thank you.

georgeuoa · October 2, 2009, 2:00pm

Hi –

On Fri, 2 Oct 2009, Yossef M. wrote:

On Oct 1, 8:27 pm, Josh C. [email protected] wrote:

On Thu, Oct 1, 2009 at 6:38 PM, Ryan D. [email protected] wrote:

*) calling (almost) any bang method on a temporary value is usually a bug.

Why is that?

What’s the point of calling a bang method there? You don’t care at all
about the object you’re mutating, and it seems like a blatant case of
premature optimization.

I disagree, at least on the second point. Using transparent language
constructs that happen to be more efficient in some dimension doesn’t
mean you’re prematurely optimizing.

For example, if I do this:

obj.meth

instead of this:

obj.send(:meth)

you could say I’m prematurely optimizing. Both exist in the language;
any non-beginning Rubyist knows about both; both fly from the fingers
quite readily. So if I choose the one that’s faster, I’m “optimizing”
– but still, unless I have no choice, I’ll choose it.

Same thing with things like map and map!. Actually I don’t
automatically reach for the in-place ones (I’m just not in the habit),
but they’re fully visible and trivially usable at the language level,
so I don’t think they can be thought of as sidetracking the programmer
into doing too much too soon about too little. In other words, all
else being equal, it isn’t necessarily bad to choose one idiom over
another in a high level language on the grounds that the one you’re
choosing is likely to shave a few cycles. If you forbid yourselves the
slightly faster ones simply because they are slightly faster, that way
lies having to test everything you write to make sure it ISN’T
performing as well as it could, so that you can’t be accused of having
prematurely optimized.

What might be bad is if you get a hunch, based on no evidence, that it
would be efficient to divide every array into four subarrays before
doing a mapping, and then recombining them, and go around changing
your code so that every array is handled that way… That, I think,
is the kind of thing where you’re looking for trouble before you know
it’s there.

Of course, a certain amount of how this plays out in Ruby has to do
with Ruby being a high-level language. We can make a certain number of
choices like map vs. map! trivially, even though if we were
implementing the methods themselves, we’d be faced with a whole new
round of optimization issues and a much more fine-grained code
texture.

I tend to agree too with the interpretations of the Hoare statement
that suggest that not all optimization during development is
“premature”. It depends on what you’re writing, and on how hard it is
likely to be to go back and deal with optimizations and bottlenecks
later. I think Ryan has said “Performance isn’t an issue until it’s an
issue”, which I agree with, but I’d add that it’s not necessarily
amiss to get into some habits that cost you nothing in developer time,
do not derail you from your basic workflow, but do buy you a bit of
efficiency here and there.

Now, consider the specific cases where the bang method doesn’t return
the same value as the non-bang (viz. uniq!).

Of course you wouldn’t want to risk calling a wrong method on nil, but
that’s not an argument against using map! or sort! Just remember to
read the documentation carefully. After all, ! means “dangerous” so
You Have Been Warned.

David

P.S. I’ve found this article very interesting on this score: