Design problem with 'inject'

Ruby’s inject has a design that can lead to hard-to-find bugs. The
problem is that you don’t have to specify what is summed; the value of
the block is summed. That means ‘next’ can lead to a bug. Here’s an
example:

No problem:

arr.inject(0) do |sum, i|
  sum += i
end

But suppose you need to skip some elements, so you add a ‘next’
statement.
Problem:

arr.inject(0) do |sum, i|
  next if (i==3)
  sum += i
end

This breaks. If i is 3, then when the next occurs, the value of the
block is nil. The value of the block is added to sum, but because “+”
isn’t defined for nil, there’s an exception. Note that this isn’t due to
the line “sum += i”; it’s due to the design of inject: the value of the
block is added to sum.

The real problem is that ‘inject’ has two semantics: 1) it adds onto sum
using an explicit “sum +=” or 2) it adds the value of the block. A
better approach would be to allow only one way to accumulate. That way, you
can’t make a change that implicitly changes the function from one
semantics to another.

On Wed, Nov 15, 2006 at 03:46:44PM +0900, Gary B. wrote:

But suppose you need to skip some elements, so you add a ‘next’
statement.
Problem:

arr.inject(0) do |sum, i|
  next if (i==3)
  sum += i
end

You need to do this:

next sum if i == 3

Also:

sum += i

This unnecessarily modifies sum.

sum + i

will suffice.
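
A minimal sketch of the difference between a bare `next` and `next sum`. The contents of `arr` are an assumption, since the thread never says what the array holds:

```ruby
arr = [1, 2, 3, 4, 5]  # hypothetical sample data

# A bare `next` makes the block evaluate to nil, so the accumulator
# becomes nil and the following iteration fails on nil + 4.
bare_next_error = nil
begin
  arr.inject(0) do |sum, i|
    next if i == 3
    sum + i
  end
rescue NoMethodError => e
  bare_next_error = e.class
end

# `next sum` hands the unchanged accumulator on to the next iteration.
skipped_total = arr.inject(0) do |sum, i|
  next sum if i == 3
  sum + i
end

puts bare_next_error  # NoMethodError
puts skipped_total    # 12
```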

marcel

Problem:

arr.inject(0) do |sum, i|
next if (i==3)

next sum if (i == 3)

On 15/11/06, Gary B. wrote:

             done

This breaks. If i is 3, then when the next occurs, the value of the


Posted via http://www.ruby-forum.com/.

I agree using next with inject can be dangerous and probably should be
avoided. You can however get round it by just returning the original
value of sum when the condition is met:

arr.inject(0) do |sum, i|
  (i == 3) ? sum : sum + i
end

Farrel

--- Gary B. wrote:

                sum += i
             done

You should use a ‘+’ instead of a ‘+=’. The return value of the block
will be assigned to the next iteration’s value for ‘sum’, so it
doesn’t make sense to modify ‘sum’ inside the block.

But suppose you need to skip some elements, so you add a ‘next’
statement.
Problem:

arr.inject(0) do |sum, i|
  next if (i==3)
  sum += i
end

Try it like this:

arr.inject(0) do |sum, i|
  if i == 3
    sum
  else
    sum + i
  end
end

Any time you want to skip an element, simply return ‘sum’. This will
carry the value from the last iteration on over to the next
iteration, doing nothing with the current element. As far as I can
see, you should never have to use ‘next’ in an inject block, unless
you really want the intermediate result to be nil.

Luther T.



Gary B. schrieb:

The real problem is that ‘inject’ has two semantics: 1) it adds onto sum
using an explicit “sum +=” or 2) it adds the value of the block.

Gary, that’s not true. The version of inject you use (with an explicit
initial value for the accumulator) works as follows:

def my_inject(initial_value)
  accumulator = initial_value
  self.each do |element|
    accumulator = yield(accumulator, element)
  end
  accumulator
end

The result of the block simply becomes the next value of the
accumulator.
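
To watch this happen, here is a rough standalone variant of the sketch above. Passing the collection as an argument is an assumption made so the snippet runs without touching Enumerable; it is not how the built-in is defined:

```ruby
# Hypothetical standalone variant: takes the collection as an argument
# instead of being defined on Enumerable.
def my_inject(collection, initial_value)
  accumulator = initial_value
  collection.each do |element|
    accumulator = yield(accumulator, element)
  end
  accumulator
end

seen = []
result = my_inject([1, 2, 3], 0) do |acc, element|
  seen << acc      # record what the block was handed as accumulator
  acc + element    # this value becomes the next accumulator
end

# A block that evaluates to nil leaves nil as the final result.
nil_result = my_inject([1, 2], 0) { |_acc, _element| nil }

p seen        # [0, 1, 3]
p result      # 6
p nil_result  # nil
```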

Regards,
Pit

On 15/11/06, Marcel Molina Jr. wrote:

             done

You need to do this:

Will suffice.

marcel

Marcel Molina Jr.

Whoa! That is good to know.

Farrel

Those are several good suggestions.

next sum if (i == 3)

strikes me as simple and clear. Provided you know that the block result
is what matters, you can stay out of trouble.

Simplifying the example, the trouble with inject is that

arr.inject(0) { |sum, i| sum += i }
arr.inject(0) { |sum, i| sum + i }

both produce the same result. Yuk.

Maybe it should be posted somewhere as a potential ‘gotcha’.

On 15.11.2006 07:50, Marcel Molina Jr. wrote:

             done

You need to do this:

Will suffice.

marcel

In this case I would not even resort to #next. This seems much more
straightforward:

arr.inject(0) {|sum, i| i == 3 ? sum : sum + i}

or, if you do not like the ternary operator

arr.inject(0) {|sum, i| if i == 3 then sum else sum + i end}

IMHO #next is best used if the block is /long/ and you want to short
circuit to the next iteration. If it is a short block like in this case
#next does not bring any benefits and in fact makes it more complicated
to understand - at least in my opinion.

Kind regards

robert

On Nov 15, 2006, at 8:51, Gary B. wrote:

arr.inject(0) { |sum, i| sum += i }
arr.inject(0) { |sum, i| sum + i }

both produce the same result. Yuk.

Maybe it should be posted somewhere as a potential ‘gotcha’.

I think it could be that I’m just too familiar with the whole
accumulator thing, but I’m having trouble even imagining a case where
you’d want it any other way. Why would you want those to produce
different results, and - here I can’t even guess - what would those
different results actually be?

I have a hunch that the key phrase is “Provided you know that the
block result is what matters, you can stay out of trouble.” I mean,
that’s sort of inherent in the construct. If you don’t know that,
I’d say you’re going to have a lot of trouble with #inject.

m.s.

On Wed, 15 Nov 2006, Gary B. wrote:

block is nil. The value of the block is added to sum, but because “+”
isn’t defined for nil, there’s an exception. Note that this isn’t due to
the line “sum += i”; it’s due to the design of inject: the value of the
block is added to sum.

The real problem is that ‘inject’ has two semantics: 1) it adds onto sum
using an explicit “sum +=” or 2) it adds the value of the block. A
better approach would be to allow only one way to accumulate. That way, you
can’t make a change that implicitly changes the function from one
semantics to another.

this last part is dead wrong, the semantics of inject have nothing to do
with summing, or even accumulating anything. the semantics of inject is
merely that it iterates an enumerable, on the first iteration it passes

|arg, enumerable_element|

to the block, and on subsequent passes

|value_of_previous_block_call, enumerable_element|

is passed

for example

keys = %w( foo bar foobar )

index = keys.inject({}){|hash, k| hash.update k => true}

so, you see, inject itself does in fact do exactly one thing. if you happen
to do more in the block it’s no fault of inject.
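
a quick sketch of the same point with nothing being summed at all. the `longest` example and its names are made up for illustration; only the hash example comes from the post above:

```ruby
keys = %w( foo bar foobar )

# Hash#update returns the hash itself, so the block result is simply
# handed back in as `hash` on the next call.
index = keys.inject({}) { |hash, k| hash.update(k => true) }

# Hypothetical second example: the block result is whichever string is
# longer, and that is what the next call receives as `best`.
longest = keys.inject("") { |best, k| k.length > best.length ? k : best }

p index    # {"foo"=>true, "bar"=>true, "foobar"=>true}
p longest  # "foobar"
```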

-a

On Wed, 15 Nov 2006, Gary B. wrote:

Those are several good suggestions.

next sum if (i == 3)

strikes me as simple and clear. Provided you know that the block result
is what matters, you can stay out of trouble.

I think you’re well advised to know that if you’re using inject :)

Simplifying the example, the trouble with inject is that

arr.inject(0) { |sum, i| sum += i }
arr.inject(0) { |sum, i| sum + i }

both produce the same result. Yuk.

Maybe it should be posted somewhere as a potential ‘gotcha’.

It’s not, though; it’s just how Ruby works. When you do:

sum += i

the parser turns it into:

sum = sum + i

So you’re just reusing the identifier ‘sum’. As with all
reassignment, the variable is completely reinitialized.
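
A small sketch of what that means in practice, using made-up sample numbers. The assignment only rebinds the block parameter; inject sees nothing but the value of the block's last expression:

```ruby
# The reassignment is discarded because the block's value is the trailing 0.
lost = [1, 2, 3].inject(0) do |sum, i|
  sum += i  # rebinds the block-local name `sum`
  0         # the block's value is this last expression
end

# Without any assignment, the block's value is the new running total.
kept = [1, 2, 3].inject(0) { |sum, i| sum + i }

puts lost  # 0
puts kept  # 6
```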

David

On 11/14/06, Gary B. wrote:

arr.inject(0) do |sum, i|
  next if (i==3)
  sum += i
end

Others have already posted clarifying the semantics of inject and
proposing alternate solutions, so I won’t bother with that. I just
wanted to point out another alternate:

arr.select{ |i| i != 3 }.inject(0){ |sum, i| sum + i }

I like to use block-taking functions on Enumerables like unix pipes
and let the body of each block do the simplest thing possible.

The only potential drawback to breaking the filter apart like this is
that the current implementation of ruby turns this into two loops. If
your array is large, this can be a problem. But for most cases where
you’re not doing number crunching, the inefficiency is negligible.
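
As a rough illustration of the trade-off, assuming a small sample array (the thread never defines `arr`): the pipeline builds an intermediate array and then walks it again, while the single inject makes one pass, and both arrive at the same total.

```ruby
arr = [1, 2, 3, 4, 5]  # hypothetical sample data

# Two passes: select builds a new array, inject then walks it.
filtered = arr.select { |i| i != 3 }
two_pass = filtered.inject(0) { |sum, i| sum + i }

# One pass: the filter lives inside the inject block.
one_pass = arr.inject(0) { |sum, i| i == 3 ? sum : sum + i }

p filtered  # [1, 2, 4, 5]
p two_pass  # 12
p one_pass  # 12
```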

Just another possibility…

Jacob F.

On 15.11.2006 16:09, [email protected] wrote:

               sum += i

semantics to another.

|value_of_previous_block_call, enumerable_element|

is passed

Small correction: if called without arguments the first invocation
will look like this:

|enumerable_element_1, enumerable_element_2|

%w{foo bar baz}.inject {|*a| p a}
["foo", "bar"]
[nil, "baz"]
=> nil

%w{foo bar baz}.inject(nil) {|*a| p a}
[nil, "foo"]
[nil, "bar"]
[nil, "baz"]
=> nil

This is actually useful to do something like this:

%w{foo bar baz}.inject() {|a,b| a+", "+b}
=> "foo, bar, baz"

(Efficiency is another story.)

Either variant is useful for summing and it depends on whether you need
the information that a container was empty or not:

[1,2,3].inject {|s,x| s+x}
=> 6

[].inject {|s,x| s+x}
=> nil

[1,2,3].inject(0) {|s,x| s+x}
=> 6

[].inject(0) {|s,x| s+x}
=> 0

so, you see, inject itself does in fact do exactly one thing. if you
happen to do more in the block it’s no fault of inject.

Absolutely.

Regards

robert

On 11/15/06, Gary B. wrote:

Ruby’s inject has a design that can lead to hard to find bugs. The
problem is that you don’t have to specify what is summed; the value of
the block is summed. That means ‘next’ can lead to a bug. Here’s an
example:

There were lots of good responses to this post already, but if you’re
interested, I blogged about this a while ago.

On Thu, 16 Nov 2006, Robert K. wrote:

[nil, "foo"]
[nil, "bar"]
[nil, "baz"]
=> nil

This is actually useful to do something like this:

%w{foo bar baz}.inject() {|a,b| a+", "+b}
=> "foo, bar, baz"

wow - you learn something every day - that’s great!

cheers.

-a

Jacob F. wrote:

wanted to point out another alternate:

Just another possibility…

Jacob F.

Agree 100% on the use of select

On performance, it depends on what you’re doing. The more items you
want to exclude, the better select does.

tm is just a method to time the block

tm("One Loop") { a.inject { |s,i| (false) ? s+i : s } }
tm("Two Loops") { a.select { |x| false }.inject(0) { |s,i| s+i } }
Timing One Loop : 2 Seconds 188 Milliseconds
Timing Two Loops : 0 Seconds 327 Milliseconds

tm("One Loop") { a.inject { |s,i| (i%10==0) ? s+i : s } }
tm("Two Loops") { a.select { |x| x%10==0 }.inject(0) { |s,i| s+i } }
Timing One Loop : 3 Seconds 32 Milliseconds
Timing Two Loops : 1 Second 234 Milliseconds

tm("One Loop") { a.inject { |s,i| (i%2==0) ? s+i : s } }
tm("Two Loops") { a.select { |x| x%2==0 }.inject(0) { |s,i| s+i } }
Timing One Loop : 3 Seconds 907 Milliseconds
Timing Two Loops : 3 Seconds 188 Milliseconds

tm("One Loop") { a.inject { |s,i| (i%10!=0) ? s+i : s } }
tm("Two Loops") { a.select { |x| x%10!=0 }.inject(0) { |s,i| s+i } }
Timing One Loop : 5 Seconds 48 Milliseconds
Timing Two Loops : 5 Seconds 235 Milliseconds

tm("One Loop") { a.inject { |s,i| (true) ? s+i : s } }
tm("Two Loops") { a.select { |x| true }.inject(0) { |s,i| s+i } }
Timing One Loop : 4 Seconds 532 Milliseconds
Timing Two Loops : 5 Seconds 314 Milliseconds
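
For anyone who wants to repeat the comparison, here is a rough sketch of what a helper like tm might look like. The original tm is not shown, so this version (built on the standard library's Benchmark.realtime), its output format, and the contents of `a` are all assumptions:

```ruby
require 'benchmark'

# Hypothetical stand-in for the `tm` helper; the original is not shown.
# Prints the elapsed time and returns the block's result.
def tm(label)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  puts format("Timing %s : %.3f seconds", label, seconds)
  result
end

a = (1..100_000).to_a  # assumed sample data

one_loop  = tm("One Loop")  { a.inject(0) { |s, i| i % 10 == 0 ? s + i : s } }
two_loops = tm("Two Loops") { a.select { |x| x % 10 == 0 }.inject(0) { |s, i| s + i } }

puts one_loop == two_loops  # true
```

Timings will vary with the Ruby version and the machine, so the figures quoted above should not be expected to reproduce exactly.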

Gary B. wrote:

But suppose you need to skip some elements, so you add a ‘next’
statement.
Problem:

arr.inject(0) do |sum, i|
  next if (i==3)
  sum += i
end

[1,2,3,4,5].reject{|n| n==3}.inject{|s,n| s+n}

----- Original Message -----
From: “Gary B.” [email protected]
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” [email protected]
Sent: Wednesday, November 15, 2006 1:46 AM
Subject: Design problem with ‘inject’

            done

This breaks. If i is 3, then when the next occurs, the value of the



This works fine

sum = 0
arr = (1..10).to_a
arr.inject(0) { |sum, i| sum += ( i == 3 ? 0 : i ) }  # => 52

This works fine

sum = 0
arr = (1..10).to_a
arr.inject(0) { |sum, i| sum += ( i == 3 ? 0 : i ) }  # => 52

The assignment to sum is useless. Only the return value matters to
inject.

arr.inject(0) { |sum, i|  sum + ( i == 3 ? 0 : i ) }

Does the same thing. Actually, it’s more correct. The above is only
doing what you want because an assignment evaluates to the value it
assigns:

sum += i
sum = sum + i
sum + i  #evaluates to this after side-effect