Forum: Ruby-core [ruby-trunk - Feature #9118][Open] In Enumerable#to_a, use size to set array capa when possible

HonoreDB (Aaron Weiner) (Guest)
on 2013-11-16 15:09
(Received via mailing list)
Issue #9118 has been reported by HonoreDB (Aaron Weiner).

----------------------------------------
Feature #9118: In Enumerable#to_a, use size to set array capa when
possible
https://bugs.ruby-lang.org/issues/9118

Author: HonoreDB (Aaron Weiner)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:


Cross-post from https://github.com/ruby/ruby/pull/444.

Enumerable#to_a works by creating an empty array with small capacity,
then populating it and expanding the capacity as it goes. For large
enumerables, this causes several resizes, which can hurt performance.
When an enumerable exposes a size method, we can guess that the
resulting array's size will usually be equal to the enumerable's size.
If we're right, we only have to set capacity once, and if we're wrong,
we don't lose anything.

The attached file (or linked PR) adjusts enum.c's to_a method to take
advantage of the size method when it's there. In my tests this makes
Range#to_a about 10% faster, and doesn't have any significant effect on
a vanilla enum with no size method. I couldn't find any existing
benchmark that this consistently made better or worse.

If you like this idea, this could also be done in other classes with
custom to_a, like Hash.
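
The idea can be sketched in plain Ruby rather than C. The helper `to_a_with_size_hint`, the `SizeHint` module, and the `Countdown` class below are hypothetical names used for illustration only; this is not the patch to enum.c. Ruby code cannot reserve array capacity directly, so preallocating with `Array.new(n)` and filling by index stands in for setting the capacity once.

```ruby
# Hypothetical helper illustrating the proposal; not the actual enum.c change.
module SizeHint
  def to_a_with_size_hint
    hint = respond_to?(:size) ? size : nil
    # Only trust a non-negative Integer; anything else means "no hint".
    hint = 0 unless hint.is_a?(Integer) && hint >= 0

    result = Array.new(hint) # stands in for reserving capacity up front
    count = 0
    each do |item|
      result[count] = item   # the array still grows if the hint was too small
      count += 1
    end
    # Drop unused slots if the hint was too large.
    result.slice!(count..-1) if count < result.length
    result
  end
end

# Hypothetical enumerable that knows its own size.
class Countdown
  include Enumerable
  include SizeHint

  def initialize(from)
    @from = from
  end

  def size
    @from
  end

  def each
    @from.downto(1) { |i| yield i }
  end
end

Countdown.new(3).to_a_with_size_hint # => [3, 2, 1]
```

A wrong hint changes only how much space is reserved, never the contents of the result.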
Hans Mackowiak (hanmac)
on 2013-11-16 18:38
(Received via mailing list)
Issue #9118 has been updated by Hanmac (Hans Mackowiak).


enum.size can return Float::INFINITY, for example for
[1,2,3].cycle.size, so you need to check for that too.
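
A minimal sketch of such a check, assuming a hypothetical helper named `capacity_hint` (not part of Ruby or of the patch). It treats anything other than a non-negative Integer as "no usable hint", which covers both Float::INFINITY and the nil that Enumerator#size returns when the size is unknown.

```ruby
# Hypothetical guard: returns a usable capacity hint, or nil when size
# cannot be trusted as an element count.
def capacity_hint(enum)
  return nil unless enum.respond_to?(:size)

  size = enum.size
  size.is_a?(Integer) && size >= 0 ? size : nil
end

capacity_hint([1, 2, 3].cycle)               # size is Float::INFINITY, so nil
capacity_hint([1, 2, 3].each_slice(2))       # => 2
capacity_hint(Enumerator.new { |y| y << 1 }) # size is nil, so nil
```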
HonoreDB (Aaron Weiner) (Guest)
on 2013-11-16 19:12
(Received via mailing list)
Issue #9118 has been updated by HonoreDB (Aaron Weiner).


Ah, right! This seems like an opportunity to improve on existing
behavior: right now, calling to_a on an infinite enumerable just silently
hangs forever. Do you think we should warn and then hang, or just raise?
I'd lean towards warning, because it's possible that size is returning
the wrong thing.
mame (Yusuke Endoh) (Guest)
on 2013-11-16 19:29
(Received via mailing list)
Issue #9118 has been updated by mame (Yusuke Endoh).


I think the proposal will break compatibility with the following code:

  class C
    include Enumerable
    def size
      to_a.size
    end
    def each
    end
  end
  C.new.size #=> expected: 0, with the proposal: stack level too deep
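
The recursion can be reproduced on an unpatched Ruby with a stand-in for the proposed behaviour. The `HintedToA` module below is hypothetical: it only makes to_a consult size before iterating, which is the part of the proposal that matters here.

```ruby
# Hypothetical stand-in for the proposal: to_a asks for size first.
module HintedToA
  def to_a
    size if respond_to?(:size) # the proposal would use this as a capacity hint
    super                      # then fall through to Enumerable#to_a
  end
end

class C
  include Enumerable
  include HintedToA

  def size
    to_a.size # size depends on to_a, and to_a now depends on size
  end

  def each
  end
end

result =
  begin
    C.new.size
  rescue SystemStackError
    :stack_overflow
  end

result # => :stack_overflow
```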

Examples in the wild:

  * https://github.com/andreasronge/neo4j/blob/c421fb8...
  * https://github.com/ActiveRDF/ActiveRDF/blob/master...
  * https://bitbucket.org/Bounga/tooltips/src/925ccaa9...
  * http://pastebin.com/2Sr0UXQZ


In addition, #each and #size do not necessarily share common
semantics.
For example, IO#each yields lines as strings, but IO#size returns a
count in bytes.
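
The same mismatch is easy to observe with StringIO, used here as an assumption to keep the example self-contained: it includes Enumerable, its each iterates over lines, and its size counts bytes.

```ruby
require "stringio"

io = StringIO.new("one\ntwo\nthree\n")

io.size        # => 14, the number of bytes in the buffer
io.to_a        # => ["one\n", "two\n", "three\n"]
io.to_a.length # => 0, because the first to_a already read to the end
```

Here size would overestimate the array length, so it is only ever a guess at capacity, not a reliable element count.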

--
Yusuke Endoh <mame@tsg.ne.jp>
HonoreDB (Aaron Weiner) (Guest)
on 2013-11-16 19:49
(Received via mailing list)
Issue #9118 has been updated by HonoreDB (Aaron Weiner).


It definitely breaks that usage, but that's bad usage--we're supposed to
use Enumerable#count for that, not size.

In cases where size doesn't correctly predict the array, this doesn't
really break anything, it just switches out one bad guess at capa for
another.
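
A sketch of what that would look like for a class in this situation; `Bag` is a hypothetical example class. Enumerable#count walks each itself, so a size defined this way never goes through to_a and cannot recurse.

```ruby
# Hypothetical collection that wants a size method.
class Bag
  include Enumerable

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end

  # Enumerable#count iterates with #each; it does not call to_a or size.
  def size
    count
  end
end

Bag.new(:a, :b, :c).size # => 3
Bag.new.size             # => 0
```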
Hans Mackowiak (hanmac)
on 2013-11-17 13:27
(Received via mailing list)
Issue #9118 has been updated by Hanmac (Hans Mackowiak).


Enumerable#count may not be a good idea; Enumerator#size would be better.
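
One possible reading of this suggestion, sketched with a hypothetical `Pages` class: an enumerable can expose its size through the Enumerator returned by each, by passing a block to to_enum. Enumerator#size then returns that value without iterating, and returns nil when no size is known.

```ruby
# Hypothetical enumerable that reports its size via Enumerator#size.
class Pages
  include Enumerable

  def initialize(total)
    @total = total
  end

  def each
    # Without a block, return an Enumerator; the block given to to_enum
    # is evaluated lazily when Enumerator#size is called.
    return to_enum(:each) { @total } unless block_given?

    1.upto(@total) { |n| yield n }
  end
end

Pages.new(4).each.size             # => 4, without iterating
Pages.new(4).to_a                  # => [1, 2, 3, 4]
Enumerator.new { |y| y << 1 }.size # => nil, size unknown
```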