Timsort in Ruby

I just learned about Tim sort for the first time:
Timsort - Wikipedia. Is there some specific reason
why timsort isn’t implemented in Ruby, or is it just not a priority of
the Ruby community at the moment, and it may be addressed later?

On Wed, 12 Jun 2013 23:38:17 +0200, Alphonse 23 [email protected]
wrote:

I just learned about Tim sort for the first time:
Timsort - Wikipedia. Is there some specific reason
why timsort isn’t implemented in Ruby, or is it just not a priority of
the Ruby community at the moment, and it may be addressed later?

Probably simply because nobody ever did that. If you implemented it and
submitted a patch (with benchmarks proving that it is in fact an
improvement), it would probably be happily accepted. (But do note, the
sorting funciton Ruby uses seems already heavily magically optimized:
http://rxr.whitequark.org/mri/source/util.c#286 )

So that’s Ruby’s sorting function… It doesn’t even look like it’s
checking the size of the array before sorting. Does this mean, quicksort
is used no matter what the size of the array. So even when I’m sorting
10 elements, it’s using quicksort?

Ruby’s quicksort function is 207 lines. Timsort is 1301 lines of C.
http://code.google.com/p/timsort/source/browse/trunk/timsort.c

Shouldn’t the Ruby community feel ashamed that Python has a more
optimized sorting function?

Alphonse 23 wrote in post #1112359:

So that’s Ruby’s sorting function… It doesn’t even look like it’s
checking the size of the array before sorting. Does this mean, quicksort
is used no matter what the size of the array. So even when I’m sorting
10 elements, it’s using quicksort?

Ruby’s quicksort function is 207 lines. Timsort is 1301 lines of C.
Google Code Archive - Long-term storage for Google Code Project Hosting.

Shouldn’t the Ruby community feel ashamed that Python has a more
optimized sorting function?

So you’re saying More SLOC = More Optimised? That’s quite an arbitrary
benchmark.

If the performance of Ruby core’s sorting is shameful to you, write a
gem. And I really do mean that, the option is available, and it’s a
good way to eventually promote functionality into core. I’d be
interested myself, if timsort didn’t apparently require me to write and
maintain 1300 lines of code.

On 14/06/2013, at 9:56 AM, Alphonse 23 [email protected] wrote:

Shouldn’t the Ruby community feel ashamed that Python has a more
optimized sorting function?

For sure, because the main criteria I use for choosing a language is how
optimised the sorting algorithms are. I’m totally ashamed to know that I
have spent the last 10 years of my life using a language with a
sub-optimal sort. Thanks for bringing this to my attention.

Henry

On Thu, Jun 13, 2013 at 7:23 AM, Bartosz Dziewoński
[email protected]wrote:
On Wed, 12 Jun 2013 23:38:17 +0200, Alphonse 23 [email protected]
wrote:

I just learned about Tim sort for the first time:

Timsort - Wikipedia. Is there some specific reason
why timsort isn’t implemented in Ruby, or is it just not a priority of
the Ruby community at the moment, and it may be addressed later?

Probably simply because nobody ever did that. If you implemented it and
submitted a patch (with benchmarks proving that it is in fact an
improvement), it would probably be happily accepted. (But do note, the
sorting funciton Ruby uses seems already heavily magically optimized:
http://rxr.whitequark.org/mri/source/util.c#286 )


Matma R.

The Ruby Cross Reference http://rxr.whitequark.org/
What a nice site. Easy to follow code view.
http://rxr.whitequark.org/

On Thu, Jun 13, 2013 at 4:56 PM, Alphonse 23 [email protected]
wrote:

Shouldn’t the Ruby community feel ashamed that Python has a more
optimized sorting function?

Shut up and hack lamer!

Alphonse 23 wrote in post #1112359:

Ruby’s quicksort function is 207 lines. Timsort is 1301 lines of C.
Google Code Archive - Long-term storage for Google Code Project Hosting.

Shouldn’t the Ruby community feel ashamed that Python has a more
optimized sorting function?

Out of interested, I yanked that C source file from Python, modified
some parameters to fit Ruby’s comparison method signature, and built it
into a current ruby-2.1-dev [1].

I then benchmarked it against a bunch of arrays of integers[2], using a
simple for i in 1..n; array.dup.sort; end loop.

Note that I tried to MIN_MERGE values, reflecting both the Python and
Java implementations. These are typical results:

                  ruby_qsort   timsort[32]  timsort[64]

empty 0.003296 0.003462 0.004511
10, in order 0.008184 0.019551 0.019676
10, reversed 0.010642 0.021260 0.022928
10, random 0.007978 0.024539 0.025238
100, in order 0.018367 0.031840 0.035474
100, reversed 0.018310 0.041987 0.040050
100, random 0.049978 0.141741 0.135564
1,000, in order 0.100423 0.098470 0.101137
1,000, reversed 0.100676 0.130736 0.132880
1,000, random 0.873151 1.997130 2.049663
10,000, in order 0.910810 0.810930 0.860948
10,000, reversed 0.913941 1.109160 1.182836
10,000, random 12.381776 24.416879 24.909807

I know there’s object allocation and GC overhead, but it should be the
same for both ruby builds, as they only differ in whether array.c
[line 2313] calls ruby_qsort() or timsort().

[1] Comparing ruby:master...phluid61:timsort · ruby/ruby · GitHub
[2] https://gist.github.com/phluid61/5779737

Matthew K. wrote in post #1112422:

Alphonse 23 wrote in post #1112359:

Ruby’s quicksort function is 207 lines. Timsort is 1301 lines of C.
Google Code Archive - Long-term storage for Google Code Project Hosting.

Shouldn’t the Ruby community feel ashamed that Python has a more
optimized sorting function?

Out of interested, I yanked that C source file from Python, modified
some parameters to fit Ruby’s comparison method signature, and built it
into a current ruby-2.1-dev [1].

I then benchmarked it against a bunch of arrays of integers[2], using a
simple for i in 1..n; array.dup.sort; end loop.

Note that I tried to MIN_MERGE values, reflecting both the Python and
Java implementations. These are typical results:

                  ruby_qsort   timsort[32]  timsort[64]

empty 0.003296 0.003462 0.004511
10, in order 0.008184 0.019551 0.019676
10, reversed 0.010642 0.021260 0.022928
10, random 0.007978 0.024539 0.025238
100, in order 0.018367 0.031840 0.035474
100, reversed 0.018310 0.041987 0.040050
100, random 0.049978 0.141741 0.135564
1,000, in order 0.100423 0.098470 0.101137
1,000, reversed 0.100676 0.130736 0.132880
1,000, random 0.873151 1.997130 2.049663
10,000, in order 0.910810 0.810930 0.860948
10,000, reversed 0.913941 1.109160 1.182836
10,000, random 12.381776 24.416879 24.909807

(as a note, it would be interesting to benchmark both running time,
which you have, and “number of compares requires” also).

I wonder if the timsort project itself has some benchmark test cases you
could use. I know it’s supposed to work better with semi-sorted data,
maybe try that. Of course, with surprising results like these, it would
be interesting to port ruby’s sort into the python trunk/core and see if
it makes Python faster there (and if so, to ask the Python and timsort
people …):slight_smile:
-roger-

I’m impressed you got that working. Interesting that the only benchmark
timsort beat was 10,000 in order…

The link I posted to timsort isn’t the python implementation. I found
that from a stupid google search. Here is the real code as is in the
python source:
http://hg.python.org/cpython/file/default/Objects/listobject.c

So how did you do that?

ruby_qsort(RARRAY_PTR(tmp), len, sizeof(VALUE),
ruby_timsort(RARRAY_PTR(tmp), len, sizeof(VALUE)

Looks like you stuck a pointer into an array and the length (looks like
twice). That’s all you had to do to get the python implementation
working in ruby?

from the ruby source, the quick sort function takes in 5 arguments.
ruby_qsort(void* base, const size_t nel, const size_t size, cmpfunc_t
*cmp, void *d)

Roger P. wrote in post #1112786:

Matthew K. wrote in post #1112422:

Alphonse 23 wrote in post #1112359:

Ruby’s quicksort function is 207 lines. Timsort is 1301 lines of C.
Google Code Archive - Long-term storage for Google Code Project Hosting.

Shouldn’t the Ruby community feel ashamed that Python has a more
optimized sorting function?

Out of interested, I yanked that C source file from Python, modified
some parameters to fit Ruby’s comparison method signature, and built it
into a current ruby-2.1-dev [1].

I then benchmarked it against a bunch of arrays of integers[2], using a
simple for i in 1..n; array.dup.sort; end loop.

Note that I tried to MIN_MERGE values, reflecting both the Python and
Java implementations. These are typical results:

                  ruby_qsort   timsort[32]  timsort[64]

empty 0.003296 0.003462 0.004511
10, in order 0.008184 0.019551 0.019676
10, reversed 0.010642 0.021260 0.022928
10, random 0.007978 0.024539 0.025238
100, in order 0.018367 0.031840 0.035474
100, reversed 0.018310 0.041987 0.040050
100, random 0.049978 0.141741 0.135564
1,000, in order 0.100423 0.098470 0.101137
1,000, reversed 0.100676 0.130736 0.132880
1,000, random 0.873151 1.997130 2.049663
10,000, in order 0.910810 0.810930 0.860948
10,000, reversed 0.913941 1.109160 1.182836
10,000, random 12.381776 24.416879 24.909807

(as a note, it would be interesting to benchmark both running time,
which you have, and “number of compares requires” also).

I wonder if the timsort project itself has some benchmark test cases you
could use. I know it’s supposed to work better with semi-sorted data,
maybe try that. Of course, with surprising results like these, it would
be interesting to port ruby’s sort into the python trunk/core and see if
it makes Python faster there (and if so, to ask the Python and timsort
people …):slight_smile:
-roger-

“in order” and “reversed” already handled in Ruby’s quick sort
implementation. You should really try “in order + one misordered” and
“reversed + one misordered”.

And hey: what is a strange and very inefficient COPY macros? Ruby’s
quicksort is really optimized at low level for particular usecase, and
this timsort looks to be unoptimized.