Forum: Ruby-core IO operation is 10x slower in multi-thread environment

unknown (Guest)
on 2014-07-08 22:38
(Received via mailing list)
Issue #10009 has been updated by Eric Wong.

File test_thread_sched_pipe.rb added
Description updated

eventfd doesn't help performance (but still reduces FD count).
I never expected eventfd to improve speed, though.

Lowering TIME_QUANTUM_USEC (in thread_pthread.c) helps with the I/O case
(try it yourself if you have a 1000HZ kernel); but hurts overall

Attached is an I/O bench using pipes, without the Postgres requirement.
Increasing GVL (or any lock) performance is tricky because we need to
balance fairness and avoid starvation cases.  The GVL was rewritten to
avoid starvation in 1.9.3, so that's likely the cause of the major
difference starting with 1.9.3.
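
For readers without the attachment, a pipe-based bench along these lines can be sketched as follows. This is a minimal illustration and not the attached test_thread_sched_pipe.rb: the method name `run_bench`, the echo thread, and the first/second/third labels (borrowed from the results format in the bug report) are all assumptions. Two threads spin on a counter while a third does one-byte round trips over a pair of pipes, so its count drops when it cannot get the GVL back quickly after each blocking read.

```ruby
# Hypothetical sketch, NOT the attached test_thread_sched_pipe.rb.
# Returns how many iterations each thread completed in `duration` seconds.
def run_bench(duration = 1.0)
  counts = { first: 0, second: 0, third: 0 }
  stop = false

  # Two pipes form a round trip: request goes out on one, reply comes back
  # on the other.
  req_r, req_w = IO.pipe
  rep_r, rep_w = IO.pipe
  req_w.sync = true
  rep_w.sync = true

  # Echo thread: sends every byte straight back until the request pipe closes.
  echo = Thread.new do
    while (byte = req_r.read(1))
      rep_w.write(byte)
    end
  end

  # IO-bound thread: each iteration blocks on a read and must then
  # reacquire the GVL before it can continue.
  io_thread = Thread.new do
    loop do
      req_w.write("x")
      rep_r.read(1)
      counts[:second] += 1
      break if stop
    end
  end

  # CPU-bound threads: never block, so they only yield the GVL when forced.
  spinner = lambda do |key|
    Thread.new do
      loop do
        counts[key] += 1
        break if stop
      end
    end
  end
  first = spinner.call(:first)
  third = spinner.call(:third)

  sleep duration
  stop = true
  [first, io_thread, third].each(&:join)

  req_w.close # EOF lets the echo thread finish
  echo.join
  [req_r, rep_r, rep_w].each(&:close)
  counts
end

if __FILE__ == $PROGRAM_NAME
  run_bench(0.5).each { |name, n| puts "#{name} #{n}" }
end
```

Note that the echo thread also competes for the GVL here, so absolute numbers will not be comparable to the ones reported in this issue; only the ratio between the spinning threads and the IO thread is of interest.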

I doubt I can noticeably improve performance with futexes vs mutex/condvar.

How much does GVL performance between 1.9.2 and 2.1 affect real-world
performance on Rainbows!/yahns apps for you?  (not "hello world"-type

I hope to make GVL optional in a few years, but that is tricky.
Ironically, part of the reason I don't like GVL is I don't want to pay
any threading/locking costs for tiny single-threaded apps, either :)

Bug #10009: IO operation is 10x slower in multi-thread environment

* Author: Alexandre Riveira
* Status: Open
* Priority: Urgent
* Assignee:
* Category:
* Target version:
* ruby -v: ruby 2.1 x ruby 1.9.2 with taskset
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
I previously created issue #9832, but it did not include an IO operation.
In the attached script I simulate an IO operation in a multi-threaded
environment. For ruby 1.9.2, apply `taskset -c -p 2 #{}` to regulate
thread behavior.
The second thread performs the IO operation.

My results:

1) ruby 2.1.2
first 43500194
second 95
third 42184385

2) ruby-2.0.0-p451
first 38418401
second 95
third 37444470

3) 1.9.3-p545
first 121260313
second 50
third 44275164

4) 1.9.2-p320
first 31189901
second 897 <============
third 31190598


Alexandre Riveira

teste_thread_schedule_2.rb (1.05 KB) (953 Bytes)
teste_thread_schedule.rb (955 Bytes)
test_thread_sched_pipe.rb (1.01 KB)
Eric Wong (Guest)
on 2014-08-16 10:35
(Received via mailing list)
Eric Wong wrote:
> I doubt I can noticeably improve performance with futexes vs mutex/condvar.

Totally not-speed-optimized futex-based lock/condvar implementation at

  git:// (futex branch)

I am not sure if my implementation is correct, but "make check" passes
with both 8 cores and 1 core active (8-core Vishera).  I will probably
write an independent (C-only) test for more parallelism and maybe steal
some from glibc (I also plan on using this futex-based lock
implementation outside of Ruby).

Benchmarks don't seem to show much (if any) improvement, yet.  Speed
improvement from reimplementing GVL around bare futex interface may be
possible (w/o using separate condvar/mutex layer).

On amd64 GNU/Linux, pthread_mutex_t is 40 bytes, but these futex-based
locks only need 4 bytes.  Similarly, pthread_cond_t is 48 bytes, making
rb_nativethread_cond_t 56 bytes with pthreads; this futex implementation
currently requires only 16 bytes for a condvar.

Size improvement may be noticeable for some apps with many Mutexes:
the lock/cond reductions mean rb_mutex_struct is now 48 bytes instead
of 128 bytes.
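
To get a rough idea of whether this matters for a given app, the per-Mutex footprint can be estimated from Ruby itself. The sketch below is only an approximation under stated assumptions: `mutex_footprint` is a hypothetical helper, and `ObjectSpace.memsize_of` returns an implementation-specific hint that includes the object slot, so it will not match the rb_mutex_struct sizes quoted above and will vary between Ruby versions.

```ruby
require 'objspace'

# Hypothetical helper: allocate `count` Mutexes (as a lock-per-object app
# might) and report what ObjectSpace says one of them occupies.
def mutex_footprint(count)
  locks = Array.new(count) { Mutex.new }
  per_lock = ObjectSpace.memsize_of(locks.first) # a hint, not an exact size
  {
    count: locks.size,
    bytes_per_mutex: per_lock,
    total_bytes: per_lock * locks.size
  }
end

if __FILE__ == $PROGRAM_NAME
  p mutex_footprint(100_000)
end
```

Comparing the reported total between two builds of the interpreter would show whether a smaller lock/condvar is visible at the application level.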
Eric Wong (Guest)
on 2014-08-18 01:02
(Received via mailing list)
Some tests adapted from glibc:

  git clone git://

tst-cond18-f/p are micro benchmarks; -f (futex version) is roughly
twice as fast as -p (pthreads version), but that doesn't seem
to translate to noticeable real-world speed improvements in Ruby.
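
One way to check for a Ruby-level effect is a lock/condvar ping-pong between two threads, timed on builds with and without the futex branch. The sketch below is an assumption-laden illustration, not one of the glibc-derived tests: `condvar_ping_pong` is a hypothetical name, and it exercises Ruby's Mutex and ConditionVariable, which sit several layers above the native primitives being compared.

```ruby
# Hypothetical sketch: two threads hand a turn back and forth through one
# Mutex and one ConditionVariable. Returns the total number of handoffs.
def condvar_ping_pong(rounds)
  mutex = Mutex.new
  cond = ConditionVariable.new
  turn = :ping
  handoffs = 0

  player = lambda do |me, other|
    rounds.times do
      mutex.synchronize do
        # Sleep until it is this thread's turn (loop guards against
        # spurious wakeups).
        cond.wait(mutex) until turn == me
        turn = other
        handoffs += 1
        cond.signal # at most one other thread can be waiting
      end
    end
  end

  threads = [
    Thread.new { player.call(:ping, :pong) },
    Thread.new { player.call(:pong, :ping) }
  ]
  threads.each(&:join)
  handoffs
end

if __FILE__ == $PROGRAM_NAME
  rounds = 10_000
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  handoffs = condvar_ping_pong(rounds)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts format("%d handoffs in %.3fs", handoffs, elapsed)
end
```

Because every handoff also goes through the GVL, any difference between builds is likely to be much smaller than what the C micro benchmarks show.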