Multiplexed I/O


#1

I’d like to use Ruby for a quite high performance networking tool.
In practice, writing a multithreaded C/C++ program is not feasible
due to the large connection pool I plan to manage, which would leave
very little stack space per thread. From what I can see, Ruby’s default
I/O semantics are synchronous. One can opt for using IO.select though.
Let’s just say synchronous is OK, API-wise. My question is: if I use
one (Ruby) thread per client, would it block the whole Ruby
interpreter while performing blocking I/O? On my system (FreeBSD)
ruby is linked against libpthread, for a good reason I guess. Could it
be that it splits blocking routines to a separate (POSIX) thread? If
not, I’ll probably end up replacing the select() implementation with
kqueue() calls in the Ruby code.


If it’s there, and you can see it, it’s real.
If it’s not there, and you can see it, it’s virtual.
If it’s there, and you can’t see it, it’s transparent.
If it’s not there, and you can’t see it, you erased it.


#2

Robert K. wrote:

2006/4/29, Vlad GALU removed_email_address@domain.invalid:

I'd like to use Ruby for a quite high performance networking tool.

In practice, writing a multithreaded C/C++ program is not feasible
due to the large connection pool I plan to manage, which would impose
a very scarce stack size. From what I can see, Ruby’s default I/O
semantics are synchronous. One can opt for using IO.select though.
Let’s just say synchronous is OK, API-wise. My question is: if I use
one (Ruby) thread per client, would it block the whole Ruby
interpreter while performing blocking I/O ?

No. You can have a multi threaded application that uses blocking IO
(interface wise) concurrently.

I can’t understand this. If ruby uses non native multithreading, then
all threads will run in the same process. And that implies that one
blocking I/O operation stops all the threads in the Ruby interpreter
process, doesn’t it?

Sincerely
Minkoo SEo


#3

2006/4/29, Vlad GALU removed_email_address@domain.invalid:

I'd like to use Ruby for a quite high performance networking tool.

In practice, writing a multithreaded C/C++ program is not feasible
due to the large connection pool I plan to manage, which would impose
a very scarce stack size. From what I can see, Ruby’s default I/O
semantics are synchronous. One can opt for using IO.select though.
Let’s just say synchronous is OK, API-wise. My question is: if I use
one (Ruby) thread per client, would it block the whole Ruby
interpreter while performing blocking I/O ?

No. You can have a multi threaded application that uses blocking IO
(interface wise) concurrently. But even though Ruby’s threads are
non native there is a certain overhead associated with them. Using
select together with a thread pool might be an option, too. This
approach usually scales better than individual threads per IO.

On my system (FreeBSD)
ruby is linked against libpthread, for a good reason I guess. Could it
be that it splits blocking routines to a separate (POSIX) thread ?

Not as far as I know.

If
not, I’ll probably end up replacing the select() implementation with
kqueue() calls in the Ruby code.

If I was going to write a high performance application that had to
deal with a lot of concurrent IO channels I’d choose Java’s NIO. You get
select like behavior plus fairly easy MT handling - if you need that -
and also an easier programming model than C/C++ which is nevertheless
as performant.

Kind regards

robert


#4

On Apr 29, 2006, at 4:39 AM, Vlad GALU wrote:

not, I’ll probably end up replacing the select() implementation with
kqueue() calls in the Ruby code.



I’d just like to point at another option, IO::Reactor
http://www.deveiate.org/projects/IO-Reactor/


#5

2006/5/1, Minkoo S. removed_email_address@domain.invalid:

Robert K. wrote:

No. You can have a multi threaded application that uses blocking IO
(interface wise) concurrently.

I can’t understand this. If ruby uses non native multithreading, then
all threads will run in the same process. And that implies that one
blocking I/O operation stops all the threads in the Ruby interpreter
process, doesn’t it?

This is true only if the interpreter itself uses blocking IO - which
is not the case. Hence the term “interface wise” in my quote above.

robert


#6

On May 1, 2006, at 1:44 AM, Minkoo S. wrote:

I can’t understand this. If ruby uses non native multithreading, then
all threads will run in the same process. And that implies that one
blocking I/O operation stops all the threads in the Ruby interpreter
process, doesn’t it?

I’m pretty sure Ruby selects over sockets under the hood, to keep the
interpreter from blocking. I don’t believe that’s a perfect solution
though, as something like a large write could still block the process.

James Edward G. II


#7

Minkoo S. wrote:

Robert K. wrote:

2006/4/29, Vlad GALU removed_email_address@domain.invalid:

I'd like to use Ruby for a quite high performance networking tool.

In practice, writing a multithreaded C/C++ program is not feasible
due to the large connection pool I plan to manage, which would impose
a very scarce stack size. From what I can see, Ruby’s default I/O
semantics are synchronous. One can opt for using IO.select though.
Let’s just say synchronous is OK, API-wise. My question is: if I use
one (Ruby) thread per client, would it block the whole Ruby
interpreter while performing blocking I/O ?

No. You can have a multi threaded application that uses blocking IO
(interface wise) concurrently.

I can’t understand this. If ruby uses non native multithreading, then
all threads will run in the same process. And that implies that one
blocking I/O operation stops all the threads in the Ruby interpreter
process, doesn’t it?

This is indeed an interesting question, one that hasn’t been made clear
enough in the Ruby docs, I feel. It seems that there’s also a difference
between Windows and Unixes here. If someone could elaborate, that would
be great.


#8

2006/5/1, James Edward G. II removed_email_address@domain.invalid:

On May 1, 2006, at 1:44 AM, Minkoo S. wrote:

I can’t understand this. If ruby uses non native multithreading, then
all threads will run in the same process. And that implies that one
blocking I/O operation stops all the threads in the Ruby interpreter
process, doesn’t it?

I’m pretty sure Ruby selects over sockets under the hood, to keep the
interpreter from blocking. I don’t believe that’s a perfect solution
though, as something like a large write could still block the process.

Theoretically yes, but in practice this works pretty well. Could even
be that large writes are split up by the Ruby engine.

robert


#9

On May 1, 2006, at 10:27 AM, Robert K. wrote:

interpreter from blocking. I don’t believe that’s a perfect solution
though, as something like a large write could still block the process.

Theoretically yes, but in practice this works pretty well. Could even
be that large writes are split up by the Ruby engine.

I can remember at least one post from a person who ran into this
exact issue.

Bill K. also just followed up with a post about the effort being
made to tie in nonblocking operations to improve this issue. I
believe that’s the recommended Ruby socket handling method. I know
I’ve seen it suggested here more than once.

To me, these are signs that a simple select call doesn’t work as well
as we would like it to.

James Edward G. II


#10

If you look at the implementation of Ruby’s select, you’ll see that it
interacts with the Ruby-thread scheduler so as to solve the problem
you’re talking about.

What’s more interesting is that true native threads don’t play nice in
the sandbox with Ruby threads: they don’t share synchronization
primitives, and even worse, you can’t call select(2) in a native thread
because the Ruby scheduler uses it. If you do that, all Ruby threads
will block.

We wrote a native-code networking engine called eventmachine (available
as a Gem for Linux, Windows port coming very soon), that had to deal
with these problems. Give it a look and see if it can help you.


#11

interpreter from blocking. I don’t believe that’s a perfect solution
though, as something like a large write could still block the process.

Theoretically yes, but in practice this works pretty well. Could even
be that large writes are split up by the Ruby engine.

Also, from looking at differences in the Ruby socket code
between 1.8.2 and 1.8.4, it looks like a significant effort
was made to handle sockets in nonblocking mode.

A grep for EWOULDBLOCK in 1.8.4:

ext/socket/socket.c:#ifndef EWOULDBLOCK
ext/socket/socket.c:#define EWOULDBLOCK EAGAIN
ext/socket/socket.c: case EWOULDBLOCK:
ext/socket/socket.c: * * Errno::EWOULDBLOCK - the socket is marked as nonblocking and the
ext/socket/socket.c: * * Errno::EWOULDBLOCK - see Errno::EAGAIN
ext/socket/socket.c: * * Errno::EWOULDBLOCK - +socket+ is marked as nonblocking and a call to
ext/socket/socket.c: * * Errno::EWOULDBLOCK - same as Errno::EAGAIN
ext/socket/socket.c: * * Errno::EWOULDBLOCK - +socket+ is marked as nonblocking and no connections are
file.c:#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
file.c: case EWOULDBLOCK:
file.c:#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
file.c: case EWOULDBLOCK:
io.c:#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
io.c: case EWOULDBLOCK:
io.c:#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
io.c: case EWOULDBLOCK:
io.c:#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
io.c: case EWOULDBLOCK:
win32/win32.c:#define LK_ERR(f,i) ((f) ? (i = 0) : (errno = GetLastError() == ERROR_LOCK_VIOLATION ? EWOULDBLOCK : EACCES))
win32/win32.c: } while (i && errno == EWOULDBLOCK);
win32/win32.c: if (r != WSAEWOULDBLOCK) {

That said, I haven’t actually put it to the test. But looking
at the code, it seems nonblocking sockets are being handled
these days. So perhaps not even big writes necessarily block
anymore, if we put the socket in nonblocking mode.

(Anyone know for sure?)
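
For readers who want to try this, here is a minimal sketch of a write loop that never blocks inside the write call itself. It assumes a Ruby version that provides IO#write_nonblock and IO::WaitWritable (the 1.8.4 release discussed above predates both), and the helper name write_fully is made up for illustration. A pipe stands in for a socket so the example runs on its own.

```ruby
# Hypothetical helper: push all of +data+ through +io+ using nonblocking
# writes, waiting in IO.select whenever the kernel buffer is full.
def write_fully(io, data)
  remaining = data.b
  until remaining.empty?
    begin
      written = io.write_nonblock(remaining)
      remaining = remaining.byteslice(written..-1)
    rescue IO::WaitWritable
      # EAGAIN/EWOULDBLOCK: wait until the descriptor is writable again.
      IO.select(nil, [io])
    end
  end
  data.bytesize
end

# A pipe stands in for a socket; another thread drains the read end.
reader, writer = IO.pipe
payload = "x" * 1_000_000
drained = Thread.new { reader.read }

sent = write_fully(writer, payload)
writer.close
received = drained.value.bytesize
puts "sent=#{sent} received=#{received}"
```

While the writer waits in IO.select, other Ruby threads keep running, which is the behaviour a large blocking write would not give you.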

Regards,

Bill


#12

2006/4/29, Vlad GALU removed_email_address@domain.invalid:

On 4/29/06, Robert K. removed_email_address@domain.invalid wrote:

non native there is a certain overhead associated with them. Using
select together with a thread pool might be an option, too. This
approach usually scales better than individual threads per IO.

Basically polling the socket within each thread, right ? I did this
before, in C, and indeed, it worked quite OK.

I thought of a different architecture: have one thread select’ing and
hand off IO tasks to a set of threads via a queue.
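
A rough sketch of that architecture, under some assumptions: the helper name read_lines_with_pool and its worker_count parameter are invented for illustration, pipes stand in for client sockets, and each descriptor is handled only once (a real server would hand it back to the selecting thread afterwards).

```ruby
# Hypothetical helper: the calling thread does the select, worker threads
# read one line from each descriptor handed to them through a Queue.
def read_lines_with_pool(ios, worker_count: 2)
  jobs    = Queue.new
  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      # Queue#pop blocks until a job arrives; nil tells the worker to stop.
      while (io = jobs.pop)
        results << io.gets.to_s.chomp
      end
    end
  end

  watched = ios.dup
  until watched.empty?
    ready, = IO.select(watched)
    ready.each do |io|
      watched.delete(io)   # stop watching so it is not queued twice
      jobs << io
    end
  end

  worker_count.times { jobs << nil }
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# Pipes stand in for client connections here.
pairs = Array.new(3) { IO.pipe }
pairs.each_with_index { |(_, writer), i| writer.puts "hello #{i}" }
lines = read_lines_with_pool(pairs.map(&:first)).sort
puts lines.inspect
```

Removing a descriptor from the watched set before queueing it matters: otherwise the next select would report it as ready again before a worker has read from it.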

I could’ve as well written in what I already know best - C++ :) My
goal is to use Ruby :)

Ah, ok. Then of course, go ahead - and have fun! :)

Thanks for the hints, Robert. Should be enough to get me going.

My pleasure.

Kind regards

robert


#13

On 4/29/06, Robert K. removed_email_address@domain.invalid wrote:

No. You can have a multi threaded application that uses blocking IO
(interface wise) concurrently. But even though Ruby’s threads are
non native there is a certain overhead associated with them. Using
select together with a thread pool might be an option, too. This
approach usually scales better than individual threads per IO.

Basically polling the socket within each thread, right ? I did this
before, in C, and indeed, it worked quite OK.

If I was going to write a high performance application that had to
deal with a lot of concurrent IO channels I’d choose Java’s NIO. You get
select like behavior plus fairly easy MT handling - if you need that -
and also an easier programming model than C/C++ which is nevertheless
as performant.

I could’ve as well written in what I already know best - C++ :) My
goal is to use Ruby :)
Thanks for the hints, Robert. Should be enough to get me going.

Kind regards

robert


Have a look: http://www.flickr.com/photos/fussel-foto/

Nice kittens ;)




#14

A few people have suggested that we compile answers to (frequent)
questions. This would be useful as a reference to point people to.

In this case, I’m not answering an especially frequent question, but
my answer would serve as a starting point for someone playing with
sockets, I think. Could someone please suggest a good Wiki location
for me to place the code below?

I hope you’re having a good weekend! It’s a bank holiday here in
England, so it’s a wonderful three days of not being at work ;)
happy sigh

All the best,
Benjohn


#15

On Apr 29, 2006, at 1:40 pm, Benjohn B. wrote:

I hope you’re having a good weekend! It’s a bank holiday here in
England, so it’s a wonderful three days of not being at work ;)
happy sigh

I only came in to check my e-mail because I was reading in the garden
and the sun was dazzling my eyes. I can stand three days of this :)


#16

On 29 Apr 2006, at 09:39, Vlad GALU wrote:

not, I’ll probably end up replacing the select() implementation with
kqueue() calls in the Ruby code.

I don’t understand why you can’t use C++ here, so I guess I’m missing
something - my reason for not using C++ would be that I no longer
enjoy programming with it :)

However…

I wrote a server recently that “talked” on numerous sockets to many
clients. It would have seemed multi threaded to them, but was in fact
single threaded. Here’s a very simple version of it. It doesn’t
handle disconnections at all, but they are easy to add in. I was
quite pleased that I could extend the socket objects to be able to
polymorphically handle an event on them. I think it’ll be pretty high
performance, but I’ve no idea if that’s true. It certainly doesn’t
need to poll, and it doesn’t have to jump about in different thread
contexts.

All the best,
Benj

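require 'socket'

# Every socket we are watching: the listening server plus accepted clients.
$sockets = []

# Wait until at least one socket is readable, then let each ready socket
# handle its own event.
def process_sockets
  select($sockets)[0].each do |soc|
    soc.handle_event
  end
end

# Mixed into the listening socket: an event means a client is connecting.
module Server
  def self.create_server(port)
    server = TCPServer.new(port)
    server.extend(self)
  end

  def handle_event
    $sockets << accept_connection
  end

  def accept_connection
    con = accept
    con.extend(Connection)
  end
end

# Mixed into each client socket: an event means a line has arrived.
module Connection
  def handle_event
    s = gets.chomp
    Kernel.puts "#{self} said #{s}"   # log on the server's stdout
    puts "You said #{s}"              # reply on the client socket
  end
end

$sockets << Server.create_server(5001)
loop { process_sockets }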
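
As the post says, disconnections are not handled above. One possible way to add them is sketched below; this is not the poster's code. ClosingConnection and process_once are hypothetical names, the watched list is passed in rather than kept in a global, and UNIXSocket.pair (Unix-like systems only) stands in for an accepted TCP client so the example runs without opening a port.

```ruby
require 'socket'

# Hypothetical variant of the Connection module: when gets returns nil the
# peer has hung up, so the socket is closed and dropped from the list.
module ClosingConnection
  def handle_event(sockets)
    line = gets
    if line.nil?
      sockets.delete(self)
      close
    else
      puts "You said #{line.chomp}"   # IO#puts: replies on the socket itself
    end
  end
end

# One pass of the event loop: wait for readable sockets, dispatch each one.
def process_once(sockets)
  ready, = IO.select(sockets)
  ready.each { |soc| soc.handle_event(sockets) }
end

# An in-process socket pair stands in for an accepted client connection.
server_side, client = UNIXSocket.pair
server_side.extend(ClosingConnection)
sockets = [server_side]

client.puts "hi"
process_once(sockets)          # reads the line and echoes a reply
reply = client.gets.chomp

client.close
process_once(sockets)          # sees end of file and drops the connection
Kernel.puts "reply=#{reply.inspect} remaining=#{sockets.size}"
```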


#17

Quoting removed_email_address@domain.invalid, on Sat, Apr 29, 2006 at 05:39:22PM +0900:

I’d like to use Ruby for a quite high performance networking tool.
In practice, writing a multithreaded C/C++ program is not feasible
due to the large connection pool I plan to manage, which would impose
a very scarce stack size. From what I can see, Ruby’s default I/O
semantics are synchronous. One can opt for using IO.select though.

Ruby threads are basically wrappers for select() aka. multiplexed i/o.
Every thread when doing something in ruby that would result in a
(potentially) blocking system call on an fd actually calls select
instead of the syscall. So, it will look to you like you have multiple
threads independently performing “blocking” i/o operations, but from a
unix/system call perspective you have one process, waiting on select
when all ruby “threads” are waiting for i/o.

This is how you would write a high-performance C/C++ network tool
without using OS threads, too.
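
The visible effect is easy to check with a small experiment, sketched here with a pipe instead of a socket. Note the assumption: current Ruby versions schedule threads differently from the select-based 1.8 scheduler described above, but the behaviour seen from Ruby code is the same, namely that a thread parked in a blocking read does not stop the others.

```ruby
reader, writer = IO.pipe

# This thread blocks in a read until something is written to the pipe.
waiter = Thread.new { reader.gets }

# Meanwhile another thread keeps making progress.
ticks = 0
ticker = Thread.new do
  5.times do
    ticks += 1
    sleep 0.01
  end
end
ticker.join

still_waiting = waiter.alive?   # the reading thread is still parked
writer.puts "done"              # now let it finish
message = waiter.value.chomp

puts "ticks=#{ticks} still_waiting=#{still_waiting} message=#{message}"
```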

Let’s just say synchronous is OK, API-wise. My question is: if I use
one (Ruby) thread per client, would it block the whole Ruby
interpreter while performing blocking I/O ? On my system (FreeBSD)
ruby is linked against libpthread, for a good reason I guess.

Yes, but the reason is it gets threadsafe versions of C libraries, not
so it can make pthreads for each ruby Thread.

Could it be that it splits blocking routines to a separate (POSIX)
thread ?

No, it uses multiplexed i/o so ruby threads aren’t blocked on i/o
in other ruby threads.

If not, I’ll probably end up replacing the select() implementation
with kqueue() calls in the Ruby code.

You know best, but I don’t see how this will help. If you have
so many fds that you hit the scalability limits of select() and
need kqueue, you might be in trouble with ruby unless you can
access fairly raw fds. Maybe if you do sys_read/sys_write you bypass
ruby’s internal select()? Worth checking.

Also, kqueue() itself returns an fd and kevent() is blocking… A ruby
binding for kevent() would use ruby’s C APIs to select() on the fd, and
only call kevent() when it knew it wouldn’t block.

You might be able to find a kqueue()/kevent() extension if you googled
for one.

Good luck, the project sounds fun.

Cheers,
Sam


#18

On 29 Apr 2006, at 13:47, Ashley M. wrote:

On Apr 29, 2006, at 1:40 pm, Benjohn B. wrote:

I hope you’re having a good weekend! It’s a bank holiday here in
England, so it’s a wonderful three days of not being at work ;)
happy sigh

I only came in to check my e-mail because I was reading in the
garden and the sun was dazzling my eyes. I can stand three days of
this :)

:) I’ve just installed wireless last night. How sweet it is!


#19

Quoting removed_email_address@domain.invalid, on Sun, Apr 30, 2006 at 01:24:54AM +0900:

Bottom line, it would be nice if Ruby used whatever the platform
it’s running on has (kqueue, epoll, etc) instead of plain select(),
moving the abstraction a bit lower. This should be enough for most
challenges, I think.

Probably nobody has ever run into a select scalability problem. select
has worked well for many years, it’s just servers with REALLY large
numbers of concurrent connections that have problems, to my knowledge.

Look at the ruby src, eval.c:rb_thread_schedule. If you could write a
ruby program that hit the select scalability limit before hitting some
other kind (like that ruby is pretty slow compared to C), you could
rewrite this to use other mechanisms.

I suspect that a patch to do so would be accepted for 1.9 if you could
demo ruby breaking because it couldn’t handle the number of
threads/sockets you had open, unless kevent()/epoll() is actually worse
than select() in the common case of a few dozen fds. Hopefully that is
not the case.

Cheers,
Sam


#20

On 4/29/06, Sam R. removed_email_address@domain.invalid wrote:

instead of the syscall. So, it will look to you like you have multiple
threads independently performing “blocking” i/o operations, but from a
unix/system call perspective you have one process, waiting on select
when all ruby “threads” are waiting for i/o.

Thank you! It’s all clear now!

Yes, but the reason is it gets threadsafe versions of C libraries, not

You know best, but I don’t see how this will help. If you have
so many fds that you hit the scalability limits of select() and
need kqueue, you might be in trouble with ruby unless you can
access fairly raw fds. Maybe if you do sys_read/sys_write you bypass
ruby’s internal select()? Worth checking.

Also, kqueue() itself returns an fd and kevent() is blocking… A ruby
binding for kevent() would use ruby’s C APIs to select() on the fd, and
only call kevent() when it knew it wouldn’t block.

It depends on how you call kevent(): you can emulate select()'s
polling behaviour by passing a zeroed timespec struct. It immediately
returns, and then it’s up to you to see whether you have pending
events or not.
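
The same polling style is available from plain Ruby without a kqueue binding: IO.select takes a timeout as its fourth argument, and a timeout of 0 makes it return at once. This is only an analogy to the zeroed-timespec kevent() call, shown with a pipe rather than a socket.

```ruby
reader, writer = IO.pipe

# With a timeout of 0, IO.select returns immediately; nil means that
# nothing was ready.
before = IO.select([reader], nil, nil, 0)

writer.puts "ping"

# Now the read end has data, so it shows up in the readable list.
after = IO.select([reader], nil, nil, 0)
readable = after ? after[0] : []

puts "before=#{before.inspect} readable_now=#{readable.include?(reader)}"
```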

You might be able to find a kqueue()/kevent() extension if you googled
for one.

I have, I took a look at Ruby/Event and Myriad before posting :)

Bottom line, it would be nice if Ruby used whatever the platform
it’s running on has (kqueue, epoll, etc) instead of plain select(),
moving the abstraction a bit lower. This should be enough for most
challenges, I think.

Thank you too for your hints.