Hi.
I am writing a client application where I face a producer-consumer
problem.
Basically, the idea is that I have a connection limit - because of that
I need to limit the number of networking threads. I create multiple
connections to the server and store them, as objects, in a queue. I
don’t want to reconnect every time - this would limit the data transfer
rate. But the data which I want to send to the server isn’t available
from the beginning. Because of that I implement a second queue, where I
store this data.
So I have these two queues and I need to process the data in some
reasonable manner. For this, I create a thread pool (thread limit =
connection limit) based on the following class:
https://github.com/Burgestrand/burgestrand.github.com/blob/master/code/ruby-thread-pool/thread-pool.rb
And every scheduled job pops objects from the queues: data and
connection. After the data is transferred, the connection is pushed
back to the queue it came from, and the next scheduled job, which was
waiting, pops it and reuses it.
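In Ruby, a minimal sketch of this two-queue model might look like the
following (FakeConnection and its send_data method are invented
stand-ins for the real connection objects, not the actual client code):

```ruby
# Invented stand-in for a real, already-open server connection.
FakeConnection = Struct.new(:id) do
  def send_data(chunk)
    "sent #{chunk} over connection #{id}"  # placeholder for the network write
  end
end

conn_queue = Queue.new
data_queue = Queue.new
results    = Queue.new

2.times { |i| conn_queue << FakeConnection.new(i) }  # connection limit = 2
6.times { |i| data_queue << "chunk-#{i}" }
2.times { data_queue << :done }                      # one sentinel per worker

workers = 2.times.map do
  Thread.new do
    loop do
      chunk = data_queue.pop
      break if chunk == :done
      conn = conn_queue.pop        # wait for a free connection
      begin
        results << conn.send_data(chunk)
      ensure
        conn_queue << conn         # return it for the next waiting job
      end
    end
  end
end
workers.each(&:join)
```

Every job pops one data chunk and one connection, transfers, and pushes
the connection back, exactly as described above.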
This model works, but not ideally. The data transfer rate is lower
than it could be, and the speed is irregular. A network monitor shows a
diagram which looks like this:
/‾‾‾‾‾/‾‾‾‾‾‾/‾‾‾‾‾‾‾/‾‾‾‾‾‾
At first I thought that I was encountering this problem because the
data isn’t prepared as fast as it should be, but this isn’t the case.
Now I think that I see this behaviour because pushing/popping to a
queue isn’t as good as I thought it would be, but I don’t have any
other ideas how I could solve this particular problem.
I would be thankful for any advice.
Tad D. [email protected] wrote:
And every scheduled job pops objects from the queues: data and
connection. After data was transferred, connection is pushed to the
queue where it came from and the next scheduled job, which was
waiting, pops it and reuses.
How long is each connection waiting for? Are you sure it’s reused and
not transparently reconnecting? (“strace -f -e connect” can confirm on
Linux)
TCP slow start may kick in if a connection is idle for even 1s or so.
This model is working, but not ideally. Data transfer rate is lower
than it could be and speed is irregular. Network monitor shows diagram
which looks like this:
/‾‾‾‾‾/‾‾‾‾‾‾/‾‾‾‾‾‾‾/‾‾‾‾‾‾\
If this is Linux, try disabling the net.ipv4.tcp_slow_start_after_idle
sysctl (this affects IPv6, too, despite the name)
echo 0 | sudo tee /proc/sys/net/ipv4/tcp_slow_start_after_idle
(edit /etc/sysctl.{conf,d/*} to make it permanent across reboots)
There may be similar knobs in other OSes.
Of course, the server on the other end may also be having this
problem…
On 22 Nov 2013, at 22:46, Eric W. [email protected] wrote:
How long is each connection waiting for? Are you sure it’s reused and
not transparently reconnecting? (“strace -f -e connect” can confirm on
Linux)
The connection has a 60s timeout value. It should be more than enough.
strace shows only that threads are exiting:
[pid 11201] — SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED,
si_pid=11614, si_status=0, si_utime=0, si_stime=0} —
[pid 11615] +++ exited with 0 +++
Process 11620 attached
Process 11621 attached
[pid 11620] +++ exited with 0 +++
[pid 11201] — SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED,
si_pid=11620, si_status=0, si_utime=0, si_stime=0} —
[pid 11205] +++ exited with 0 +++
[pid 11621] +++ exited with 0 +++
Process 11622 attached
Process 11623 attached
[pid 11622] +++ exited with 0 +++
[pid 11201] — SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED,
si_pid=11622, si_status=0, si_utime=0, si_stime=0} —
[pid 11211] +++ exited with 0 +++
[pid 11209] +++ exited with 0 +++
Process 11627 attached
Process 11628 attached
[pid 11623] +++ exited with 0 +++
[pid 11627] +++ exited with 0 +++
[pid 11201] — SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED,
si_pid=11627, si_status=0, si_utime=0, si_stime=1} —
[pid 11203] +++ exited with 0 +++
[pid 11628] +++ exited with 0 +++
echo 0 | sudo tee /proc/sys/net/ipv4/tcp_slow_start_after_idle
(edit /etc/sysctl.{conf,d/*} to make it permanent across reboots)
There may be similar knobs in other OSes.
Of course, the server on the other end may also be having this
problem…
I doubt that the server has this problem (however, I will test TCP
slow start tomorrow and post my findings), because I can check it with
clients written by others (for example, with a Python implementation)
and everything is fine. If there is a problem, it’s in my code, and I
thought that I was missing something about how the Queues are supposed
to work. If I do the same without the connection queue and just start
some threads, which open connections and begin processing data in a
loop (still accessing the data to transfer via a Queue), the transfer
rate is stable, although I’m not satisfied with this solution because
I find it less flexible.
Tad D. [email protected] wrote:
implementation) and everything is fine. If there is a problem - it’s
in my code and I thought that I am missing something about how the
Can you share the rest of (or ideally all of) your code?
On 23 Nov 2013, at 04:41, Eric W. [email protected] wrote:
Tad D. [email protected] wrote:
implementation) and everything is fine. If there is a problem - it’s
in my code and I thought that I am missing something about how the
Can you share the rest of (or ideally all of) your code?
The code in question can be seen on github:
https://github.com/tdobrovolskij/sanguinews/blob/master/sanguinews.rb
I started this project about a month ago with one goal in mind: to
learn Ruby. So at this point in time, if something is wrong, I blame
myself, not the system. Line 236 and downwards - this is where the
multithreaded interaction starts.
I did test the performance after disabling slow start and I got the
same results. The data transfer rate isn’t stable. So, I suppose, that
proves the problem is in the code.
On Sat, Nov 23, 2013 at 9:35 AM, Tad D. [email protected]
wrote:
On 23 Nov 2013, at 04:41, Eric W. [email protected] wrote:
Can you share the rest of (or ideally all of) your code?
The code in question can be seen on github:
https://github.com/tdobrovolskij/sanguinews/blob/master/sanguinews.rb
I started this project about the month ago with one goal in my mind: to learn
ruby. So at this point of time, if something is wrong - I blame myself, not a
system. Line 236 and downwards - this is where multithreaded interaction is
starting.
I did not find a single class definition in that code. For my feeble
eyes there is too little structure there. For example, the iteration
files.each starting at line 259 goes on until line 355 - that’s almost
100 lines of code for a single loop body! That makes it hard to
understand the logic. You are also mixing instance variables and
local variables even though at the top level they share the same scope
(“password” is a local variable, “@username” is an instance variable);
I cannot see a pattern for this.
Another thing: you seem to always change the variables dirmode and
filemode together. That means one of them is redundant. This is
error-prone and confusing. There should be just one flag. I am not
sure how different the processing should be for filemode and dirmode,
but maybe you want to create two different classes to handle each -
maybe with a common base class which contains the common logic
(-> template method pattern).
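A rough sketch of that template-method idea. All the names here
(Uploader, FileUploader, DirUploader, collect_files) are invented for
illustration, not taken from the actual project:

```ruby
# Base class holds the common logic; subclasses fill in the variable part.
class Uploader
  def run
    collect_files.map { |f| upload(f) }  # shared top-level algorithm
  end

  private

  def upload(file)
    "posted #{file}"                     # shared per-file handling
  end

  def collect_files
    raise NotImplementedError, "subclasses decide what to upload"
  end
end

class FileUploader < Uploader
  def initialize(file)
    @file = file
  end

  private

  def collect_files
    [@file]                              # filemode: exactly one file
  end
end

class DirUploader < Uploader
  def initialize(dir)
    @dir = dir
  end

  private

  def collect_files
    ["#{@dir}/a.bin", "#{@dir}/b.bin"]   # dirmode: stand-in for a listing
  end
end
```

With this shape the single flag disappears entirely: which class you
instantiate already encodes the mode.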
It’s not clear to me why you use a second thread to fill the
connection pool. Btw. you should use a begin/ensure/end block to
handle connections from the pool in order to ensure you do not create
a leak and lose connections in case of exceptions. Ideally your pool
class does this for you, e.g.
def use
  conn = take_from_pool
  begin
    yield conn
  ensure
    put_back conn
  end
end
Then you can do
pool.use do |conn|
  conn.send …
end
and do not need to worry about exceptions. See
http://blog.rubybestpractices.com/posts/rklemme/002_Writing_Block_Methods.html
One thing in your original description stood out:
For this, I create thread pool(thread limit = connection limit)
If you have only as many threads for processing as there are
connections, then you do not need a connection pool. It is then much
more efficient to open a connection on thread start, use it as long as
the thread runs, and close it when the thread finishes. That avoids
the synchronization overhead you have when taking a connection from
the pool and returning it.
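A hedged sketch of that one-connection-per-thread alternative. FakeConn
is an invented stand-in for a real NNTP connection:

```ruby
# Invented stand-in for a real connection opened by the thread itself.
class FakeConn
  def post(chunk)
    "posted #{chunk}"   # placeholder for the real send plus response check
  end
end

jobs = Queue.new
8.times { |i| jobs << "chunk-#{i}" }
3.times { jobs << :done }            # one sentinel per thread

sent = Queue.new
threads = 3.times.map do
  Thread.new do
    conn = FakeConn.new              # opened once, when the thread starts
    while (chunk = jobs.pop) != :done
      sent << conn.post(chunk)       # no pool take/return per job
    end                              # the real conn would be closed here
  end
end
threads.each(&:join)
```

Each thread owns its connection for its whole lifetime, so the only
shared structure left is the job queue.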
If preparing the data for transfer is an expensive task (i.e. takes
some time) then the one connection per thread approach might be
inefficient since the connection will sit there idly most of the time
yet use some resources on the server side. In those situations it may
be worthwhile to use a connection pool.
Kind regards
robert
On 24 Nov 2013, at 12:38, Robert K. [email protected]
wrote:
I did not find a single class definition in that code.
I moved all the classes I’ve defined to external files in the lib
subdirectory and I am loading them with the ‘load’ method. Is this
considered heretical?
For my feeble
eyes there is too few structure there. For example, the iteration
files.each starting at line 259 goes on until line 355 - that’s almost
100 lines of code for a single loop body! That makes it hard to
understand the logic.
Yes, no excuse here. I’ve seen this problem myself. I am working on
moving all the tasks into dedicated methods to solve this problem.
You are also mixing instance variables and
local variables even though on top level they share the same scope
(“password” is a local variable, “@username” is an instance variable);
I cannot see a pattern for this.
I’ve started with instance variables to make my job easier. Now, with
each version, I am slowly getting rid of them. I know that it is bad
programming practice to rely on them.
Another thing: you seem to always change variables dirmode and
filemode together. That means they are redundant. This is error prone
and confusing. There should be just one flag.
You are right once again. This is so obvious that I don’t know why I
didn’t think of it myself.
I am not sure how
different processing should be for filemode and dirmode but maybe you
want to create two different classes to handle each - maybe with a
common base class which contains the common logic (-> template method
pattern).
Thank you for your advice. I’ll think about this.
It’s not clear to me why you use a second thread to fill the
connection pool.
I am doing this to have connections ready at approximately the same
time as the data is prepared (encoded).
put_back conn
http://blog.rubybestpractices.com/posts/rklemme/002_Writing_Block_Methods.html
Thank you. I’ll think about rewriting pool class to do exactly this.
One thing in your original description stood out:
For this, I create thread pool(thread limit = connection limit)
If you have only as many threads for processing as there are
connections then you do not need a connection pool. It’s then much
more efficient to open a connection on thread start, use it as long as
the thread runs and close it when it finishes. That avoids
synchronization overhead that you have when taking a connection from
the pool and returning it.
I’ve switched to a connection pool because it was quite difficult to
track the state of the threads and their relationship to the data (are
they finished already or are there still some data chunks left). So I
moved to single jobs, which could, theoretically, have data and a
connection as parameters. It’s much easier to operate with them on
this level, but I’ve run into strange behaviour on the network level.
If preparing the data for transfer is an expensive task (i.e. takes
some time) then the one connection per thread approach might be
inefficient since the connection will sit there idly most of the time
yet use some resources on the server side. In those situations it may
be worthwhile to use a connection pool.
Preparing the data was really expensive before I rewrote this
particular part of the code in inline C. Now it takes just about 2
seconds per few hundred megabytes of binary data. Still expensive, but
in almost all scenarios the bottleneck will be the network, not the
data preparation.
Thank you for your input. I will look into improving my code according
to your suggestions.
On 24 Nov 2013, at 17:29, Robert K. [email protected]
wrote:
Only the part where you use “load”. We use require for loading
modules so we avoid loading them multiple times. For relative loading
you can use require_relative.
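A toy illustration of why require is preferred: it loads a file only
once, while load re-executes it every time. The file name and constant
here are invented for the example:

```ruby
require 'tmpdir'

# Write a throwaway library file and require it twice; the second
# require is a no-op because the file is already loaded.
loads = Dir.mktmpdir do |dir|
  path = File.join(dir, "greeting")
  File.write("#{path}.rb", "GREETING = 'hello'")
  [require(path), require(path)]   # first load happens, second returns false
end
```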
Oh, I didn’t know about require_relative. It makes sense now. Thank you
once again.
If you move all the other code in classes in separate files I would
probably only leave the main command line processing and top level
logic in the main .rb file.
That’s the ideal I am striving for.
For this, I create thread pool(thread limit = connection limit)
what your program is supposed to do. Can you give a birds eye view?
Do you just prepare data and send it off or are you getting something
back as well?
I will explain the whole concept in layman’s terms. Please excuse me
if some of this wasn’t necessary.
Usenet - the internet’s predecessor - is like one big forum. You can
post, you can read. Everything is done via the NNTP protocol. To post
binary data, it needs to be converted to text first (8-bit). The most
common encoding method is yEnc.
So, my program is supposed to encode the given files, post them where
they should be, and be as efficient at this as possible. Because of
the efficiency part, I have switched to inline C for encoding. But now
I’ve hit another wall - 5MB/s is the limit I am able to achieve with
my program (and I see these strange drops too). It is a puzzle which I
am planning to solve.
Communication between the program and the server is almost one-way.
Almost, because I need to receive the server’s responses. Otherwise I
won’t know if the posting was successful.
one is using the connection to send.
I am currently using only one thread (although with priority slightly
above normal) to prepare the data. I think that this is sufficient.
All the other threads are there only to fetch the data and upload it
in the most efficient manner. I know that I’ll be remodelling my
upload threads, I just don’t know how.
On Sun, Nov 24, 2013 at 1:20 PM, Tad D. [email protected]
wrote:
On 24 Nov 2013, at 12:38, Robert K. [email protected] wrote:
I did not find a single class definition in that code.
I moved all the classes I’ve defined to external files in lib subdirectory and I
am loading them with ‘load’ method. Is this considered heretical?
Only the part where you use “load”. We use require for loading
modules so we avoid loading them multiple times. For relative loading
you can use require_relative.
If you move all the other code in classes in separate files I would
probably only leave the main command line processing and top level
logic in the main .rb file.
It’s not clear to me why you use a second thread to fill the
connection pool.
I am doing this to have connections ready approximately at the same time as data
is prepared(encoded).
I would have assumed that connecting is much cheaper than the
preparation of the data, so you would not gain that much. But it may
be different in your case, of course.
I’ve switched to a connection pool, because it was quite difficult to track the
state of threads and their relationship to data(are they finished already or is
there still some data chunks left). So I moved to single jobs, which could,
theoretically, have data and connection as a parameters. It’s much more easy to
operate with them on this level, but I’ve run into strange behaviour on the
network level.
That is all a bit foggy to me. Also, I still haven’t really grokked
what your program is supposed to do. Can you give a bird’s-eye view?
Do you just prepare data and send it off, or are you getting something
back as well?
If preparing the data for transfer is an expensive task (i.e. takes
some time) then the one connection per thread approach might be
inefficient since the connection will sit there idly most of the time
yet use some resources on the server side. In those situations it may
be worthwhile to use a connection pool.
Preparing data was really expensive before I rewrote this particular part of the
code in inline C. Now it takes just about 2 seconds per few hundred megabytes of
binary data. Still expensive, but in almost all scenarios bottleneck will be
network, not the data preparation.
If data preparation is so cheap then maybe a ratio of 2:1 for threads
to connections is in order. Then on average two threads “share” a
connection and one thread can be preparing the data while the other
one is using the connection to send.
Thank you for your input. I will look into improving my code according to your
suggestions.
You’re welcome!
Kind regards
robert
Please trim your quotes - that makes it easier for readers to follow
the discussion.
On Sun, Nov 24, 2013 at 6:32 PM, Tad D. [email protected]
wrote:
On 24 Nov 2013, at 17:29, Robert K. [email protected] wrote:
I will explain the whole concept in layman’s terms. Please excuse me, if some of
this wasn’t necessary.
Usenet - internet’s predecessor is like one big forum. You can post, you can
read. Everything is done via NNTP protocol. To post binary data, it needs to be
converted to ASCII first(8-bit). The most common encoding method is yEnc.
So, my program is supposed to encode the given files and post them, where they
should be and be as efficient at this as it is possible. Because of efficiency
part, I have switched to inline C for encoding. But now I’ve hit another wall -
5MB/s is the limit I am able to achieve with my program(and I see these strange
drops too). It is puzzle which I am planning to solve.
I achieve 2.4 MB/s with a pure Ruby encoder.
Communication between program and server is almost one way. Almost, because I
need to receive server’s responses. Otherwise I won’t know if posting was
successful.
Unfortunately yEnc does not have a uniform output size so you cannot
divide the encoding step itself. Encoding in parallel works only for
multiple files.
If preparing the data for transfer is an expensive task (i.e. takes
some time) then the one connection per thread approach might be
inefficient since the connection will sit there idly most of the time
yet use some resources on the server side. In those situations it may
be worthwhile to use a connection pool.
Preparing data was really expensive before I rewrote this particular part of
the code in inline C. Now it takes just about 2 seconds per few hundred megabytes
of binary data. Still expensive, but in almost all scenarios bottleneck will be
network, not the data preparation.
What will you gain then from having multiple connections to a host?
Are you expecting to get around per-connection traffic shaping by
using multiple connections?
If data preparation is so cheap then maybe a ratio of 2:1 for threads
to connections is in order. Then on average two threads “share” a
connection and one thread can be preparing the data while the other
one is using the connection to send.
I am currently using only one thread(although with priority slightly above
normal) to prepare data. I think that this is sufficient. All other threads are
there only to fetch the data and upload them in most efficient manner. I know that
I’ll be remodelling my upload threads, I just don’t know how.
This looks like a two-stage approach:
- encode
- transfer
The output of stage one would be one chunk for upload and transfer,
and a writer would start writing it as soon as it is available. You’d
have a thread pool for the writers where each thread has its own
connection (if all messages go to the same target). Overall efficiency
will depend on a wisely chosen chunk size. If it is too large, you
waste too much time waiting initially (and might also have some GC
issues); if it is too small, per-message overhead bogs you down.
Ideally you make that a parameter of the algorithm so you can easily
experiment.
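The two-stage pipeline above could be sketched like this. The
“encoding” (String#reverse) is a deliberate stand-in for the real yEnc
step, and a SizedQueue keeps the encoder from running arbitrarily far
ahead of the writers:

```ruby
chunk_size = 4                      # the tunable knob worth experimenting with
data       = "abcdefghijklmnop"

encoded = SizedQueue.new(2)         # bounded hand-off between the two stages
writers = 2.times.map do
  Thread.new do
    sent = []
    while (chunk = encoded.pop) != :eof
      sent << chunk                 # stand-in for writing to a connection
    end
    sent
  end
end

# Stage one: "encode" input chunks and hand them to the writers as soon
# as each one is ready.
data.chars.each_slice(chunk_size) do |slice|
  encoded << slice.join.reverse
end
2.times { encoded << :eof }         # one sentinel per writer

sent_chunks = writers.flat_map(&:value)
```

Making chunk_size a parameter, as suggested, lets you measure both
extremes without touching the pipeline itself.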
Kind regards
robert
On Wed, Nov 27, 2013 at 9:59 AM, Tad D. [email protected]
wrote:
On 25 Nov 2013, at 18:12, Robert K. [email protected] wrote:
I achieve 2.4 MB/s with a pure Ruby encoder.
With pure ruby encoder the bottleneck is in encoding speed and it’s CPU
depended. I’ve experimented with parallel encoder(encoding done by spawning
multiple processes). Result was a little bit better, but still not as good as with
C.
Yes, the question usually is: is it fast enough? If the limiting
factor is IO then the encoding speed may be negligible.
Unfortunately yEnc does not have a uniform output size so you cannot
divide the encoding step itself. Encoding in parallel works only for
multiple files.
Chunk size is known in advance. It’s an option, which is set by a user. You can
split original file in chunks and nothing stops you from starting parallel
encoding for this chunks. It’s not limited to files.
I was thinking of the output chunk size. If you are allowed to pick
the input chunk size then you’re right, of course.
What will you gain than from having multiple connections to a host?
Are you expecting to get around traffic shaping per connection by
using multiple connections?
Multiple connections are a common solution to maximising upload(or download)
rate. It’s helpful not only against traffic shaping(and not all traffic shaping
could be circumvented by it) and normally I am getting 20-50% rate increase
because of multiple connections to a server.
Understood. I think it’s good to have the number of connections per
server (are there more than one, btw.?) as another parameter of the
algorithm.
it is too small per message overhead bogs you down. Ideally you make
that a parameter of the algorithm so you can easily experiment.
I didn’t think about experimenting with the chunk size. It’s really a
wonderful idea. Of course, the upper limit is normally set by the
server (normally it’s limited to a few megabytes), but maybe I’ve
chosen the wrong default chunk size because I’ve blindly followed
conventions. I need to check this.
I’m glad I could help at least a bit.
Kind regards
robert
On 25 Nov 2013, at 18:12, Robert K. [email protected]
wrote:
I achieve 2.4 MB/s with a pure Ruby encoder.
With a pure Ruby encoder the bottleneck is in encoding speed, and it’s
CPU-dependent. I’ve experimented with a parallel encoder (encoding
done by spawning multiple processes). The result was a little bit
better, but still not as good as with C.
Unfortunately yEnc does not have a uniform output size so you cannot
divide the encoding step itself. Encoding in parallel works only for
multiple files.
The chunk size is known in advance. It’s an option which is set by the
user. You can split the original file into chunks, and nothing stops
you from starting parallel encoding for these chunks. It’s not limited
to whole files.
What will you gain than from having multiple connections to a host?
Are you expecting to get around traffic shaping per connection by
using multiple connections?
Multiple connections are a common solution for maximising the upload
(or download) rate. It’s helpful not only against traffic shaping (and
not all traffic shaping can be circumvented by it); normally I get a
20-50% rate increase because of multiple connections to a server.
too much time waiting initially (also might have some GC issues), if
it is too small per message overhead bogs you down. Ideally you make
that a parameter of the algorithm so you can easily experiment.
I didn’t think about experimenting with the chunk size. It’s really a
wonderful idea. Of course, the upper limit is normally set by the
server (normally it’s limited to a few megabytes), but maybe I’ve
chosen the wrong default chunk size because I’ve blindly followed
conventions. I need to check this.
On 27 Nov 2013, at 12:55, Robert K. [email protected]
wrote:
Yes, question usually is: is it fast enough. If limiting factor is IO
then encoding speed may be negligible.
The answer would be “it depends”. It depends on whether the user is
running the program on some kind of single-core VM or on a standard PC
with a multicore CPU. Does the user have ADSL with very limited upload
bandwidth or some kind of ultra-fast connection? There is no universal
answer, but I am sure that my current C implementation of the yEnc
encoder is fast enough for most scenarios. I wasn’t sure about Ruby.
But it isn’t fast enough from a network perspective, because on a
100Mbps line I am getting results which look bad compared to other
NNTP clients.
Understood. I think it’s good to have # of connections per server (are
there more than one btw.?) as another parameter of the algorithm.
There are multiple servers, but there can be only one server per
config file. Of course, there is a command-line option to specify an
alternative config, in case the user would like to connect to a
different server with different settings. The user can also adjust the
number of connections via the config file. That’s why I am limiting
the number of upload jobs based on the variable @threads, which is
basically a parameter in the .conf file.
I’m glad I could help at least a bit.
There is no need for modesty. You DID help a lot. I am rethinking the
upload algorithm because of your input. I have corrected, and am still
correcting, my Ruby programming mistakes. Moreover, this discussion
provided me with some insightful thoughts. I am really thankful.