On Tuesday 15 December 2009 07:46:26 pm Piyush R. wrote:
> Consumer thread should remain the same as the example.
Except that this removes any advantage of threads, as I see it – and it still doesn't remove the essential problem of shared memory, once you start using it for anything more complex than IDs.
And while there is an advantage to doing this status update in a thread, I just don't think it's worth the potential headaches.
The main advantage of the Queue is the same as that of a Unix pipe. For example, I have some large files that were initially compressed with lzop, which I now want to recompress with lzma. It's not going to make a huge amount of difference to do it this way (since lzma is so much slower than lzop), but consider:
lzop -d < foo.lzo | lzma -v9 > foo.lzma
In this case, the lzop decompression and the lzma compression are actually asynchronous – lzop will decompress as fast as it can until it fills up a buffer, which lzma will then read from. On a modern dual-core Linux system, lzma will likely fill one core, while lzop uses some small-ish amount of the second core.
Technically, this would be a fixed-length Queue – the standard Queue class can actually grow indefinitely (SizedQueue is the bounded variant). But the idea is the same, and it is concurrent for the same reason.
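To sketch what I mean in Ruby (the queue size, the item count, and the :done sentinel here are my own choices, just for illustration):

```ruby
require 'thread' # Queue/SizedQueue live here on older Rubies

# A bounded queue behaves like the pipe buffer: the producer blocks
# once the queue holds 4 items, until the consumer drains some.
queue = SizedQueue.new(4)

producer = Thread.new do
  10.times { |i| queue << i } # blocks whenever the queue is full
  queue << :done              # sentinel so the consumer knows to stop
end

results = []
consumer = Thread.new do
  while (item = queue.pop) != :done # blocks whenever the queue is empty
    results << item                 # stand-in for the real crunching
  end
end

[producer, consumer].each(&:join)
```

Both threads run concurrently, throttled only by the queue, exactly like lzop and lzma throttled by the pipe buffer.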
There are generally two reasons you'd want to do something concurrently. One is because it's more efficient – I have a dual-core machine. But this only makes sense if you're using JRuby – MRI (1.8 and 1.9) is crippled by a GIL, just like Python.
The other reason is because you've actually got some concurrent things happening, and sometimes pre-emptive multitasking is just easier to wrap your head around. For example, in this case, it'd be useful to have a separate thread if you wanted to be sure it was updating at a certain rate (once a second, or ten times a second), regardless of how fast (or how slowly) the actual data was changing.
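Something like this, say (the counter, the intervals, and the iteration counts are all made up for the example):

```ruby
require 'thread'

# Shared progress counter; the status thread wakes on its own schedule
# (every 0.1 s here) rather than once per work item.
processed = 0
mutex = Mutex.new
snapshots = []

status = Thread.new do
  5.times do
    sleep 0.1
    # Take a consistent snapshot; in real code you'd print an ETA here.
    snapshots << mutex.synchronize { processed }
  end
end

worker = Thread.new do
  50.times do
    mutex.synchronize { processed += 1 }
    sleep 0.005 # stand-in for actual work per ID
  end
end

[worker, status].each(&:join)
```

The status thread reports at its own fixed rate even if the worker stalls – which is the point.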
One reason you might want to do this here is to provide an ETA – if the original process stalls on a single ID for several minutes, you could watch the ETA get longer and longer.
But if you're blocking waiting for ids to come off the Queue, this is no advantage at all – it's more or less the same as if you called some "update_status" method with a given id.
It also introduces all the messiness of threading as soon as you start passing any more complex object than an id. For example, it might be nice if you could determine how far done the object was based on some internal state. Ignoring the fact that you'd only wake up to do this when you get a new id coming down the Queue, you now have the issue of the same object being accessed by the worker thread (doing whatever crunching it's doing) and the view thread (reading that object to figure out how done it is).
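At minimum that means locking around the shared state – something like this (the Job class and its methods are hypothetical, just to show the shape of it):

```ruby
require 'thread'

# A work item whose internal state both threads touch: the worker
# advances it, the view thread reads how done it is. Without the
# mutex, the view could observe a half-updated state.
class Job
  def initialize(total)
    @total = total
    @done  = 0
    @lock  = Mutex.new
  end

  def step! # called by the worker thread
    @lock.synchronize { @done += 1 }
  end

  def fraction_done # called by the view thread
    @lock.synchronize { @done.to_f / @total }
  end
end

job = Job.new(100)
worker = Thread.new { 100.times { job.step! } }
viewer = Thread.new { 10.times { job.fraction_done; sleep 0.001 } }
[worker, viewer].each(&:join)
```

And that lock has to be remembered at every access point, which is exactly the messiness I'm talking about.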