Threading Loops

I understand how to run a function in a thread, but I don’t understand how
to apply threading outside of that. I am trying to make a picture renamer,
and I want to thread the renaming part to speed it up.

pic_names = Dir["E:/**/*.{JPG,jpg}"]
pic_numb = 1
batch_name = "test"

# Want to thread this part.
pic_names.each do |name|
  print '.'
  new_name = batch_name + pic_numb.to_s + '.jpg'
  File.rename name, new_name
  pic_numb += 1
end

Thanks in advance.

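Threads are very convenient to use in Ruby, but first you should get rid
of the pic_numb += 1 statement: if every thread shares and increments the
same counter, a number is no longer reliably tied to one file. Use the
index that with_index supplies instead.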

pic_names.collect.with_index do |name, pic_numb|
  Thread.new do
    print '.'
    new_name = batch_name + pic_numb.to_s + '.jpg'
    File.rename name, new_name
  end
end.each { |thread| thread.join }
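To try the threaded loop without touching real photos, here is a self-contained sketch that renames throwaway files in a temporary directory. threaded_rename is a hypothetical helper name. Unlike the loop it is based on, it keeps each renamed file in its original directory (File.join(File.dirname(path), ...)) and uses with_index(1), available in Ruby 1.9.2 or later, so numbering starts at 1 as in the original question. It assumes none of the target names already exist.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: renames each file in +paths+ to "<batch_name><n>.jpg"
# inside that file's own directory, one thread per file, numbering from +start+.
# Assumes none of the target names already exist.
def threaded_rename(paths, batch_name, start = 1)
  paths.collect.with_index(start) do |path, pic_numb|
    Thread.new do
      new_name = File.join(File.dirname(path), "#{batch_name}#{pic_numb}.jpg")
      File.rename(path, new_name)
    end
  end.each { |thread| thread.join } # wait for every rename to finish
end

# Try it on empty files in a temporary directory (deleted afterwards).
Dir.mktmpdir do |dir|
  %w[a.JPG b.jpg c.jpg].each { |f| FileUtils.touch(File.join(dir, f)) }
  threaded_rename(Dir[File.join(dir, '*.{JPG,jpg}')].sort, 'test')
  names = (Dir.entries(dir) - %w[. ..]).sort
  p names
end
```

On Linux this prints ["test1.jpg", "test2.jpg", "test3.jpg"]; which file gets which number follows the sorted glob order, not the order in which the threads happen to run.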

Roy

Thank you so much, that helped a lot, and now I understand threading
better.

Could you explain the first part?
pic_names.collect.with_index do |name, pic_numb|

I understand how the threading works but not the loop. I read that
.collect returns all the elements of an array. But what do with_index
and the extra pic_numb block parameter do?

Sorry for double posting; I just wanted to say I understand it now. I
played around in irb until I understood what was going on.

The threading made no difference in speed at all, which is not what I expected.

Hi,

I don’t think threading will speed up your execution time in this case.
Threading is an illusion: processing switches rapidly between threads;
threads don’t actually execute at the same time. So unless there is
some waiting going on somewhere in your code, and switching to another
thread will make use of that downtime, there won’t be any
improvement in execution speed.
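The "waiting" case can be made concrete with a small sketch in which sleep stands in for time spent waiting on something outside the Ruby process (a simulation, not real I/O). A sleeping thread lets the others run, so three 0.2-second waits in separate threads take roughly 0.2 seconds in total instead of roughly 0.6. The helper names are hypothetical.

```ruby
require 'benchmark'

# Runs +n+ simulated waits at the same time, one thread each, and returns
# the elapsed wall-clock time in seconds.
def wait_in_threads(n, seconds)
  Benchmark.realtime do
    Array.new(n) { Thread.new { sleep seconds } }.each(&:join)
  end
end

# Runs the same waits one after another.
def wait_sequentially(n, seconds)
  Benchmark.realtime { n.times { sleep seconds } }
end

printf("threaded:   %.2fs\n", wait_in_threads(3, 0.2))
printf("sequential: %.2fs\n", wait_sequentially(3, 0.2))
```

Replacing sleep with pure computation (a tight arithmetic loop, say) is the case the reply describes: on MRI the threaded version is then no faster than the sequential one.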

Even if you have a computer with multiple processors, it won’t help:

===
…if your machine has more than one processor, Ruby threads won’t take
advantage of that fact - because they run in one process, and in a
single native thread, they are constrained to run on one processor at a
time.

http://rubylearning.com/satishtalim/ruby_threads.html

File.rename() must make a system call to the operating system to rename
the file, and:

===
if some thread happens to make a call to the operating system that takes
a long time to complete, all threads will hang until the interpreter
gets control back.

…so even if there is some downtime waiting for the system call that
renames the file to return, processing won’t switch to another thread
during the downtime.

I can’t imagine that even forking would help you here: your OS will still
be performing only one I/O call at a time. Unless you’ve got an OS
that’s doing some funny optimization like aggregating filesystem
metadata changes, you won’t see any speedup. And that would be a strange
thing to optimize for.

Multiprocessing generally only helps when your application is CPU bound,
not I/O bound.

Bobby S. wrote in post #997304:

The threading made no difference in speed at all, which is not what I expected.

OK, that is what I expected. To truly do two things at once in Ruby,
you have to create multiple processes, so you need to read up on
fork(). If you are on Windows, you may be out of luck.
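A minimal sketch of the multi-process approach, assuming a Unix-like OS where fork() exists. forked_rename and its slicing scheme are hypothetical, not an established API. Child processes do not share memory, so every new name is decided in the parent before forking, rather than having the children increment a shared counter. Each child calls exit! when done so it does not run at_exit handlers it inherited from the parent.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: splits +paths+ into about +workers+ slices and renames
# each slice in its own child process. Needs fork(), so not on Windows.
def forked_rename(paths, batch_name, workers = 2)
  raise NotImplementedError, 'fork() is not available here' unless Process.respond_to?(:fork)

  # Decide every new name up front: children cannot share a counter.
  jobs = paths.each_with_index.map do |path, i|
    [path, File.join(File.dirname(path), "#{batch_name}#{i + 1}.jpg")]
  end

  slice_size = [(jobs.size / workers.to_f).ceil, 1].max
  jobs.each_slice(slice_size) do |slice|
    fork do
      status = 0
      begin
        slice.each { |old_name, new_name| File.rename(old_name, new_name) }
      rescue SystemCallError
        status = 1
      end
      exit!(status) # leave immediately, skipping inherited at_exit handlers
    end
  end

  # Wait for every child; true only if all of them exited with status 0.
  Process.waitall.all? { |_pid, status| status.success? }
end

if Process.respond_to?(:fork)
  Dir.mktmpdir do |dir|
    %w[a.jpg b.jpg c.jpg].each { |f| FileUtils.touch(File.join(dir, f)) }
    p forked_rename(Dir[File.join(dir, '*.jpg')].sort, 'test')
  end
end
```

Whether this is any faster than a plain loop depends on where the time goes; for renames the bottleneck is usually the file system, not the CPU.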

Yeah, I just went to irb and played with .collect and with_index until I
figured out what they did, since I couldn’t find any documentation on
with_index that I could understand.

I didn’t understand threading much, but I still wanted to know how to do
this for future reference, so the answer helps even though it didn’t
speed anything up, if that makes sense. Your explanation also helped a lot.

Bobby S. wrote in post #997293:

Thank you so much, that helped a lot, and now I understand threading
better.

Could you explain the first part?
pic_names.collect.with_index do |name, pic_numb|

I understand how the threading works but not the loop. I read that
.collect returns all the elements of an array. But what do with_index
and the extra pic_numb block parameter do?

In Ruby 1.9, if you call collect() without supplying a block, you get
what’s called an ‘enumerator’ back. It’s an object of the Enumerator
class, which has a method called with_index(). with_index() works
just like each(), but it sends two arguments to the block: the first
argument is an element of the array, and the second argument is that
element’s index in the array.
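A short irb-style sketch of the same idea. One detail worth knowing: because the enumerator came from collect, with_index also collects the block’s return values into a new array. with_index also accepts an optional starting offset (Ruby 1.9.2 or later), which would let the threaded loop number files from 1 as the original pic_numb did.

```ruby
arr = ['a', 'b', 'c']

enum = arr.collect               # no block given: returns an Enumerator
p enum.class                     # prints Enumerator

# The block receives (element, index); since the enumerator came from
# collect, the block's results are gathered into a new array.
p enum.with_index { |el, i| "#{el}#{i}" }            # prints ["a0", "b1", "c2"]

# An optional offset changes where the index starts (Ruby 1.9.2+).
p arr.collect.with_index(1) { |el, i| "#{el}#{i}" }  # prints ["a1", "b2", "c3"]
```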

I don’t like that collect() loop at all. collect() returns an array
containing elements of the original array for which the block evaluates
to true. But the only thing inside the block is Thread.new(), which
always returns something that evaluates to true, so all elements of the
original array are selected by collect() and returned in a new array,
which is then discarded because the result of collect() isn’t assigned
to a variable. So, why not just use each(), which also steps through
every element of an array, but doesn’t bother returning an array:

arr = ['a', 'b', 'c']

arr.each.with_index do |el, index|
  p [el, index]
end

--output:--
["a", 0]
["b", 1]
["c", 2]

On Sat, May 7, 2011 at 6:56 PM, 7stud – wrote:

Hi,

I don’t think threading will speed up your execution time in this case.
Threading is an illusion: processing switches rapidly between threads;
threads don’t actually execute at the same time.

That’s true if you are using Ruby 1.8 with its green threads, or Ruby 1.9
without calling native routines that release the global interpreter
lock. If you are using JRuby, threads are concurrent native threads
with no GIL, so they do run concurrently, and you should see a speedup
on tasks that are CPU-bound and efficiently parallelizable.

===
…if your machine has more than one processor, Ruby threads won’t take
advantage of that fact - because they run in one process, and in a
single native thread, they are constrained to run on one processor at a
time.

http://rubylearning.com/satishtalim/ruby_threads.html

This is true in MRI 1.8, which uses “green” threads implemented in the
runtime library with a single native thread backing them. It’s not
true, from what I understand, in Ruby 1.9, older versions of MacRuby,
or the current versions of Rubinius (in all three of these, threads are
native threads, but concurrency is limited by a global interpreter
lock), or in current MacRuby or JRuby (threads are native threads with
no global lock). I haven’t seen anything about Maglev’s threading model.

While the Thread API is part of the Ruby language, threading
implementations vary between Ruby implementations.
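Since the answer depends on which implementation runs the script, a quick way to check is to print RUBY_ENGINE and RUBY_VERSION. The helper name is hypothetical, and the fallback to 'ruby' assumes MRI 1.8, which does not define RUBY_ENGINE.

```ruby
# Hypothetical helper: reports the running Ruby implementation and version,
# e.g. "ruby 1.9.2" for MRI or "jruby 1.9.2" for JRuby in 1.9 mode.
def ruby_implementation
  engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : 'ruby' # MRI 1.8 lacks RUBY_ENGINE
  "#{engine} #{RUBY_VERSION}"
end

puts ruby_implementation
```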

On 08.05.2011 04:16, 7stud – wrote:

Bobby S. wrote in post #997304:

The threading made no difference in speed at all, which is not what I expected.

OK, that is what I expected. To truly do two things at once in Ruby,
you have to create multiple processes, so you need to read up on
fork().

It’s questionable whether that will yield any benefit, since what makes
it slow is likely not the CPU but I/O. Depending on the file system,
there might be some locking on the directory involved when renaming. In
any case, a file system needs to take measures to keep disk contents
consistent when there are multiple concurrent writes to a directory.
That’s where the bottleneck likely lies.

I don’t think that doing the rename concurrently will give any
improvements. I would just do it sequentially or at least not with so
many threads (at most 2). I don’t think it’s worth going through that
hassle.
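If you do want a small amount of concurrency, one common way to cap it is a fixed pool of worker threads pulling jobs from a shared Queue, instead of one thread per file. A sketch with two workers, following the "at most 2" suggestion; pooled_rename is a hypothetical name, and it assumes Ruby 2.x or later, where Queue is built in (on 1.8/1.9 add require 'thread').

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: renames +paths+ using a fixed number of worker
# threads that pull [old_name, new_name] jobs from a shared Queue.
def pooled_rename(paths, batch_name, workers = 2)
  jobs = Queue.new
  paths.each_with_index do |path, i|
    jobs << [path, File.join(File.dirname(path), "#{batch_name}#{i + 1}.jpg")]
  end

  Array.new(workers) do
    Thread.new do
      loop do
        begin
          old_name, new_name = jobs.pop(true) # non-blocking pop
        rescue ThreadError                     # raised once the queue is empty
          break
        end
        File.rename(old_name, new_name)
      end
    end
  end.each(&:join)
end

Dir.mktmpdir do |dir|
  %w[a.jpg b.jpg c.jpg d.jpg].each { |f| FileUtils.touch(File.join(dir, f)) }
  pooled_rename(Dir[File.join(dir, '*.jpg')].sort, 'test')
  p((Dir.entries(dir) - %w[. ..]).sort)
end
```

All jobs are queued before the workers start, so an empty queue means the work is done and a non-blocking pop is enough to end each worker.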

Kind regards

robert

On 08.05.2011 04:54, 7stud – wrote:

In Ruby 1.9, if you call collect() without supplying a block, you get
what’s called an ‘enumerator’ back. It’s an object of the class
Enumerator, which has a method called with_index(). with_index() works
just like each(), but it sends a second argument to the block: the index
of the element.

I don’t like that collect() loop at all. collect() returns an array
containing elements of the original array for which the block evaluates
to true.

It seems you are confusing #collect with #select here.

irb(main):006:0> a=[true,false,nil,1,2,3]
=> [true, false, nil, 1, 2, 3]
irb(main):007:0> a.collect {|x| x}
=> [true, false, nil, 1, 2, 3]
irb(main):008:0> a.select {|x| x}
=> [true, 1, 2, 3]
irb(main):009:0> a.collect {false}
=> [false, false, false, false, false, false]
irb(main):010:0> a.select {false}
=> []
irb(main):011:0> a.select {true}
=> [true, false, nil, 1, 2, 3]

#collect does the same as #map: it creates a new Array containing the
result of block evaluation on each element in the original Enumerable.

irb(main):012:0> a.collect {|x| x.inspect}
=> ["true", "false", "nil", "1", "2", "3"]

But the only thing inside the block is Thread.new(), which
always returns something that evaluates to true, so all elements of the
original array are selected by collect() and returned in a new array,
which is then discarded because the result of collect() isn’t assigned
to a variable.

That’s not true either:

pic_names.collect.with_index do |name, pic_numb|
  Thread.new do
    print '.'
    new_name = batch_name + pic_numb.to_s + '.jpg'
    File.rename name, new_name
  end
end.each { |thread| thread.join }

This creates a thread for each input and then joins on all of them.
It’s perfectly appropriate and even elegant to use #collect here.

irb(main):013:0> a.collect.with_index {|x,y| [x,y]}
=> [[true, 0], [false, 1], [nil, 2], [1, 3], [2, 4], [3, 5]]

With Threads:

irb(main):014:0> a.collect.with_index {|x,y| Thread.new {}}
=> [#<Thread:0x106a8278 run>, #<Thread:0x106a81d0 run>,
#<Thread:0x106a8160 run>, #<Thread:0x106a80f0 run>, #<Thread:0x106a8048
run>, #<Thread:0x106a7fbc run>]
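One more reason collect fits here: the array it returns is the list of threads, and Thread#value joins a thread and returns the value of its block. So if each thread’s block ended with, say, the new file name, mapping value over the array would give all the results in the original order. A minimal sketch with strings in place of renames:

```ruby
words = %w[a b c]

# collect keeps the Thread objects, in the same order as +words+.
threads = words.collect.with_index { |word, i| Thread.new { "#{word}#{i}" } }

# Thread#value waits for the thread to finish and returns its block's value.
results = threads.map(&:value)
p results # prints ["a0", "b1", "c2"]
```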

Kind regards

robert

rename is almost certainly disk-limited, so doing the renames in parallel
is almost certainly not going to be faster. In any case, Ruby threads do
not run in parallel: only one executes at a time (in MRI, anyway; other
implementations like JRuby are different).
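Rather than guessing, you can time both approaches on your own machine. A rough sketch, assuming throwaway files in a temporary directory stand in for the real pictures; the helper names are hypothetical, and the numbers will depend on the file system, the OS, and the Ruby implementation, so no particular result is implied.

```ruby
require 'benchmark'
require 'fileutils'
require 'tmpdir'

# Creates +count+ empty files named "<prefix><i>.jpg" in +dir+ and returns their paths.
def make_files(dir, prefix, count)
  Array.new(count) do |i|
    path = File.join(dir, "#{prefix}#{i}.jpg")
    FileUtils.touch(path)
    path
  end
end

# Renames the files one after another.
def rename_sequentially(paths, prefix)
  paths.each_with_index do |path, i|
    File.rename(path, File.join(File.dirname(path), "#{prefix}#{i}.jpg"))
  end
end

# Renames the files with one thread per file, then waits for all of them.
def rename_with_threads(paths, prefix)
  paths.each_with_index.map do |path, i|
    Thread.new { File.rename(path, File.join(File.dirname(path), "#{prefix}#{i}.jpg")) }
  end.each(&:join)
end

Dir.mktmpdir do |dir|
  seq_files = make_files(dir, 'a', 200)
  thr_files = make_files(dir, 'b', 200)
  seq = Benchmark.realtime { rename_sequentially(seq_files, 'seq') }
  thr = Benchmark.realtime { rename_with_threads(thr_files, 'thr') }
  printf("sequential: %.3fs  threaded: %.3fs\n", seq, thr)
end
```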