Ruby Threads question

The following code is a slight modification of the code found at the top
of page 136 in the book “Programming Ruby The Pragmatic Programmers’
Guide” by Dave T. with Chad F. and Andy H.

#!/usr/bin/ruby -w

require 'net/http'

pages = %w(www.google.com www.slashdot.org www.mit.edu)
threads = []

for page_to_fetch in pages
  threads << Thread.new(page_to_fetch) do |url|
    h = Net::HTTP.new(url, 80)
    puts "Fetching: #{url}"
    resp = h.get('/', nil)
    puts "Got #{url}: #{resp.message}"
  end
end

threads.each {|thr| thr.join }

How come when I run this code, the following output goes line by line?
Fetching: www.google.com
Fetching: www.slashdot.org
Fetching: www.mit.edu
Got www.slashdot.org: Moved Permanently
Got www.google.com: OK
Got www.mit.edu: OK

Shouldn’t the output be all done at once since the code is running in
parallel?

On 9/5/07, Cd Cd wrote:

Shouldn’t the output be all done at once since the code is running in
parallel?

I’m not sure what you mean by “all done at once”.

If the code wasn’t running in parallel threads then I’d expect to see:

Fetching: www.google.com
Got www.google.com: OK
Fetching: www.slashdot.org
Got www.slashdot.org: Moved Permanently
Fetching: www.mit.edu
Got www.mit.edu: OK

What’s really happening is something like:

Thread 1                     Thread 2                     Thread 3
opens HTTP
prints Fetching:
  www.google.com
issues get
                             opens HTTP
                             prints Fetching:
                               www.slashdot.org
                             issues get
                                                          prints Fetching:
                                                            www.mit.edu
                                                          issues get
                             get finishes
                             prints Got
                               www.slashdot.org:
                               Moved Permanently
get finishes
prints Got
  www.google.com: OK
                                                          get finishes
                                                          prints Got
                                                            www.mit.edu: OK

So the output order is evidence that there is indeed parallel activity.
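The timeline above can be reproduced without touching the network. This is only a sketch: the delays are made-up numbers standing in for each server's response time, `sleep` stands in for the blocking `h.get('/')`, and `simulate_fetches` is a hypothetical helper name, not something from the book.

```ruby
require 'thread'   # needed for Queue on older Rubies, harmless on newer ones

# Hypothetical delays (seconds) standing in for each server's response time.
DELAYS = {
  'www.google.com'   => 0.2,
  'www.slashdot.org' => 0.1,
  'www.mit.edu'      => 0.3
}

# Run one thread per page and return the events in the order they happened.
def simulate_fetches(delays)
  events = Queue.new                 # thread-safe log of what happened when
  threads = delays.map do |url, delay|
    Thread.new(url, delay) do |u, d|
      events << "Fetching: #{u}"
      sleep d                        # stands in for the blocking h.get('/')
      events << "Got #{u}"
    end
  end
  threads.each { |thr| thr.join }

  log = []
  log << events.pop until events.empty?
  log
end

puts simulate_fetches(DELAYS)
```

With these delays the three "Fetching" lines should come first, and the "Got" lines should follow in order of shortest delay (slashdot, google, mit), which is the same shape as the output in the original question. The order of the "Fetching" lines among themselves is up to the scheduler.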

Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

On Sep 5, 2007, at 8:50 PM, Cd Cd wrote:

Shouldn’t the output be all done at once since the code is running in
parallel?

what would that look like exactly though? ;-)

seriously, there is more than one reason that cannot happen.

  1. the console being printed to can only allow one thing to write to
    it at a time. generally speaking the console is line buffered up to
    some limit so, as long as a program writes a chunk of chars less than
    that limit, the lines will not appear to be intermixed. things get more
    complicated when programs are not writing to the console, but to a file
    instead.

  2. threads are never run in parallel from the perspective of the
    computer: the cpu can only run one program at once. it simply does
    very intelligent switching very quickly to make you think it’s doing
    more than one thing ;-). with ruby’s threads, which are known as
    ‘green’ threads, ruby itself does this switching for you. with
    ‘native’ threads the switching is done by the operating system. in
    either case the concept of ‘parallel’ is really from the perspective
    of the programmer. this isn’t strictly true as sometimes a program
    might be able to write to disk or the network but not need the cpu -
    in that case it might do two things at once. the central concept is
    basically that a thread is a programming abstraction for
    programmers to imagine themselves getting more done. sometimes
    it’s a useful one.

now, on a multi-cpu machine this gets even murkier - threads (native
ones, not green ones) are actually very close to a whole process and
the operating system may in fact allow two bits of your program to run
on two cpus. in the end you must assume that only one piece of a
program is using a given piece of computer hardware at once, since we
simply cannot defy physics, but there are various approaches and
abstractions that help us let the computer help us, like threads, by
structuring our program such that the operating system might be
able to run two bits of our code on two bits of hardware at once.

  3. threads are evil and best understood via meditation - not actual
    thinking.
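To illustrate point 1, here is a small sketch of letting only one thread write at a time. It assumes a shared `Mutex` and writes to a `StringIO` rather than the real console so the result can be inspected. `write_lines` is a hypothetical helper. Each line is deliberately written in two steps so there is a gap another thread could use. Whether unsynchronized console writes really get intermixed depends on the Ruby version and the output device, so treat this as an illustration only.

```ruby
require 'stringio'

# Write one line per label, each from its own thread. Every line is emitted
# in two steps (text, then newline); the lock keeps the two steps together.
def write_lines(io, labels)
  lock = Mutex.new
  threads = labels.map do |label|
    Thread.new(label) do |name|
      lock.synchronize do
        io.write("Fetching: #{name}")
        Thread.pass                  # invite a thread switch in mid-line
        io.write("\n")
      end
    end
  end
  threads.each { |thr| thr.join }
  io
end

out = write_lines(StringIO.new, %w(www.google.com www.slashdot.org www.mit.edu))
puts out.string
```

Because the lock is held across both writes, every line in the result should be whole, although the order of the lines is still up to the scheduler.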

kind regards.

a @ http://drawohara.com/

No two things can happen ‘at once’. It is only with limited perceptive
precision do they appear simultaneous.

Woah. How you drew it and how I mentally pictured it were two different
things.

Cd Cd wrote:

How come when I run this code, the following output goes line by line?
Fetching: www.google.com
Fetching: www.slashdot.org
Fetching: www.mit.edu
Got www.slashdot.org: Moved Permanently
Got www.google.com: OK
Got www.mit.edu: OK

You could probably explain this as follows:

Thread 1 goes for Google. Waiting for response is a blocking call, so it
yields.
Thread 2 goes for Slashdot. Waiting for response is a blocking call, so
it yields.
Thread 3 goes for MIT. Waiting for response is a blocking call, so it
yields.

From there the order largely depends on which response comes back first
and which thread gets scheduled next. Given that Ruby 1.8 uses only green
threads that must generally reach an appropriate point to switch, this
seems perfectly logical.
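One way to see that the blocking waits really do overlap is to time them. A rough sketch, again using `sleep` as a stand-in for waiting on a response; the 0.1 second delays and the helper names `elapsed`, `wait_sequentially` and `wait_in_threads` are made up for the example.

```ruby
# Run a block and return how many seconds it took.
def elapsed
  start = Time.now
  yield
  Time.now - start
end

# The waits happen one after the other.
def wait_sequentially(delays)
  delays.each { |d| sleep d }
end

# The same waits, each in its own thread, so they can overlap.
def wait_in_threads(delays)
  threads = delays.map { |d| Thread.new(d) { |secs| sleep secs } }
  threads.each { |thr| thr.join }
end

delays = [0.1, 0.1, 0.1]   # made-up response times in seconds
printf("sequential: %.2fs\n", elapsed { wait_sequentially(delays) })
printf("threaded:   %.2fs\n", elapsed { wait_in_threads(delays) })
```

The sequential run should take roughly the sum of the delays, while the threaded run should take roughly as long as the longest single delay, since a thread that is only waiting does not keep the others from starting their own wait.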

I would also expect the result would remain mostly the same for those
first three lines, and mostly random for the next three lines, as long
as you run under Ruby 1.8. Other implementations that use all native
threads will be largely unpredictable for any sequence. When I tested
this code under JRuby, it produced a different sequence every time (and
required joining on the threads so they’d run to completion).
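If the goal is output in a predictable order no matter which response arrives first, one option is to collect each thread's result with `Thread#value`, which waits for the thread to finish and then returns the value of its block. A minimal sketch, assuming a hypothetical `fake_fetch` that sleeps instead of making a request, with made-up delays:

```ruby
# Hypothetical stand-in for a fetch: wait `delay` seconds, return a status.
def fake_fetch(url, delay)
  sleep delay
  "Got #{url}"
end

# Start every fetch, then collect results in the order the pages were listed.
def fetch_all(pages)
  threads = pages.map do |url, delay|
    Thread.new(url, delay) { |u, d| fake_fetch(u, d) }
  end
  # value joins the thread first, then returns the block's result
  threads.map { |thr| thr.value }
end

pages = [['www.google.com', 0.2], ['www.slashdot.org', 0.1], ['www.mit.edu', 0.3]]
puts fetch_all(pages)
```

The fetches still overlap, but the results come back in list order because the main thread asks for them in that order. This also covers the joining mentioned above: nothing is printed until the threads have run to completion.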

  • Charlie