Drake: Distributed Rake

= DRAKE – Distributed Rake

A branch of Rake supporting parallel task execution.

== Synopsis

Run up to three tasks in parallel:

% drake -j3

or equivalently,

% drake --threads 3

== Installation

% gem install drake

== Notes

=== Compatibility

Drake is 100% compatible with Rake. The code path for --threads=1 is
effectively identical to that of Rake’s. Drake passes all of Rake’s
unit tests, with any number of threads from 1 to 1000 (that’s the most
I tested).

=== Dependencies

In a given Rakefile, it is possible (even likely) that the dependency
tree has not been properly defined. Consider

task :a => [:x, :y, :z]

With single-threaded Rake, x,y,z will be invoked in that order
before a is invoked. However with drake --threads=N (for N > 1),
one should not expect any particular order of execution. Since there
is no dependency specified between x,y,z above, Drake is free to
run them in any order.

If you wish x,y,z to be invoked sequentially, then write

task :a => seq[:x, :y, :z]

This is shorthand for

task :a => :z
task :z => :y
task :y => :x

Upon invoking a, the above rules say: “Can’t do a until z is
complete; can’t do z until y is complete; can’t do y until x
is complete; therefore do x.” In this fashion the sequence
x,y,z is enforced.
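
As a rough illustration of that expansion (a sketch in plain Ruby, not Drake's actual implementation of +seq+), the chain can be produced by pairing each task with the one before it. The helper name `seq_edges` is made up for this example:

```ruby
# Hypothetical helper: turn an ordered list of task names into
# [task, prerequisite] pairs that enforce that order.
def seq_edges(names)
  names.each_cons(2).map { |before, after| [after, before] }
end

edges = seq_edges([:x, :y, :z])

# Each pair reads "can't do <task> until <prerequisite> is complete".
edges.each { |task, prereq| puts "task :#{task} => :#{prereq}" }
```

The final link, a => z, is not produced by the helper; it comes from the `task :a => ...` declaration itself.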

The problem of insufficient dependencies plagues Makefiles as well.
Package maintainers affectionately call it “not j-safe.”

=== MultiTask

The use of +multitask+ is deprecated. Tasks which may properly be run
in parallel will be run in parallel; those which cannot, will not. It
is not the user’s job to decide.

Drake’s +multitask+ is an alias of +task+.

=== Task#invoke inside Task#invoke

Parallelizing code means surrendering control over the
micro-management of its execution. Manually invoking tasks inside
other tasks is rather contrary to this notion, throwing a monkey
wrench into the system. An exception will be raised when this is
attempted in non-single-threaded mode.

== Links

== Author

My first reaction was, “So how is this different than Rake again? Rake
has multitask…”

On Tuesday 09 September 2008 01:58:40 [email protected] wrote:

=== MultiTask

The use of +multitask+ is deprecated. Tasks which may properly be run
in parallel will be run in parallel; those which cannot, will not. It
is not the user’s job to decide.

Drake’s +multitask+ is an alias of +task+.

Aha.

So what do you do with things which aren’t thread-safe? Or “j-safe”?

And how is this different than running Rake with ‘task’ set to an alias
of ‘multitask’?

On Sep 9, 4:52 am, David M. [email protected] wrote:

So what do you do with things which aren’t thread-safe? Or “j-safe”?

The same thing you do with a Makefile that isn’t j-safe: (1) write the
dependencies correctly, which makes it j-safe, or (2) don’t run it
with -j.

And how is this different than running Rake with ‘task’ set to an alias
of ‘multitask’?

If ‘task’ became ‘multitask’, Rake would run all your tasks at once –
all at the same time. That’s probably not what you want :)

Thinking in terms of parallel execution has a little learning curve.
It’s certainly not a natural transition coming from single-threaded
thinking.

Incidentally there is a good litmus test for determining whether you
get the gist of parallelism: once it becomes obvious that ‘multitask’
is a mistake, then you probably get it. The dependency graph tells us
what can be run in parallel and what can’t. It’s a math problem.
‘multitask’ stomps it all to pieces, having the power to declare 2 + 5
= 8 if it so chooses.
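
To make the “math problem” concrete, here is a small sketch in plain Ruby. It is not Drake's scheduler; the graph and the name `waves` are invented for illustration. It groups tasks so that everything in one group has all of its prerequisites in earlier groups, which means the members of a group could safely run at the same time:

```ruby
# Each task maps to its prerequisites (a made-up graph).
graph = {
  a: [:x, :y, :z],
  x: [],
  y: [:x],
  z: [],
}

# Repeatedly collect the tasks whose prerequisites are all done.
def waves(deps)
  done = []
  result = []
  until done.size == deps.size
    ready = deps.keys.select { |t| !done.include?(t) && (deps[t] - done).empty? }
    raise "dependency cycle" if ready.empty?
    result << ready
    done.concat(ready)
  end
  result
end

p waves(graph)
```

With this graph, x and z have no prerequisites and land in the first group, y waits for x, and a comes last. Nothing in the Rakefile has to say “run in parallel”; the grouping falls out of the dependencies alone.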

My advice for Rakefile writers is to incrementally move toward -j
correctness. Start with the bottom tasks first (those executed last)
and work your way up, testing each new task subtree.

Regards,
J

On Tuesday 09 September 2008 05:03:43 [email protected] wrote:

On Sep 9, 4:52 am, David M. [email protected] wrote:

So what do you do with things which aren’t thread-safe? Or “j-safe”?

The same thing you do with a Makefile that isn’t j-safe: (1) write the
dependencies correctly, which makes it j-safe, or (2) don’t run it
with -j.

Still going to be a fair number of cases of (3), I imagine: use locks to
synchronize non-thread-safe libraries, for which there’s still a benefit
to running those tasks in parallel.
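
A minimal sketch of option (3), using plain Ruby threads rather than Drake itself. The `log` array stands in for state inside a non-thread-safe library, and the task names are placeholders:

```ruby
lock = Mutex.new
log = []  # stands in for shared state in a non-thread-safe library

workers = [:x, :y, :z].map do |name|
  Thread.new do
    # Work that is safe to run concurrently would go here.
    lock.synchronize do
      # Only one thread at a time may touch the shared state.
      log << "#{name} start"
      log << "#{name} finish"
    end
  end
end

workers.each(&:join)
puts log
```

The order in which the three tasks reach the lock is not fixed, but each start/finish pair stays together, which is all the lock is meant to guarantee.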

If ‘task’ became ‘multitask’, Rake would run all your tasks at once –
all at the same time. That’s probably not what you want :)

Actually, no, I assumed that ‘multitask’ only ran that specific task in
parallel.

Actually, I hadn’t thought about it thoroughly enough to realize that
this wasn’t what was happening:

The dependency graph tells us
what can be run in parallel and what can’t.

I understand make -j, and I think I understand the difference with
multitask – if I understand it:

multitask :foo …
multitask :bar …

In the above example, will everything really run concurrently? I’d
assumed that foo would run concurrently, and then bar would run
concurrently.

In either case, I see what Drake is doing (real make -j behavior).
Thanks for explaining this – it looks cool!

One more thing: I’m not sure what the best way to do this is, but I
think it would still be useful to have the task/multitask dichotomy, for
legacy programs. Multitasks would operate as properly parallelized Drake
tasks. Plain old tasks would run in complete isolation, with the
exception that if they invoke a multitask, that multitask (and all its
remaining dependencies) runs in j-parallelized mode.

That would certainly break the purity of it, and it would be a bit more
work, but I think it could be made to work. The benefit is, you could
translate an existing project iteratively, without having to verify that
the whole thing is correct, first.

On Sep 9, 2:58 am, [email protected] wrote:

Upon invoking a, the above rules say: “Can’t do a until z is
complete; can’t do z until y is complete; can’t do y until x
is complete; therefore do x.” In this fashion the sequence
x,y,z is enforced.

The problem of insufficient dependencies plagues Makefiles as well.
Package maintainers affectionately call it “not j-safe.”

Hmmm… this is not backward compatible. Things could go very badly
if I tried -j3 on my “badly” written Rakefiles.

May I make a suggestion? Have

task :a => [:x, :y, :z]

translate into the task :a => :z => :y => :x thing. And then

task :a => [[:x, :y, :z]]

Run in parallel.

That way all old scripts work fine, and as we get smart and make our
tasks j-safe we can add the extra “j-array”.

And besides it sort of looks like parallel marks || x || ;)

Other than this one thing, I say very nice work.

T.

== Installation

% gem install drake

% gem install drake
Successfully installed drake-0.8.1.11.0.1

% which drake
drake not found

I forgot to mention that there is a good reason for the gem-only
release. Despite outward appearances, Drake is internally the same as
Rake, down to using the same file names and top-level module named
‘Rake’. This is to make a mainline merge easier, if Jim decides to do
so. (The fork stems from the latest Rake repository.)

Since Rubygems installs each gem in a separate directory, it is safe to
have Rake and Drake installed at the same time. However if you bypass
gems by executing drake’s install.rb, your rake will be the parallelized
one.

I also forgot to thank Jim, who transitioned to github in order to
help me do this.

Thanks–
J

Anton Ivanov wrote:

== Installation

% gem install drake

% gem install drake
Successfully installed drake-0.8.1.11.0.1

% which drake
drake not found

% sudo chmod +x /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake
% /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake -j2
/var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake: invalid option -- j
% /var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake --threads 2
/var/lib/gems/1.8/gems/drake-0.8.1.11.0.1/bin/drake: unrecognized option `--threads'

On Sep 9, 1:54 pm, Trans [email protected] wrote:

before a is invoked. However with drake --threads=N (for N > 1),
task :a => :z

Run in parallel.

That way all old scripts work fine, and as we get smart and make our
tasks j-safe we can add the extra “j-array”.

I thought someone else might notice it and elaborate but there is
another potential benefit of this notation. Eg.

 task :a => [[:x, :y, :z], [:m, :n], :r]

Where :x, :y, :z can be run in parallel, as can :m and :n, but the
groups must run one before the other.
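
One possible reading of that notation, sketched in plain Ruby. Neither Rake nor Drake implements this; `group_edges` is a made-up name, and the rule "each member of a group depends on every member of the previous group" is my assumption about what is meant:

```ruby
# Hypothetical: expand a list of groups into explicit prerequisites.
# A bare symbol such as :r is treated as a group of one.
def group_edges(target, groups)
  stages = groups.map { |g| Array(g) }
  edges = {}
  stages.each_cons(2) do |earlier, later|
    later.each { |task| edges[task] = earlier }
  end
  edges[target] = stages.last  # the target waits for the final group
  edges
end

edges = group_edges(:a, [[:x, :y, :z], [:m, :n], :r])
edges.each { |task, prereqs| puts "task :#{task} => #{prereqs.inspect}" }
```

Written out this way, the notation is shorthand for ordinary dependency declarations, so the same ordering can already be expressed with plain `task` lines.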

T.

James M. Lawrence wrote:

In a given Rakefile, it is possible (even likely) that the dependency
tree has not been properly defined. Consider

task :a => [:x, :y, :z]

With single-threaded Rake, x,y,z will be invoked in that order
before a is invoked.

Just to clarify: In standard rake x, y and z will be invoked by task a
in that order. However, that doesn’t provide any guarantees that they
will be executed in that order.

For example, consider the following additional dependencies:

task :x => :z

Then the code for z will be executed before task x.

The moral of the story is that depending upon ordering of dependencies
to determine the ordering of execution is a bug in standard rake too.
(it’s just more likely that drake will make this kind of bug
manifest).

BTW, good job James.

unknown wrote:

If ‘task’ became ‘multitask’, Rake would run all your tasks at once –
all at the same time. That’s probably not what you want :)

[…] It’s a math problem.
‘multitask’ stomps it all to pieces, having the power to declare 2 + 5
= 8 if it so chooses.

I’m not quite sure what you are saying here, but if you are trying to
imply that multitask does not honor dependencies in ordering, you are
incorrect. If there is dependency declared, then a task won’t run until
all of its dependencies have finished.

That being said, there is a known bug in multitask where failures in
dependencies are not properly transmitted to all dependent tasks. But I
don’t think you were referring to that.

– Jim W.

Does Drake properly clean up its children if it is aborted with SIGINT?
ISTR multitask in rake leaving orphans running.

David M. wrote:

Actually, no, I assumed that ‘multitask’ only ran that specific task in
parallel.

Actually, multitask will run all of the task’s dependencies in parallel,
not the task itself.

– Jim W.

Thomas S. wrote:

On Sep 9, 1:54 pm, Trans [email protected] wrote:

before a is invoked. However with drake --threads=N (for N > 1),
task :a => :z

Run in parallel.

That way all old scripts work fine, and as we get smart and make our
tasks j-safe we can add the extra “j-array”.

I thought someone else might notice it and elaborate but there is
another potential benefit of this notation. Eg.

 task :a => [[:x, :y, :z], [:m, :n], :r]

Where :x, :y, :z can be run in parallel, as can :m and :n, but the
groups must run one before the other.

As stated before, assuming execution order among dependencies is a bug
even in standard rake. If groups of tasks need to be ordered in time,
declare a dependency. Anything else is just wrong.

– Jim W.

Thomas S. wrote:

On Sep 10, 7:59 am, Jim W. [email protected] wrote:

in that order. However, that doesn’t provide any guarantees that they
(it’s just more likely that drake will make this kind of bug
manifest).

Ah, so by design you consider it a bug. You could have fixed that from
day one by randomizing the order of the prerequisites. Now you have a
situation where many Rakefiles depend on that bug. So, why not turn
lemons into lemonade, and make this bug a feature?

I’m not sure what you are advocating here:

(1) Guarantee that rake will invoke the prerequisites in the defined
order? … we already do that (for standard non-tasking rake).

(2) Guarantee that rake will execute the prerequisites in the defined
order? … Can’t do that, prerequisite constraints elsewhere may
constrain the execution to be a different order.

(3) Declare that I don’t mind if you make unwarranted assumptions about
execution order? … Well, as long as you don’t file bug reports, I’m ok
with that.

More clarification on Rake terminology:

To execute a task means to execute any code blocks attached to the task
(i.e. the do/end part of a task).

To invoke a task means to make sure all the prerequisites for the task
have been invoked and then execute the task if it has not yet been
executed. A task invocation will not execute the task if it has already
been executed.

In standard rake, the order of dependencies only specifies the
invocation order, not the execution order. You never were able to
directly control execution order of tasks via the order of the
dependency list.

In moving to drake, what you lose is direct control over invocation
order. You never had direct control of execution order.
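
The two terms can be modelled in a few lines of plain Ruby. This is a toy, not Rake's real Task class; `ToyTask` and its fields are invented for the illustration:

```ruby
# A toy model of "invoke" versus "execute".
class ToyTask
  def initialize(name, prereqs, log)
    @name = name
    @prereqs = prereqs
    @log = log
    @invoked = false
  end

  # Invoke: make sure the prerequisites have been invoked, then
  # execute this task, but only the first time it is invoked.
  def invoke
    return if @invoked
    @invoked = true
    @prereqs.each(&:invoke)
    execute
  end

  # Execute: run the action attached to the task (here, record the name).
  def execute
    @log << @name
  end
end

log = []
z = ToyTask.new(:z, [], log)
x = ToyTask.new(:x, [z], log)        # the extra constraint: x depends on z
y = ToyTask.new(:y, [], log)
a = ToyTask.new(:a, [x, y, z], log)  # prerequisites listed as x, y, z

a.invoke
p log
```

The prerequisites of a are invoked in the listed order x, y, z, yet z is executed first because x depends on it. The list order fixes invocation order only.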

– Jim W.

On Tue, Sep 9, 2008 at 3:38 PM, . [email protected] wrote:

task :a => :x
task :a => :y

Yes or no? If a programmer wants them to mean different things, how
shall we accommodate him?

Yes! If you want it differently, you write in the ordering explicitly:

task :a => :z
task :z => :y
task :y => :x

You have no reason to expect one operation to come before another if
there is not an explicit dependency chain between them.

martin

On Sep 10, 7:59 am, Jim W. [email protected] wrote:

in that order. However, that doesn’t provide any guarantees that they
(it’s just more likely that drake will make this kind of bug
manifest).

Ah, so by design you consider it a bug. You could have fixed that from
day one by randomizing the order of the prerequisites. Now you have a
situation where many Rakefiles depend on that bug. So, why not turn
lemons into lemonade, and make this bug a feature?

T.

On Wed, Sep 10, 2008 at 08:38:35AM +0900, . wrote:

On Sep 9, 11:25 am, Jos B. [email protected] wrote:

Does Drake properly clean up its children if it is aborted with SIGINT? ISTR
multitask in rake leaving orphans running.

I’m misremembering. SIGINT seems to work okay, it’s SIGTERM that leaves
orphaned children (with ppid 1) around with rake, presumably because it
doesn’t catch that signal. Same with drake (0.8.1.11.0.1)

task :default => name
}

Thanks, I tried `drake &’ followed by `kill $!’ (using bash) and the
forked drake children revert from ppid $! to ppid 1. A `killall drake’
is required to clean up.

On Tue, Sep 9, 2008 at 4:13 PM, . [email protected] wrote:

(and I should hope we all agree), then there is nothing which can save
us. We are forced to write our Rakefiles correctly. No backwards
compatibility mode is possible for threads>1.

A --file-order-implies-dependency flag might get us there in a lot of
cases, though of course there’s no general solution. Of more value
would be a lint tool that helps convert a rakefile into parallelisable
form.

martin

On Sep 10, 9:13 am, Jim W. [email protected] wrote:

execution order? … Well, as long as you don’t file bug reports, I’m ok
been executed.

In standard rake, the order of dependencies only specifies the
invocation order, not the execution order. You never were able to
directly control execution order of tasks via the order of the
dependency list.

And yet we can use the execution order in practice:

F = []
G = []

task :f do
  F.replace([1,2,3])
end

task :g do
  # G copies F if :f has already run; otherwise it gets its own values
  if F.empty?
    G.replace([4,5,6])
  else
    G.replace(F)
  end
end

desc "use f and g not defined by f"
task :g1 => [:g, :f] do
  p G, F
end

desc "use f and g defined by f"
task :g2 => [:f, :g] do
  p G, F
end

I understand that the formal design did not intend for this. But
implementation allows it.

Is it worth potentially breaking Rakefiles to prevent this sort of
thing (like drake -j2 or more does)? I’m not so sure. While one might
consider this Rakefile “bad design” because it doesn’t fit the
original formal notion, it nonetheless does what one would expect it
to do. I think I’d rather have that, than the potential for ambiguous
behavior.

T.