Drake: Distributed Rake

unknown · September 10, 2008, 9:20pm

On Sep 10, 2008, at 11:06 AM, Trans wrote:

Is it worth potentially breaking Rakefiles to prevent this sort of
thing (like drake -j2 or more does)? I’m not so sure. While one might
consider this Rakefile “bad design” because it doesn’t fit the
original formal notion, it nonetheless does what one would expect it
to do. I think I’d rather have that, than the potential for ambiguous
behavior.

They aren’t potentially broken, they are broken. If it happens to
work, you’ve just gotten lucky.

I’ve helped rework the Rubinius rakefiles twice and I can assure you
it’s perfectly possible to have broken rakefiles without -j. We
were able to use drake with only one change due to having working
rakefiles beforehand.

Furthermore, this is a feature that is not enabled by default. I
don’t see where this is an issue.

unknown · September 10, 2008, 9:27pm

Eric H. wrote:

We were able to use drake with only one change due to having working
rakefiles beforehand.

I’m anxious to try it, but as I wrote the gem seems not to work. How
did you do it?

unknown · September 11, 2008, 1:46am

On Sep 10, 2008, at 12:20 PM, Anton Ivanov wrote:

Eric H. wrote:

We were able to use drake with only one change due to having working
rakefiles beforehand.

I’m anxious to try it, but as I wrote the gem seems not to work. How
did you do it?

I evaluated a prerelease copy. I’ve not yet tried this release.

unknown · September 11, 2008, 2:52am

On Sep 10, 12:38 am, “Martin DeMello” [email protected] wrote:

They should be the same, but if we’re discussing legacy rakefiles
where people have implicitly relied on their being different…

I agree that there’s really no ‘right’ thing to do, though - either
you’ve specified your depgraph properly or you haven’t.

should? How is one correct and the other not? They are just
different behaviors. Ie. rake is the same as drake -j1.

The problem is that drake -j2 or more can royally screw a Rakefile not
written for it. Thus the “fix” is to remain backward compatible but
add a syntactical distinction for j-ready tasks. Then there is no
problem.

T.

unknown · September 10, 2008, 11:48pm

On Tue, Sep 9, 2008 at 6:13 PM, . [email protected] wrote:

On Sep 9, 7:32 pm, Martin DeMello [email protected] wrote:

A --file-order-implies-dependency flag might get us there in a lot of
cases, though of course there’s no general solution. Of more value
would be a lint tool that helps convert a rakefile into parallelisable
form.

But I thought we just agreed those two forms should be the same? Now
you are proposing a flag which will make them different.

They should be the same, but if we’re discussing legacy rakefiles
where people have implicitly relied on their being different…

I agree that there’s really no ‘right’ thing to do, though - either
you’ve specified your depgraph properly or you haven’t.

martin

unknown · September 11, 2008, 3:01am

On Sep 10, 3:13 pm, Eric H. [email protected] wrote:

work, you’ve just gotten lucky.
There’s no such thing as luck in computer programming.

T.

unknown · September 11, 2008, 4:51am

Jos B. wrote:

On Wed, Sep 10, 2008 at 08:38:35AM +0900, . wrote:

On Sep 9, 11:25 am, Jos B. [email protected] wrote:

Does Drake properly clean up its children if it is aborted with SIGINT? ISTR
multitask in rake leaving orphans running.

I’m misremembering. SIGINT seems to work okay, it’s SIGTERM that leaves
orphaned children (with ppid 1) around with rake, presumably because it
doesn’t catch that signal. Same with drake (0.8.1.11.0.1)

What would be a good way to fix this?

– Jim W.

unknown · September 11, 2008, 4:43am

Thomas S. wrote:

should? How is one correct and the other not?

Because one assumes a dependency that is not explicitly declared. Rake
only guarantees execution ordering in the face of explicit dependencies.
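
A minimal sketch of the difference, assuming stock Rake is installed and using made-up task names (:compile, :package, :package_safe, :fragile, :robust are purely illustrative). The :fragile task only works because Rake happens to run prerequisites in the order listed; :robust states the dependency in the graph, so the order holds no matter how the prerequisites are scheduled.

```ruby
require "rake"
extend Rake::DSL # makes `task` available when run as a plain script

ORDER = []

# Implicit ordering: :package needs the output of :compile but never says so.
# It only works while :compile happens to be listed (and run) first.
task(:compile) { ORDER << :compile }
task(:package) { ORDER << :package }
task :fragile => [:compile, :package]

# Explicit ordering: the dependency is part of the graph itself.
task(:package_safe => :compile) { ORDER << :package_safe }
task :robust => [:package_safe]

Rake::Task[:robust].invoke
```

Invoking :robust always runs :compile before :package_safe, because that edge is declared rather than assumed.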

– Jim W.

unknown · September 11, 2008, 8:49pm

On Thu, Sep 11, 2008 at 11:45:11AM +0900, Jim W. wrote:

Jos B. wrote:

I’m misremembering. SIGINT seems to work okay, it’s SIGTERM that leaves
orphaned children (with ppid 1) around with rake, presumably because it
doesn’t catch that signal. Same with drake (0.8.1.11.0.1)

What would be a good way to fix this?

I’m probably being naive, but couldn’t rake catch SIGTERM and handle it the
same way it appears to handle SIGINT?
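
One way this could look, as a rough sketch in plain Ruby rather than anything Rake or Drake actually does: keep track of spawned child PIDs and install the same handler for both INT and TERM. The names CHILD_PIDS, spawn_tracked and terminate_children are hypothetical helpers invented for this illustration.

```ruby
# Hypothetical helpers; not part of Rake or Drake.
CHILD_PIDS = []

# Spawn a command and remember its PID so it can be cleaned up later.
def spawn_tracked(*cmd)
  pid = Process.spawn(*cmd)
  CHILD_PIDS << pid
  pid
end

# Forward a signal to every tracked child and reap it.
def terminate_children(signal = "TERM")
  CHILD_PIDS.each do |pid|
    begin
      Process.kill(signal, pid)
      Process.wait(pid)
    rescue Errno::ESRCH, Errno::ECHILD
      # Child already exited or was already reaped; nothing to do.
    end
  end
  CHILD_PIDS.clear
end

# Treat SIGTERM exactly like SIGINT: clean up children, then exit.
%w[INT TERM].each do |sig|
  Signal.trap(sig) do
    terminate_children
    exit 1
  end
end
```

This only covers children the process knows about; anything those children spawn themselves would need process groups or similar, which is outside this sketch.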

unknown · September 11, 2008, 11:14pm

On Sep 10, 10:58 pm, “.” [email protected] wrote:

On Sep 10, 2:06 pm, Trans [email protected] wrote:

While one might consider this Rakefile “bad design” because it
doesn’t fit the original formal notion, it nonetheless does what one
would expect it to do. I think I’d rather have that, than the
potential for ambiguous behavior.

Underspecified dependencies + parallel execution == ambiguous behavior

They are only unspecified according to an interpretation of how things
ought to be. In the current implementation Rake is executing in a
predictable order. One can use it, and people have. Maybe not formally
ideal but the functionality is there. But that’s not what really
concerns me. The issue I was looking at was:

drake -j2 + Rakefile = ambiguous behavior

So I was suggesting that it would perhaps be better to accept rake’s
current implementation behavior; this ambiguity would then not arise;
and instead provide another notation to indicate parallel execution.
My particular idea might not be the best one, I was just looking for a
possible solution that could be useful in itself and address this
issue. Another possibility is just placing a statement at the
beginning of a Rakefile that could be used to indicate that the
rakefile is in fact “j-able”.

I thought it prudent to address this b/c, personally, I’d like to see
-j end up in Rake itself. But perhaps it is better to just move forward
and expect people to fix all their old Rakefiles (and let’s just hope
nothing really ugly happens when they haven’t).

There’s no such thing as luck in computer programming.

Yes, there is.

And his name is _why? Well, I suppose if we want to take chances,
then there is.

T.

unknown · September 11, 2008, 11:00pm

On Tuesday 09 September 2008 17:33:37 . wrote:

Still going to be a fair number of cases of (3), I imagine: use
locks to synchronize non-thread-safe libraries, for which there’s
still a benefit to running those tasks in parallel.

If by “non-thread-safe libraries” you mean a library whose Rakefile is
not j-safe, then you would just run it without -j.

Which is, by the way, one of the most irritating things about Makefiles.

While in the simple case, a Makefile author might not know about -j, and write
a safe Makefile anyway (because that’s really simpler, after all), it’s
really troublesome that there’s no standard way to tell whether something’s
j-safe or not.

Seems like the best I can do is run something with -j, and if it seems to
work, well, hope for the best.

That’s one reason I like multitask – it forces the programmer to be
explicitly thinking about threading.
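
For reference, a small sketch of what that explicit opt-in looks like in stock Rake, assuming Rake is installed; the task names (:fetch_a, :fetch_b, :fetch_all) are made up for illustration. Only the prerequisites of a multitask are run in parallel threads, so the author has to decide, task by task, where concurrency is acceptable.

```ruby
require "rake"
extend Rake::DSL # makes `task` and `multitask` available in a plain script

DONE = Queue.new # thread-safe, since the prerequisites run concurrently

task(:fetch_a) { sleep 0.05; DONE << :fetch_a }
task(:fetch_b) { sleep 0.05; DONE << :fetch_b }

# The prerequisites of a multitask run in parallel threads; the body of
# :fetch_all runs only after all of them have finished.
multitask(:fetch_all => [:fetch_a, :fetch_b]) { DONE << :fetch_all }

Rake::Task[:fetch_all].invoke
```

The relative order of :fetch_a and :fetch_b is not defined here, which is the point: nothing in the Rakefile claims otherwise.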

If it is a library
inside a larger project, you have at least two options:

(a) Run single-threaded rake in a subprocess for that library.

(b) Use the Rake module directly, as the unit tests do. The
no-invoke-inside-invoke rule applies per TaskManager, so you could
create a new TaskManager and do whatever you wish with it.

I was talking about an even simpler problem:

Let’s pretend, for a moment, that we’re talking to an HTTP library that’s not
thread-safe. Our Rakefile, for whatever reason, needs to download stuff and
then work with it. So we can’t use this HTTP library directly – we need to
wrap synchronization around it.

But, there’s still an advantage to running the actual meat of the tasks in
parallel – maybe we’re doing some complex hpricot parsing, and we’re
connecting to a potentially-slow server. Ideally, we want to download as fast
as we can, but once it’s downloaded, we want to start crunching in worker
threads.

So there’s still a benefit to Drake/Multitask, but there’s the added
complexity of having to wrap that non-thread-safe library.

Contrived example, I know.
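
The scenario described above could be sketched roughly as follows, in plain Ruby without Rake. UnsafeFetcher is a hypothetical stand-in for the non-thread-safe HTTP library, and crunch is a placeholder for the expensive parsing; only the download is serialized behind a Mutex, while the crunching runs in parallel worker threads.

```ruby
# Hypothetical stand-in for a non-thread-safe HTTP library.
class UnsafeFetcher
  def fetch(url)
    sleep 0.01 # pretend network latency
    "<html>#{url}</html>"
  end
end

FETCHER    = UnsafeFetcher.new
FETCH_LOCK = Mutex.new

# Only one thread at a time may touch the unsafe library.
def locked_fetch(url)
  FETCH_LOCK.synchronize { FETCHER.fetch(url) }
end

# Placeholder for the expensive, thread-safe part of the work.
def crunch(body)
  body.upcase
end

# One worker thread per URL: downloads are serialized by the lock,
# crunching happens outside it and can overlap with other downloads.
def process_all(urls)
  threads = urls.map do |url|
    Thread.new { crunch(locked_fetch(url)) }
  end
  threads.map(&:value)
end
```

For example, `process_all(%w[a b])` returns the crunched bodies in the same order as the input URLs, since the results are collected from the threads in that order.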

That’s an advantage to single-threaded Rake, by the way – by default, you
don’t need to think about any of this. (-j1 isn’t an excuse, unless the
Rakefile can force it, because then it’s up to the user to figure out what
j-level to use. That should be transparent.)

unknown · September 12, 2008, 12:32am

Trans,

A given Rakefile that works for single-threaded use does not contain
the information necessary, neither explicitly nor implicitly through
the application of any set of rules, to construct a j-safe graph.

Regards,
James M. Lawrence

unknown · September 11, 2008, 11:17pm

On Tuesday 09 September 2008 17:38:40 . wrote:

There is a mathematical reality we cannot avoid, from which
special-case syntax and backwards-compatibility acrobatics cannot save
us. The problem is in our thinking. We didn’t specify what depends
on what. We thought we did, but it turns out we were fooling
ourselves all along.

That’s not always the problem. Given that Rake itself doesn’t guarantee any
kind of ordering, we have to assume that dependencies are specified
correctly, or close to it.

But we’re not writing Erlang, which means spec-ing dependencies correctly
isn’t enough.

while this
insufficiently define dependencies. This will not come close to
saving us.

No, but it does take us back to the behavior of Rake, or of Drake -j1. If you
really want to provide bug-for-bug compatibility, dig into the Rake code and
figure out what the ordering should be.

There is already a historical precedent with Makefiles. A new syntax
could have been added to Makefiles, but none was. The Makefiles had
bugs, but instead of timidly skirting around the problems while
praising the gods of backwards compatibility, people faced them
head-on, solving them one at a time.

Some did, yes.

And some let their Makefiles remain, with the existing syntax and bugs, and
left it to their users to figure out whether they could be parallelized or
not.

I’m sorry, but if you’re already asking me to manually run a rake task, you
don’t get to also ask me to read the source code of your Rakefile and figure
out whether or not it will work with -j2. Nor should I have to use trial and
error, potentially with very subtle bugs, to figure out what’s happened.

And it’s worth mentioning again: We’re not writing Erlang, we’re writing
Ruby.

That means shared memory. It means locking issues. And it means thread-unsafe
libraries.

It means that a Rakefile could very well crash if run with -j2.

Understand, I don’t mean it will be run in the wrong order, or that the
dependencies are wrong. The dependencies may well be perfect, and it will run
exactly as designed to.

Except that at some point, two separate tasks will simultaneously do something
a library won’t like, and that library will deadlock. Or segfault. Or worse,
give corrupt data.

Which means that the Rakefile author is responsible, then, for fixing the
deficiencies in the library. Or they have to contact the library author, and
attempt to get the library fixed. Making every single Ruby library
thread-safe is a laudable goal, but also not going to happen.

You could solve a lot of that, I suppose, by forking instead – but that
introduces its own problems.

unknown · September 13, 2008, 4:46pm

I have restored the original ‘multitask’ for single-threaded mode
only. Now Drake and Rake should have functionally identical codepaths
for single-threaded mode (default behavior); my previous assertion of
this wasn’t quite true.

I have also taken Thomas S.'s suggestion for a randomizing option
(credited in the ChangeLog).

New sections of the README:

=== Migrating to -j

First of all, do you want to bother with -j? If you are satisfied
with your build time, then there is really no reason to use it.

If on the other hand your build takes twenty minutes to complete, you
may be interested in investing some time getting the full dependency
tree correct in order to take advantage of multiple CPUs or cores.

Though there is no way for Drake to fathom what you mean by a
correct dependency, there is a tool available which helps you get
closer to saying what you mean.

% drake --rand[=SEED]

This will randomize the order of sibling prerequisites for each task.
When given the optional SEED integer, it will call srand(SEED) to
produce the same permutation each time. The randomize option also
disables +multitask+.
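
The idea behind a seeded shuffle can be illustrated in a few lines of plain Ruby. This is only a sketch of the concept, not Drake's implementation; shuffled_prerequisites and the task names are hypothetical.

```ruby
# Hypothetical helper: returns sibling prerequisites in a shuffled order.
def shuffled_prerequisites(prereqs, seed = nil)
  srand(seed) if seed # same seed, same permutation
  prereqs.shuffle     # uses the global generator seeded above
end

siblings = [:compile, :docs, :test, :package]

# Two runs with the same seed give the same order, so a failure caused
# by a missing dependency can be reproduced exactly.
first  = shuffled_prerequisites(siblings, 42)
second = shuffled_prerequisites(siblings, 42)
```

Without a seed the order differs from run to run, which is useful for shaking out hidden ordering assumptions but not for debugging a specific failure.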

Though this option may produce an error due to an unspecified
dependency, at least it will be an error which is exactly the same on
each run (with SEED). In addition, you’ll have the major debugging
advantage of using a single thread.

This option will also work in multi-threaded mode. After all, once
-jN is running smoothly there is still no guarantee that you have it
right. However with each successful execution of drake -jN --rand,
the probability of correctness approaches 1 (though asymptotically
so).

(The only way to prove correctness is to test all such permutations,
which for any non-trivial project would be prohibitively large,
especially those which meaningfully benefit from -j.)

=== MultiTask

When more than one thread is given, +multitask+ behaves just like
+task+. Those tasks which may properly be run in parallel will be run
in parallel; those which cannot, will not. It is not the user’s job
to decide. In other words, for -jN (N > 1), +multitask+ is an alias
of +task+.

For -j1 (default), +multitask+ behaves as the original.