Rake task dependency vs. method call

casper_the_ghost · April 25, 2006, 12:05pm

In Rake, what’s the significant difference between

task :strike => [ :coil ] do
# …
end

and

task :strike do
coil
#…
end

Thanks,
T.

casper_the_ghost · April 25, 2006, 1:22pm

On Tue, 2006-04-25 at 19:01 +0900, TRANS wrote:

#...

end

For one, dependencies are known (and managed) by rake, while method
calls are just method calls. Rake will always try to order the tasks run
so that all dependencies are fulfilled before a task is run, and will
only run a dependency task once. E.g. try this:

== Rakefile

task :default => :allstrike

def coil
  puts "coiling by method"
end

task :coil do
  puts "coiling task…"
end

task :strike => [ :coil ] do
  puts "striking…"
end

task :strike2 => [ :coil ] do
  puts "striking again"
end

task :mstrike do
  coil
  puts "mstriking"
end

task :mstrike2 do
  coil
  puts "mstriking again"
end

task :allstrike => [:coil, :strike, :strike2, :mstrike, :mstrike2]
__END__

Which you can then play with to see the differences.

$ rake strike strike2
(in /home/rosco/dev/ruby/raketest)
coiling task…
striking…
striking again

$ rake mstrike mstrike2
(in /home/rosco/dev/ruby/raketest)
coiling by method
mstriking
coiling by method
mstriking again

$ rake allstrike
(in /home/rosco/dev/ruby/raketest)
coiling task…
striking…
striking again
coiling by method
mstriking
coiling by method
mstriking again

Also, dependencies really come in handy when you throw file tasks into
the mix.
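
To make the file-task remark a little more concrete, here is a rough plain-Ruby sketch of the kind of timestamp check a file dependency implies: rebuild the target only when it is missing or older than one of its sources. The helper name needs_rebuild? and the file names are made up for illustration; this is not Rake's own code, just the general idea.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Rake): true when the target is missing
# or any source file has a newer modification time than the target.
def needs_rebuild?(target, sources)
  return true unless File.exist?(target)
  target_time = File.mtime(target)
  sources.any? { |src| File.mtime(src) > target_time }
end

Dir.mktmpdir do |dir|
  src = File.join(dir, 'coil.txt')
  out = File.join(dir, 'strike.txt')
  File.write(src, 'coiled')

  puts needs_rebuild?(out, [src])   # target does not exist yet

  File.write(out, 'struck')
  now = Time.now
  File.utime(now, now - 60, src)    # make the source older than the target
  File.utime(now, now, out)
  puts needs_rebuild?(out, [src])

  File.utime(now, now + 60, src)    # make the source newer than the target
  puts needs_rebuild?(out, [src])
end
```

A plain method call has no way to express this kind of "only if out of date" rule, which is one reason declared dependencies are worth having.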

Hope that helps,

casper_the_ghost · April 25, 2006, 2:37pm

Ross B. <rossrt roscopeco.co.uk> writes:

For one, dependencies are known (and managed) by rake, while method
calls are just method calls. Rake will always try to order the tasks run
so that all dependencies are fulfilled before a task is run, and will
only run a dependency task once. E.g. try this:

Also, dependencies really come in handy when you throw file tasks into
the mix.

Hope that helps,

Yep. That does the trick. Knew I was missing a vital concept here: run
once.
Makes all the difference.

Thanks,
T.

casper_the_ghost · April 26, 2006, 2:48am

I was giving this some more thought and it occurred to me that the
one-time run of tasks isn’t any different from them being memoized
methods. If that is indeed the case, then the only difference is in the
metadata. And while that’s nice and all, I doubt that’s all that vital.

T.

casper_the_ghost · April 26, 2006, 3:16am

Just for fun I put together this simple demonstration.

— task.rb

require 'facet/module/memoize'

module Task

  # Store the dependencies.

  def task_dependencies
    @task_dependencies ||= {}
  end

  # Store the descriptions.

  def task_descriptions
    @task_descriptions ||= {}
  end

  # Set the description of the subsequent task.

  def desc( line )
    @last_description = line.gsub("\n",'')
  end

  # Define a task.

  def task( args, &action )

    if Hash === args
      raise ArgumentError, "#{args.size} for 1" if args.size != 1
      name, deps = *(args.to_a.flatten)
      name = name.to_sym
      deps = [deps].compact.collect{ |e| e.to_sym }
    else
      name, deps = args.to_sym, []
    end

    task_dependencies[ name ] = deps
    task_descriptions[ name ] = @last_description
    @last_description = nil

    (class << self; self; end).class_eval do

      define_method( name ) do
        deps.each{ |d| send(d) }
        action.call
      end

      memoize name

    end

  end

end

include Task

desc "Milk the cow!"
task :milk do
  puts "milk"
end

desc "Jump over the moon!"
task :jump => [ :milk, :milk ] do
  milk; milk
  puts "jump"
end

jump

puts task_descriptions[:milk]
puts task_descriptions[:jump]

casper_the_ghost · April 26, 2006, 3:47am

On Wed, 26 Apr 2006, Trans wrote:

Just for fun I put together this simple demonstration.

check out tsort. you need it to work if you have a diamond like

b => a
c => a
d => b
d => c

else you’ll run ‘a’ twice. tsort lets you build a dag of dependencies
and even detects cycles.
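
A minimal sketch of that diamond using the standard library's TSort module. The DepGraph wrapper class is an assumption made up for this example; only tsort_each_node, tsort_each_child, tsort and TSort::Cyclic come from the library itself.

```ruby
require 'tsort'

# Hypothetical wrapper: maps each task to its prerequisites so TSort can walk it.
class DepGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort asks for every node ...
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  # ... and for the children (prerequisites) of each node.
  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# The diamond: b => a, c => a, d => b, d => c
diamond = DepGraph.new(a: [], b: [:a], c: [:a], d: [:b, :c])
order = diamond.tsort
p order   # every task appears once, prerequisites first

# A cycle is reported instead of recursing forever.
cyclic = DepGraph.new(a: [:b], b: [:a])
begin
  cyclic.tsort
rescue TSort::Cyclic => e
  puts "cycle detected: #{e.message}"
end
```

Running the tasks in the returned order means 'a' is visited once even though both b and c depend on it.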

cheers.

-a

casper_the_ghost · April 26, 2006, 7:21am

Joel VanderWerf wrote:

Here’s a slightly modified version of T’s code (#cache happens to be the
name of #memoize in my libs). The output is:

It does the right thing on cycles, too:

task :a => :b do puts "a" end
task :b => :c do puts "b" end
task :c => :a do puts "c" end

Output:

c
b
a

casper_the_ghost · April 26, 2006, 7:15am

[email protected] wrote:

else you’ll run ‘a’ twice. tsort lets you build a dag of dependencies and
even detects cycles.

cheers.

-a

He (OP) already did the t-sort, implicitly.

Here’s a slightly modified version of T’s code (#cache happens to be the
name of #memoize in my libs). The output is:

a
b
c

require 'cache'

module Task

  # Store the dependencies.

  def task_dependencies
    @task_dependencies ||= {}
  end

  # Store the descriptions.

  def task_descriptions
    @task_descriptions ||= {}
  end

  # Set the description of the subsequent task.

  def desc( line )
    @last_description = line.gsub("\n",'')
  end

  # Define a task.

  def task( args, &action )

    if Hash === args
      raise ArgumentError, "#{args.size} for 1" if args.size != 1
      name, *deps = *(args.to_a.flatten)
      name = name.to_sym
      deps = deps.compact.collect{ |e| e.to_sym }
    else
      name, deps = args.to_sym, []
    end

    task_dependencies[ name ] = deps
    task_descriptions[ name ] = @last_description
    @last_description = nil

    (class << self; self; end).class_eval do

      define_method( name ) do
        deps.each{ |d| send(d) }
        action.call if action
      end

      cache name

    end

  end

end

include Task

task :a do puts "a" end
task :b => :a do puts "b" end
task :c => :a do puts "c" end
task :d => [:b, :c]

d

casper_the_ghost · April 26, 2006, 7:49am

Trans wrote:

Just for fun I put together this simple demonstration.

This is kind of fun. Your demo suggests that maybe rake and dependency
injection can be unified somehow… For example, using my favorite DI
toy:

require 'mindi'

class Tasks
  extend MinDI::Container

  a { puts "a"; :done }
  b { a; puts "b"; :done }
  c { a; puts "c"; :done }
  d { b; c; puts "d"; :done }
end

Tasks.new.d

Output:

a
b
c
d

(The “:done” could be removed if MinDI didn’t confuse a nil return value
with the state of not yet having been called.)

casper_the_ghost · April 26, 2006, 7:15pm

On Thu, 27 Apr 2006 [email protected] wrote:

task :b => :a do y “b” => uidgen(42) end
b: 15

a: 5

c: 3
— d

i realized this error as soon as i hit send. the dag run should be
looked up from the global dag (TH in tasklib2) so a does not get run
more than once. like i said, my impl wasn’t perfect but, by using tsort,
we can indeed enforce only the declared call graph.

sorry for the mistake.

cheers.

-a

casper_the_ghost · April 26, 2006, 7:09pm

On Wed, 26 Apr 2006, Joel VanderWerf wrote:

He (OP) already did the t-sort, implicitly.

not really, what it does is a kind of
global-short-circuit-based-on-dependency-sort. consider:

 harp:~ > cat a.rb
 require 'tasklib'

 task :uidgen do |*argv| (seed = argv.shift) ? rand(seed) : $UID||=uidgen(42) end

 task :a => :uidgen do y "a" => $UID end
 task :b => :a do y "b" => uidgen(42) end
 task :c => :a do y "c" => uidgen(42) end
 task :d => [:b, :c] do y "d" end

 d()


 harp:~ > ruby a.rb
 ---
 a: 12
 ---
 b: 12
 ---
 c: 12
 --- d

so this fails horribly (rand is called only a single time) because the
topology is determined via a mechanism (short circuit caching) in
addition to and outside of the defined relationships. in other words the
topological sort should only enable the call graph, not prevent all
routes through it. otherwise it’s way too fragile and any outside calls
can break the chain, as this contrived example shows.

as far as i can tell it also fails to detect cycles:

harp:~ > cat b.rb
require 'tasklib'

task :b => :a and task :a => :b

b()

harp:~ > ruby b.rb
./tasklib.rb:21:in `a': stack level too deep (SystemStackError)
        from (eval):8:in `a'
        from ./tasklib.rb:21:in `b'
        from ./tasklib.rb:21:in `b'
        from (eval):8:in `b'
        from ./tasklib.rb:21:in `a'
        from ./tasklib.rb:21:in `a'
        from (eval):8:in `a'
        from ./tasklib.rb:21:in `b'
         ... 2219 levels...
        from ./tasklib.rb:21:in `b'
        from ./tasklib.rb:21:in `b'
        from (eval):8:in `b'
        from b.rb:5

but by redefining tasklib.rb to use tsort both task lists function
correctly:

 harp:~ > cat a.rb
 require 'tasklib2'

 task :uidgen do |*argv| (seed = argv.shift) ? rand(seed) : $UID||=uidgen(42) end

 task :a => :uidgen do y "a" => $UID end
 task :b => :a do y "b" => uidgen(42) end
 task :c => :a do y "c" => uidgen(42) end
 task :d => [:b, :c] do y "d" end

 d()


 harp:~ > ruby a.rb
 ---
 a: 5
 ---
 b: 15
 ---
 a: 5
 ---
 c: 3
 --- d

and cycles are detected at declaration time:

 harp:~ > cat b.rb
 require 'tasklib2'

 task :b => :a and task :a => :b

 b()


 harp:~ > ruby b.rb
 /home/ahoward//lib/ruby/1.8/tsort.rb:152:in `tsort_each': topological sort failed: [:b, :a] (TSort::Cyclic)
         from /home/ahoward//lib/ruby/1.8/tsort.rb:183:in `each_strongly_connected_component'
         from /home/ahoward//lib/ruby/1.8/tsort.rb:219:in `each_strongly_connected_component_from'
         from /home/ahoward//lib/ruby/1.8/tsort.rb:182:in `each_strongly_connected_component'
         from /home/ahoward//lib/ruby/1.8/tsort.rb:180:in `each_strongly_connected_component'
         from /home/ahoward//lib/ruby/1.8/tsort.rb:148:in `tsort_each'
         from /home/ahoward//lib/ruby/1.8/tsort.rb:135:in `tsort'
         from ./tasklib2.rb:35:in `task'
         from b.rb:3

i’m not saying my impl is perfect - just that using tsort ensures only
the topology declared is enforced and that no other implied topology
created by short circuiting operations via caching prevents ‘normal’
routes through this call graph.

here are both impls:

harp:~ > cat tasklib2.rb
 require 'cache'
 require 'yaml'
 require 'tsort'
 module Task
   class TSortHash < Hash
     include TSort
     alias_method 'tsort_each_node', 'each_key'
     def tsort_each_child(node, &block) fetch(node).each(&block) end
     def initialize(hash={}) update hash end
     def [](key) super || [] end
     def fetch(key) super rescue [] end
   end

 TH = TSortHash.new

 def task_dependencies() @task_dependencies ||= {} end
 def task_descriptions() @task_descriptions ||= {} end
 def desc(line) @last_description = line.gsub("\n",'') end

 def task(args, &action)
   if Hash === args
     raise ArgumentError, "#{args.size} for 1" if args.size != 1
     name, *deps = *(args.to_a.flatten)
     name = name.to_sym
     deps = deps.compact.collect{ |e| e.to_sym }
   else
     name, deps = args.to_sym, []
   end

   # ensure no cycles in this graph and extract call graph
   th = TSortHash.new name => deps
   dag = th.tsort

   # ensure no cycles in global graph
   TH.update(th).tsort

   task_dependencies[ name ] = deps
   task_descriptions[ name ] = @last_description
   @last_description = nil

   (class<<self;self;end).module_eval do
     define_method( name ) do |*__a__|
       dag.each{|d| send(d) unless name == d }
       action.call(*__a__) if action
     end
   end
 end
 class ::Object; include Task; end

end

harp:~ > cat tasklib.rb
require 'cache'
require 'yaml'
module Task
  def task_dependencies() @task_dependencies ||= {} end
  def task_descriptions() @task_descriptions ||= {} end
  def desc(line) @last_description = line.gsub("\n",'') end
  def task(args, &action)
    if Hash === args
      raise ArgumentError, "#{args.size} for 1" if args.size != 1
      name, *deps = *(args.to_a.flatten)
      name = name.to_sym
      deps = deps.compact.collect{ |e| e.to_sym }
    else
      name, deps = args.to_sym, []
    end
    task_dependencies[ name ] = deps
    task_descriptions[ name ] = @last_description
    @last_description = nil
    (class<<self;self;end).module_eval do
      define_method( name ) do |*a|
        deps.each{ |d| send(d) }
        action.call(*a) if action
      end
      cache name
    end
  end
  class ::Object; include Task; end
end

harp:~ > cat cache.rb
class Object
  def singleton_class() class<<self;self;end end
  def cache m = nil
    if m
      (Module === self ? self : singleton_class).module_eval <<-code
        alias_method '__#{ m }__', '#{ m }'
        def #{ m }(*a,&b)
          c = cache['#{ m }']
          k = [a,b]
          if c.has_key? k
            c[k]
          else
            c[k] = __#{ m }__(*a,&b)
          end
        end
      code
    end
    @cache ||= Hash::new{|h,k| h[k]={}}
  end
end

kind regards.

-a

casper_the_ghost · April 26, 2006, 7:30pm

On Thu, 27 Apr 2006 [email protected] wrote:

task :b => :a do y “b” => uidgen(42) end
b: 15

a: 5

c: 3
— d

a sample fix:

harp:~ > cat a.rb
require 'tasklib2'

task :uidgen do |*argv| (seed = argv.shift) ? rand(seed) : $UID||=uidgen(42) end

task :a => :uidgen do y "a" => $UID end
task :b => :a do y "b" => uidgen(42) end
task :c => :a do y "c" => uidgen(42) end
task :d => [:b, :c] do y "d" end

make

harp:~ > ruby a.rb
---
a: 24
---
b: 25
---
c: 31
--- d

harp:~ > cat tasklib2.rb
require 'cache'
require 'yaml'
require 'tsort'
module Task
  class TSortHash < Hash
    include TSort
    alias_method 'tsort_each_node', 'each_key'
    def tsort_each_child(node, &block) fetch(node).each(&block) end
    def initialize(hash={}) update hash end
    def [](key) super || [] end
    def fetch(key) super rescue [] end
  end
  def task(args, &action)
    if Hash === args
      raise ArgumentError, "#{args.size} for 1" if args.size != 1
      name, *deps = *(args.to_a.flatten)
      name = name.to_sym
      deps = deps.compact.collect{ |e| e.to_sym }
    else
      name, deps = args.to_sym, []
    end
    # ensure no cycles in global graph
    dag.replace th.update(name => deps).tsort
    (class<<self;self;end).module_eval do
      define_method(name){|*a| action.call(*a) if action}
    end
  end
  def th() @th ||= TSortHash.new end
  def dag() @dag ||= [] end
  def make() @dag.each{|d| send d} end
  class ::Object; include Task; end
end

sorry for noise.

-a

casper_the_ghost · April 26, 2006, 7:58pm

I’m really confused about the rest of the examples (it looks to me like
task uidgen is sometimes acting as a “singleton service” and sometimes
as a “parametric service”), but I think I understand this one:

[email protected] wrote:

harp:~ > ruby b.rb
from ./tasklib.rb:21:in b' from ./tasklib.rb:21:inb’
from (eval):8:in `b’
from b.rb:5

This means that we are just using different code

Here’s mine (pls. excuse the module_eval string argument, it is very old
code), with a stripped down version of T’s task code:

module Task
  def cache(method_name)
    module_eval %{
      alias :_compute#{method_name} :#{method_name}
      def #{method_name}
        if @#{method_name}_cached
          @#{method_name}
        else
          @#{method_name}_cached = true
          @#{method_name} = _compute#{method_name}
        end
      end
    }
  end

  def task( args, &action )
    if Hash === args
      raise ArgumentError, "#{args.size} for 1" if args.size != 1
      name, *deps = *(args.to_a.flatten)
      name = name.to_sym
      deps = deps.compact.collect{ |e| e.to_sym }
    else
      name, deps = args.to_sym, []
    end

    (@task_dependencies||={})[ name ] = deps

    (class << self; self; end).class_eval do
      define_method( name ) do
        deps.each{ |d| send(d) }
        action.call if action
      end
      cache name
    end

  end
end

include Task

task :b => :a do puts "b" end
task :a => :b do puts "a" end

b()

Output:

a
b

casper_the_ghost · April 26, 2006, 8:17pm

Joel VanderWerf wrote:

b
I guess, strictly speaking, this isn’t the correct behavior for
dependencies, either. (Unless you are in a situation where you don’t
care so much about order as about the set of tasks which get done, and
about doing them at most once.)

So Ara’s use of TSort would be a good way to check for correctness at
“compile” time. Another way (checking at “run” time) would be to have
three states for each task: {not_run_yet, running, done}, instead of
just two…
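
A rough sketch of that three-state idea, in case it helps. The TaskRunner class and its method names are invented for illustration and are not Rake's API; the point is only that a task found in the running state while resolving its own prerequisites signals a cycle.

```ruby
# Hypothetical mini task runner; names are illustrative, not Rake's API.
class TaskRunner
  class CycleError < StandardError; end

  def initialize
    @deps = {}
    @actions = {}
    @state = Hash.new(:not_run_yet)   # every task starts as :not_run_yet
  end

  def task(name, deps = [], &action)
    @deps[name] = deps
    @actions[name] = action
  end

  def run(name)
    case @state[name]
    when :done    then return                                   # run at most once
    when :running then raise CycleError, "cycle through #{name}"
    end
    @state[name] = :running
    @deps.fetch(name, []).each { |d| run(d) }                   # prerequisites first
    @actions[name]&.call
    @state[name] = :done
  end
end

runner = TaskRunner.new
log = []
runner.task(:a) { log << :a }
runner.task(:b, [:a]) { log << :b }
runner.task(:c, [:a]) { log << :c }
runner.task(:d, [:b, :c]) { log << :d }
runner.run(:d)
p log   # => [:a, :b, :c, :d]

cyclic = TaskRunner.new
cyclic.task(:a, [:b])
cyclic.task(:b, [:a])
begin
  cyclic.run(:b)
rescue TaskRunner::CycleError => e
  puts e.message   # prints "cycle through b"
end
```

Compared with the two-state cache, the cyclic case raises instead of quietly printing the tasks in a surprising order.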

casper_the_ghost · April 26, 2006, 8:27pm

Thanks Ara and Joel. I see now that I have to take more care and use
tsort or equiv. to get the dependencies right.

I wonder though, might this task “pattern” merit low-level support in
Ruby? I don’t mean in Ruby source necessarily, I just mean that #task
might be generally useful in classes and modules, like implemented here,
as opposed to being relegated to use in Rake only.

T.

casper_the_ghost · April 26, 2006, 8:14pm

On Thu, 27 Apr 2006, Joel VanderWerf wrote:

I’m really confused about the rest of the examples (it looks to me like
task uidgen is sometimes acting as a “singleton service” and sometimes
as a “parametric service”), but I think I understand this one:

yes. basically what i’m showing is that implementing the tsort on top
of a cache fails: if any task is called from outside the dependency
graph it can potentially break the call chain. make sense?

   if @#{method_name}_cached
if Hash === args
(class << self; self; end).class_eval do

task :b => :a do puts "b" end
task :a => :b do puts "a" end

b()

ah. i see. that’s a bit limiting though eh? we can’t even have tasks
like this then

task :c do |path| IO.read(path) end

or, if we could, the caching would break.

essentially what i was saying is that if your caching is more general,
meaning it’s also based on method arguments, and you sort implicitly by
relying on such a cache, the call graph explodes if any code anywhere
calls a task externally or internally with arguments that cause a
collision in the cache hash. maybe i’m making it too hard but it seems
like an unreasonable constraint that tasks cannot call other tasks
outside the auto-derived dependency graph.
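
a small sketch of that collision, assuming a simple memoizer keyed on the argument list. the ArgCache module and the Tasks class are invented for this example and are not the cache.rb shown earlier:

```ruby
# Hypothetical argument-keyed memoizer, for illustration only.
module ArgCache
  def cache(name)
    original = instance_method(name)
    define_method(name) do |*args|
      # one result table per object, per method, keyed on the arguments
      store = (@__cache ||= Hash.new { |h, k| h[k] = {} })[name]
      store.fetch(args) { store[args] = original.bind(self).call(*args) }
    end
  end
end

class Tasks
  extend ArgCache
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def uidgen(seed)
    @calls += 1      # count how often the real body runs
    rand(seed)
  end
  cache :uidgen
end

t = Tasks.new
first  = t.uidgen(42)
second = t.uidgen(42)   # same arguments, so the cached value comes back
puts first == second    # => true
puts t.calls            # => 1
```

any caller asking for uidgen(42) gets the first caller's answer, whether or not that is what the declared dependencies intended.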

anyhow - i think trans’ point was that this is interesting and there i
think we all agree.

cheers.

-a

casper_the_ghost · April 26, 2006, 11:09pm

Joel VanderWerf [email protected] writes:

Trans wrote:

Just for fun I put together this simple demonstration.

This is kind of fun. Your demo suggests that maybe rake and dependency
injection can be unified somehow… For example, using my favorite DI toy:

Funnily, it’s exactly the way I’ve implemented my experimental Rake
clone in Forth. Enjoy:

#! /usr/bin/env gforth

\ Fake - Forth Make
\ A cheap clone of (proto-)Rake.

\ Copyright (C) 2006 Christian N. [email protected]
\ Licensed under the same terms as Rake. (MIT-style)

\ Missing: - incremental task building
\ - file tasks

\ For file tasks, one would need to make the trigger big enough to
\ keep a file name and the mtime. This is left as an exercise for the
\ reader.

\ A version of sh that is compilable. (Rather useless without
\ variable interpolation.)
:noname
sh ;
:noname
35 parse postpone sliteral
postpone 2dup postpone type postpone cr \ Display command
postpone system ; \ Run command
interpret/compile: sh$

\ Late binding
: $ ( name -- )
parse-word postpone sliteral postpone evaluate ; immediate

variable last-trigger \ The address of the last defined trigger.

: new-trigger ( -- ) \ Allocate, zero and store a new trigger.
here cell allot last-trigger !
0 last-trigger @ ! ;

: trigger ( -- a ) \ Compile the address of the current trigger.
last-trigger @ postpone literal ; immediate

: trigger! ( a -- ) \ Set the retrieved trigger.
1 swap ! ;

: triggered? ( a -- ? ) \ Task run?
@ 0= ;

: task: ( name -- ) \ Define a new task
new-trigger
:
postpone trigger
postpone triggered?
postpone IF
postpone trigger
postpone trigger!
; immediate

: ;task ( -- ) \ End of task
postpone THEN
postpone ;
; immediate

: run-tasks ( -- )
argc @ 2 > IF
argc @ 2 DO
i arg evaluate \ Call all tasks.
LOOP
ELSE
." fake: No task given." cr
THEN ;

: load-fakefile ( -- )
TRY
s" Fakefile" included
RECOVER
." fake: No Fakefile found." cr
bye
ENDTRY ;

load-fakefile
run-tasks

bye

An example fakefile:

task: bla
$ quux
." hey, bla!" cr
;task

task: quux
$ bla
." hey, quux!" cr
;task

task: a $ b
sh$ echo a
;task
task: b $ a $ c $ d
sh$ echo b
;task
task: c $ b $ a
sh$ echo c
;task
task: d $ b
sh$ echo d
;task

Example run:

#580<5|>lilith:~/mess/2006/09$ ./fake.f a
echo c
c
echo d
d
echo b
b
echo a
a

casper_the_ghost · April 27, 2006, 9:01am

On 26 Apr 2006, at 19:26, Trans wrote:

being relegated to use in Rake only.
From the work I’ve been doing recently, I certainly think that this
kind of dependency is a general concept (and I agree that it feels
related to dependency injection, or what I know of it anyway. It also
feels related to event driven programming models and the publisher
subscriber pattern).

I’ve been looking at systems for automatic testing of other bits of
software. Within this space, you very often need to specify that
something happens after something else has happened (you need your
program running to test that it does something; you need to receive a
message before you can check it; you need to put things in to the
database before you can expect the program to do anything; you need a
database built before you can put things in to it; etc).

Openness seems to be very useful too. It’s useful to be able to leave
the implementation of a task open so that it can be extended
somewhere else. For instance, you may have a general idea of “prepare
the database”. For all tests, this would do basic set up (and may
require the database instance to be created). Some tests may add more
functionality in to this place holder for getting their particular
database requirements in place.

As another example, you may have the basic concept of receiving a
message. In some situations you may want to always do something extra
every time a message is received: log it in binary as well as in human
readable form; or check it against an expected series of messages, as
examples.

A final concept is that tasks aren’t always singletons (in the design
pattern sense, rather than Ruby virtual class sense). You quite often
want to describe the properties of groups of tasks that can be
reused. This is like Rake’s pattern tasks I think (I’m sorry, I’m not
familiar with the terminology) that will turn (for instance) any ‘.c’
in to a ‘.o’. A general system would need a general means of
parameterising the tasks.

I’ve been jumping in and out of wanting to use this, but I just don’t
understand it well enough at the moment. Our project’s implementation
is quite event driven, but we certainly don’t have anything like a
framework for doing this. I’m sure there’s something there waiting to
be discovered, but I think it’s more sensible to iterate to it… I’m
convinced that the idea has a lot of general applications.

Oh, and one last thing… within testing particularly, it’s useful to
have tasks that know how to clear themselves up when they are no
longer needed. Perhaps this is a case of a task that (if run) will
require another clean up action to be run at some future time.

Sorry this is brain dumpy - it’s interesting to see that other people
are having similar ideas though, so perhaps it’ll spark something
somewhere

Cheers,
Benjohn

casper_the_ghost · April 27, 2006, 9:07am

On 27 Apr 2006, at 07:59, Benjohn B. wrote about using dependency
in testing applications and perhaps more widely:

snip

I forgot to mention that parallelism is really important - you often
need to have many things happening at the same time (mostly because
you’re not sure exactly what order a system will do several things
in, and you don’t want to deadlock by insisting on a particular
order). This kind of flow between tasks or states, seems like a
really good way of managing parallelism too. It’s almost as if you’re
specifying a parallel state machine.
