Hi Brian,
Thanks for sharing your thoughts! Since much of what you note
is commonly accepted about Ruby, I am happy that my research is
subtle enough to warrant such discussion (and interesting enough to get
an e-mail or two!)
If you don’t mind, I’d like to write a blog post sharing your concerns
(anonymized, naturally) and my responses. Would that be okay?
Points I’d raise:
- In my experience, very little real-world Ruby code uses
‘block_given?’. If it needs to yield, it just yields. I’d consider this
to be a case of duck-typing.
This seems to suggest Rubyists rarely write methods that take blocks
optionally. Of this, I am highly skeptical. Luckily, doing this work in
my thesis will allow me to gather statistics on how block_given? is used.
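For readers unfamiliar with the idiom, here is a minimal sketch of a method that takes a block optionally. The name each_word is hypothetical and not taken from any library; it only illustrates the pattern under discussion.

```ruby
# Hypothetical helper: yields each word when a block is given,
# otherwise returns the words as an array.
def each_word(text)
  words = text.split
  return words unless block_given?
  words.each { |w| yield w }
  nil
end

# Without a block, the caller gets the array back.
words = each_word("a b c")

# With a block, each word is yielded in turn.
collected = []
each_word("a b c") { |w| collected << w.upcase }
```

Many Enumerable-style methods follow a similar shape, returning a value or an enumerator when no block is supplied.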
With yield you get a run-time error if no block was passed, but that’s
only one of a much larger set of method call errors (such as calling a
method with argument of the wrong type).
Correct, but my work intends to show that improper block use, when using
yield, is a far more easily determined method call error in Ruby than a
type error.
What makes this research fascinating is that Ruby is rich enough to
allow for such nuance! For a typical language which permits closures as
arguments, one must use careful alias analysis and escape analysis.
yield as syntactic sugar makes it a much simpler case to analyze, which
is why I tackled it first.
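To make the contrast concrete, here is a small sketch with hypothetical method names. With yield, the block can only be invoked at the yield site, and a missing block surfaces as a LocalJumpError. Once the block is captured with &blk, it becomes an ordinary object that can escape the method, which is where alias and escape analysis would come in.

```ruby
# Hypothetical methods for illustration only.
def with_yield
  yield 1          # the block can only be invoked right here
end

STASH = []
def with_proc(&blk)
  STASH << blk     # the block escapes; it may be called anywhere later
  nil
end

# Calling a yielding method without a block raises LocalJumpError.
begin
  with_yield
rescue LocalJumpError => e
  error_class = e.class
end

# The captured block outlives the call that received it.
with_proc { |x| x + 1 }
later = STASH.first.call(41)
```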
Consider also that very little code tests ‘a.respond_to? :foo’ before
calling ‘a.foo’.
This does not reflect the intent of this analysis - please see below.
- If a method uses &blk or Proc.new or yield, I’d say it’s fairly safe
to assume that the block may be called (at least from the point of
view of automated documentation). Since it’s unprovable in general even
whether the method returns or not, it seems like hard work (for little
benefit) to try to decide whether
Nearly every interesting property of a program is undecidable in
general - see Rice’s Theorem. [1] Luckily, compiler writers and PL
theorists have been studying forms of analysis for decades to try to get
around this and discover the patterns that we know can be analyzed. To
address your example, of course termination is unprovable in general,
but the class of functions for which termination is provable includes
many, many real-world functions. [2] [3]
a method which accepts a block never actually calls it.
Here’s why this issue is worth tackling: ALL methods accept a block,
and no matter how trivial the method, no tool will tell you that passing
a block to it was foolish, let alone tell you so statically:
2.+(4) { |x, y| x ** y } #=> 6
Additionally, no tool can tell you that a block is required by a method,
even if it is obvious:
# No tool currently documents that a block is required here
def tap
  yield self
end
My work does not try to determine each and every case which triggers a
yield, but merely to develop a coarse classification system for a method
based on its overall approach to blocks: required, optional, or ignored.
As I showed in my blog post (and as I will prove in my thesis), this
classification can be determined precisely when the result of
block_given? is stored only in simple constants (this includes
temporaries) when yield is used.
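As a rough illustration of the three categories, consider the sketch below. The method names are hypothetical and this is not output from Laser; it only shows what each category looks like in source, including the case where the result of block_given? is held in a temporary that is never reassigned.

```ruby
# Hypothetical methods, one per category.

# Required: yields unconditionally, so calling without a block raises.
def block_required
  yield :value
end

# Optional: yield is guarded by block_given?, here via a temporary
# that is assigned once and never changed.
def block_optional
  has_block = block_given?
  has_block ? yield(:value) : :default
end

# Ignored: never yields and never captures the block.
def block_ignored
  :value
end
```

Calling block_ignored with a block silently discards it, which is the situation the 2.+(4) example demonstrates.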
If one peruses the Ruby standard library, one will find that just in the
Ruby code alone, block_given? occurs 265 times; in every single case it
is used to execute yield conditionally, and in every single case the
result is used only as a simple constant. [4]
- As you’re undoubtedly aware, Ruby is so dynamic that you can’t
analyse a method in isolation anyway. You can decide that a bareword
like ‘foo’ is a method call, but you don’t know what that method will
actually do when the program is run - it could be redefined dynamically,
either within a class or on single objects (in their singleton class).
Yes, this is one of the difficulties inherent in statically analyzing a
dynamic language. Luckily, Laser does not analyze single methods; it
works on a set of input files and traverses requires/loads by using
constant propagation to handle changes to $LOAD_PATH and
$LOADED_FEATURES. As you note, a naive approach doesn’t work, and having
access to all input files is very important. There is code that will be
very hard to handle: see SortedSet.setup’s code as an example for which
I haven’t figured out an approach just yet.
Dynamic method creation is, in my opinion, what challenges static
analysis in Ruby the most. Naturally, in the general case, it makes all
analysis impossible.
What tool could figure out much about a program containing this code?
def Object.inherited(klass)
  def klass.inherited(some_class)
    some_class.class_eval(gets)
  end
  klass.class_eval(gets)
end
My belief, which my research hopes to support (but may ultimately
reject, or land somewhere in the middle), is that such pathological code
is less of an issue in real-world application code. I do not expect a
library like RSpec, whose internals are full of dynamic magic, to get as
much out of my research. This is the biggest challenge ahead of me.
Luckily, existing work has seen success analyzing real-world code
without even touching on this issue. [5]
Thanks again for your interest! I hope my work continues to interest you
as I continue over the coming months.
References (sorry, I’ve only got BibTeX for some of these for now):
[1] http://en.wikipedia.org/wiki/Rice’s_theorem
[2] @article{cook2006termination,
title={{Termination proofs for systems code}},
author={Cook, B. and Podelski, A. and Rybalchenko, A.},
journal={ACM SIGPLAN Notices},
volume={41},
number={6},
pages={415–426},
issn={0362-1340},
year={2006},
publisher={ACM}
}
[3] @article{andreas6terminator,
title={{Terminator: Beyond safety}},
author={Andreas, R.C. and Cook, B. and Podelski, A. and Rybalchenko, A.},
journal={In CAV06, LNCS},
volume={4144},
pages={415–418}
}
[4] ack --ruby -c "block_given\?" | grep -e ':[^0]$' | cut -d':' -f2 |
awk '{s+=$1} END {print s}'
gives the quantity, and using a context-ful grep is enough to see the
usage patterns of each call. Almost every single call lies in an “if” or
“unless” condition, or the condition of the ternary operator, and the
result is not stored to a variable. lib/time.rb:264 has an example that
justifies this part of my analysis: block_given? is called once, its
result is stored in a variable, and then that variable is used as a
constant to conditionally yield.
[5] @article{ecstatic,
title={{Ecstatic–Type Inference for Ruby Using the Cartesian Product Algorithm}},
author={Kristensen, K.},
journal={Master’s thesis, Aalborg University},
year={2007}
}
Michael E.
http://carboni.ca/