Automatically Determining "Requires" and "Provides" Information


#1

I’m looking to write a script that examines one or more scripts
written in Ruby and programmatically determines the following:

  • What does each script provide? Specifically, what classes are
    defined, whether they are in a module namespace, etc.

  • What does the script require to operate (i.e., dependencies on other
    Ruby scripts)?

I need this because I am looking into writing Ruby support into the
Conary packaging system (cf. http://wiki.rpath.com/wiki/Conary).
Conary already automatically generates provides/requires information
when packaging scripts for other popular languages (e.g. Perl, Python,
Java), so I think that having Ruby support would be A Good Thing.

Ideally, I should be able to 100% accurately identify what the script
provides, and mostly identify requires for dependencies. I say
mostly because many scripts use tricks to load plugins that examine
the module’s filename and path at runtime. The main thing is that my
script must not return any “false provides/requires”.
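One conservative way to meet the "no false requires" goal is to report only require calls whose argument is a single plain string literal, and to set aside every other line that mentions require (such as plugin loaders that build a path at runtime) for manual review. The sketch below is a line-based regex heuristic, not a parser, so it will miss requires that span lines or carry modifiers like "if"; scan_requires and both constants are hypothetical names.

```ruby
# Hypothetical conservative scanner: a line-based heuristic, not a parser.
# A require counts only when its argument is one plain string literal
# (no interpolation) and nothing but an optional comment follows it.
LITERAL_REQUIRE = /\A\s*require\s*\(?\s*(['"])([^'"#]+)\1\s*\)?\s*(?:#.*)?\z/
ANY_REQUIRE     = /\brequire\b/

# Returns [literal_requires, dynamic_lines]. Lines that mention require
# but do not match the literal pattern are returned separately so they
# are never reported as (possibly false) requires.
def scan_requires(source)
  literal = []
  dynamic = []
  source.each_line do |line|
    line = line.chomp
    if (m = LITERAL_REQUIRE.match(line))
      literal << m[2]
    elsif ANY_REQUIRE.match?(line)
      dynamic << line.strip
    end
  end
  [literal.uniq, dynamic]
end
```

For example, require 'set' and require('date') land in the first list, while require File.join(dir, name) lands in the second.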

I would like a solution that does not depend on examining gemspecs, or
even on gems at all.

If anyone has any ideas about how to go about doing this, please
respond here. Let’s talk!

Thanks in advance,
Scott


#2

On Jan 19, 2007, at 9:57 AM, Scott Parkerson wrote:

> Ideally, I should be able to 100% accurately identify what the script
> provides, and mostly identify requires for dependencies.

I don’t think this is technically possible because of the dynamic
nature of Ruby. That is to say that I think the task you describe
is equivalent to the halting problem as it requires the ability to
divine the intent of executable code by simply examining the code
as opposed to running it and seeing what happens.

There are certainly heuristics you could use to get pretty close to
100% for ‘normal’ code (searching for ‘class X’ and for ‘require…’),
but you’ll never get to 100% and you’ll probably have to re-implement
a good portion of the Ruby parser in the process (i.e., to deal with
nested classes/modules). That may still be good enough for your needs.
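For what it is worth, Ruby 1.9 and later ship Ripper in the standard library, an interface to Ruby's own parser, so nested classes and modules can be tracked without re-implementing the parser. A minimal sketch, assuming a Ruby that has Ripper; provided_constants and its helpers are hypothetical names. Anything defined dynamically (Class.new, Struct.new, const_set) is invisible to it, which is exactly the limit described above.

```ruby
require 'ripper'

# Hypothetical helper: the name written in a class/module header,
# e.g. "Foo" for [:const_ref, [:@const, "Foo", pos]] or "A::B".
def const_name(node)
  case node[0]
  when :const_ref, :top_const_ref, :var_ref
    node[1][1]
  when :const_path_ref
    "#{const_name(node[1])}::#{node[2][1]}"
  end
end

# Walk Ripper's s-expression, carrying the enclosing namespace along.
def collect_provides(node, namespace, out)
  return out unless node.is_a?(Array)
  case node[0]
  when :class, :module
    name = (namespace + [const_name(node[1])]).join("::")
    out << name
    # [:class, path, superclass, body] vs. [:module, path, body]
    body = node[0] == :class ? node[3] : node[2]
    collect_provides(body, [name], out)
  else
    node.each { |child| collect_provides(child, namespace, out) }
  end
  out
end

# Hypothetical entry point: fully qualified class/module names in source.
def provided_constants(source)
  tree = Ripper.sexp(source) or raise ArgumentError, "could not parse source"
  collect_provides(tree, [], [])
end
```

Singleton-class blocks (class << self) produce a different node type and are correctly skipped.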

Maybe there is something that already does that out there? Googling…

Check out
http://www.zenspider.com/ZSS/Products/ParseTree/index.html and
http://www.zenspider.com/ZSS/Products/ParseTree/Examples/Dependencies.html
maybe that will give you some ideas.

Gary W.


#3

On 1/19/07, removed_email_address@domain.invalid wrote:

> I don’t think this is technically possible because of the dynamic
> nature of Ruby. That is to say that I think the task you describe
> is equivalent to the halting problem as it requires the ability to
> divine the intent of executable code by simply examining the code
> as opposed to running it and seeing what happens.

I think you are right, so let me amend my request a bit.

  • I don’t need to get the exact signature of every method provided or
    required. File-based provides should be enough for what we need.
    Anything more complex can be synthesized manually. In most cases, a
    Ruby provide would essentially be the file path, with the $LOAD_PATH
    entry chopped off the front and the extension removed. Thus,
    /usr/lib/ruby/1.8/yaml.rb provides ‘yaml’.

  • Requires could be whatever gets required at load time. I wrote a
    quick-and-dirty requires generator that essentially overrides
    Kernel#require to stuff each argument to require into a Set. For
    yaml, it prints:

/usr/lib/ruby/1.8/yaml.rb requires the following modules:
"yaml/constants"
"yaml/ypath"
"yaml/error"
"yaml/rubytypes"
"stringio"
"date"
"rational"
"syck"
"yaml/syck"
"yaml/basenode"
"date/format"
"yaml/tag"
"yaml/stream"
"yaml/types"

The big question is whether Ruby C extensions are always required by
filename (i.e., if you have a C extension called big/foo.so, will
require ‘big/foo’ load it?). In Python, this is tricky, as the shared
library name may not be the name you use with import at all.
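The file-based provides rule from the first bullet above can be sketched as follows. provide_name is a hypothetical helper. It also strips the platform's C-extension suffix (RbConfig::CONFIG['DLEXT'], e.g. so on Linux), which assumes, as the question hopes, that a C extension is required by its path minus the suffix. It picks the longest matching $LOAD_PATH entry so that an architecture subdirectory such as /usr/lib/ruby/1.8/i686-linux wins over its parent.

```ruby
require 'rbconfig'

# Extensions treated as requirable: Ruby source plus this platform's
# C-extension suffix (an assumption about how extensions are required).
RUBY_EXTS = [".rb", ".#{RbConfig::CONFIG['DLEXT']}"].freeze

# Hypothetical helper: map an installed file to the feature name that
# require would use, or nil if it is not a loadable file on the path.
def provide_name(path, load_path = $LOAD_PATH)
  full = File.expand_path(path)
  ext  = File.extname(full)
  return nil unless RUBY_EXTS.include?(ext)

  # Longest matching load-path entry wins (e.g. .../1.8/i686-linux).
  dir = load_path.map { |d| File.expand_path(d) }
                 .select { |d| full.start_with?(d + "/") }
                 .max_by(&:length)
  return nil unless dir

  full[(dir.length + 1)..-1].chomp(ext)
end
```

With a load path of ["/usr/lib/ruby/1.8"], this maps /usr/lib/ruby/1.8/yaml/constants.rb to yaml/constants and ignores files such as README.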

The bottom line is to have a “good enough” provides/requires mechanism
that automates generating packaging information. It’s obviously not
perfect, since two different foo.rb files might do wildly different things.

Here’s the code so far (quick and dirty is probably a vast
understatement):

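require 'set'

# Every feature name passed to require while loading the target.
$required = Set.new

module Kernel
  alias_method :old_require, :require

  # Record each feature name, then delegate to the original require.
  def require(name)
    result = nil
    begin
      result = old_require(name)
    rescue LoadError => e
      # Keep going so one missing dependency doesn't hide the others.
      warn "warning: #{e}"
    rescue NameError
      # Some libraries raise NameError while loading; ignore it here.
    end
    $required << name   # recorded even if loading failed
    result
  end
  private :require      # Kernel#require is normally private
end

# Load the target with the original require so it is not listed as
# one of its own dependencies.
old_require(ARGV[0])

puts "#{ARGV[0]} requires the following modules:"
$required.each { |file| p file }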
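A variant of the same idea, offered as an alternative rather than as Scott's method, avoids patching Kernel#require: snapshot $LOADED_FEATURES before loading the target and keep whatever was added. It yields absolute file paths instead of the strings passed to require; those can be mapped back to feature names with the $LOAD_PATH rule from the first bullet above. loaded_by is a hypothetical name, and files already loaded in the process before the call will not show up, so it is best run in a fresh interpreter.

```ruby
# Hypothetical helper: files newly added to $LOADED_FEATURES while the
# block runs (require and require_relative both record there).
def loaded_by
  before = $LOADED_FEATURES.dup
  yield
  $LOADED_FEATURES - before
end

# Example invocation (script name hypothetical): ruby loaded_by.rb yaml
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  loaded_by { require ARGV[0] }.each { |path| puts path }
end
```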


#4

Hello,

On Fri, 19 Jan 2007 23:57:54 +0900, Scott Parkerson
removed_email_address@domain.invalid wrote:

> I’m looking to write a script that examines one or more scripts
> written in Ruby and programatically determine the following:
>
>   • What does the script require to operate (i.e. dependencies on other
>     Ruby scripts)

I have a tool in the kwala project on RubyForge that attempts to
determine this. It uses the Java prefuse library to display a dynamic
graph for inspection; if you don’t want the Java dependency, you can
have it output a static Graphviz graph instead. It also does a few other
things, such as finding require cycles. If you are interested, take a
look at the cycle_detector.rb file.
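kwala's own implementation lives in cycle_detector.rb. The sketch below is not that code; it only illustrates the two features mentioned, assuming the require graph is already available as a Hash from each feature to the features it requires. find_cycles (a hypothetical name) does a depth-first search and records a cycle whenever it reaches a feature that is still on the current path; to_dot (also hypothetical) writes the graph in Graphviz's DOT language, which can then be rendered with, for example, dot -Tpng.

```ruby
# Hypothetical sketch, not kwala's code. graph: { feature => [deps] }.
def find_cycles(graph)
  cycles = []
  state  = Hash.new(:new)   # :new, :active (on the DFS path), or :done
  stack  = []

  visit = lambda do |node|
    state[node] = :active
    stack.push(node)
    graph.fetch(node, []).each do |dep|
      case state[dep]
      when :new    then visit.call(dep)
      # dep is on the current path: the slice from dep onward is a cycle.
      when :active then cycles << stack[stack.index(dep)..-1] + [dep]
      end
    end
    stack.pop
    state[node] = :done
  end

  graph.each_key { |node| visit.call(node) if state[node] == :new }
  cycles
end

# Emit the require graph as Graphviz DOT text.
def to_dot(graph)
  lines = ["digraph requires {"]
  graph.each do |from, deps|
    deps.each { |to| lines << %(  "#{from}" -> "#{to}";) }
  end
  lines << "}"
  lines.join("\n")
end
```

For { "a" => ["b"], "b" => ["c"], "c" => ["a"] }, find_cycles reports the single cycle a, b, c, a.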

You can find it here:
http://kwala.rubyforge.org/

I hope that helps,
Zev