Help with Ruby return types and documentation

Hi guys,

I’m new to Ruby and I really like it…or trying to like it.

It’s so versatile and powerful and I’d like to take advantage of that
but there are things that are just pissing me off.

Probably due to my inexperience with it and this is the reason I’m
writing this e-mail.

I’m trying to write a web spider using Anemone and analyze the HTML with
Nokogiri to extract GET, POST and cookie parameters.

But I’m stumped with Nokogiri’s API.

I can’t figure anything out: I don’t know what methods or attributes a
doc object has, and I can’t figure out how to get the HTML “type”,
“name” and “value” fields.

Usually Eclipse is good with presenting documentation when needed but
the Ruby plug-in or RDoc don’t really help.

I’m all for dynamic typing but this is getting out of hand.
Please tell me that I’m missing something…and what it is that I’m
missing of course.

In addition, any code examples that will help me out will be very
appreciated.

Cheers,
Tasos.

2010/6/22 Tasos L.:

I’m trying to write a web spider using Anemone and analyze the HTML with

I’m all for dynamic typing but this is getting out of hand.
Please tell me that I’m missing something…and what it is that I’m
missing of course.

In addition, any code examples that will help me out will be very
appreciated.

Did you enter “nokogiri” or “nokogiri documentation” into your
favorite search engine? Did you look at http://nokogiri.org and
especially http://nokogiri.org/tutorials ? You can also use IRB to
investigate what methods are available for a class or instance.
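
For what it's worth, here is a minimal sketch of the kind of IRB exploration meant here. It uses a plain String as a stand-in (an assumption for illustration; the same calls work on any object, including a parsed document), and only core reflection methods:

```ruby
# Any object will do; a String stands in for the object you are curious about.
sample = "hello"

# Methods defined directly on the object's class, without inherited ones.
own_methods = sample.class.instance_methods(false)

# The inheritance chain, including mixed-in modules.
ancestors = sample.class.ancestors

# Narrow the full method list down by name when hunting for something.
case_methods = sample.methods.grep(/case/).sort

puts "own methods: #{own_methods.size}"
puts "ancestors:   #{ancestors.inspect}"
puts "case-ish:    #{case_methods.inspect}"
```

Typing the same lines into IRB one at a time shows each result as you go, which is usually quicker than hunting through RDoc.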

Relax

robert

On Tue, Jun 22, 2010 at 7:57 AM, Tasos L. wrote:

But I’m stumped with Nokogiri’s API.

I can’t figure anything out: I don’t know what methods or attributes a
doc object has, and I can’t figure out how to get the HTML “type”,
“name” and “value” fields.

Using irb to explore might help, e.g.
doc = something something
doc.methods

But since you’re apparently interested in form fields – and to me, the
doc is pretty obvious about this – take a trivial example where ‘doc’
is a page with a form in it:

inputs = doc.search('//input')
=> [#<Nokogiri::XML::Element:0x80de9c04 name="input"
attributes=[#<Nokogiri::XML::Attr:0x80de9ab0 name="type"
value="text">, #<Nokogiri::XML::Attr:0x80de9a9c name="name"
value="hsTotal">, #<Nokogiri::XML::Attr:0x80de9a88 name="id"
value="hsTotal">]>]
inputs[0]
=> #<Nokogiri::XML::Element:0x80de9c04 name="input"
attributes=[#<Nokogiri::XML::Attr:0x80de9ab0 name="type"
value="text">, #<Nokogiri::XML::Attr:0x80de9a9c name="name"
value="hsTotal">, #<Nokogiri::XML::Attr:0x80de9a88 name="id"
value="hsTotal">]>

By just looking at the above, I see this object has “attributes”, so

inputs[0].attributes['type'].to_s
=> "text"
inputs[0].attributes['name'].to_s
=> "hsTotal"

And so on…

HTH!

Robert K. wrote:

2010/6/22 Tasos L.:

I’m trying to write a web spider using Anemone and analyze the HTML with

I’m all for dynamic typing but this is getting out of hand.
Please tell me that I’m missing something…and what it is that I’m
missing of course.

In addition, any code examples that will help me out will be very
appreciated.

Did you enter “nokogiri” or “nokogiri documentation” into your
favorite search engine? Did you look at http://nokogiri.org and
especially http://nokogiri.org/tutorials ? You can also use IRB to
investigate what methods are available for a class or instance.

Relax

robert

Erm…I think I might have come off a bit too strong. There’s no smoke
coming out of my ears or anything; I was speaking/writing figuratively.

And of course I did all those things, I’m not an imbecile and I’m also
not bashing Ruby.

I’d really like not to use IRB just to figure out an inheritance
hierarchy or the available method/attributes of an object.

@Hassan, thanks, that really helped me out.
As I said I just started getting into Ruby so things that look obvious
to you can easily elude me for the time being.

Richard C. wrote:

On Tue, Jun 22, 2010 at 4:59 PM, Tasos L. wrote:

And of course I did all those things, I’m not an imbecile and I’m also
not bashing Ruby.

I’d really like not to use IRB just to figure out an inheritance
hierarchy or the available method/attributes of an object.

Hi Tasos,
I came from Java like yourself, so the absence of return type
documentation really warped my head too, to start with. You really can’t
expect Ruby authors to document return types like they do in Java - a
Ruby method can return anything.
In practice API writers are unlikely to be jerks about it - returned
objects tend to have a lot of consistency.
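
To make the "can return anything" point concrete, here is a toy sketch. The `lookup` method and its table are hypothetical, invented only to show that the signature says nothing about the result and that you check at runtime instead:

```ruby
# Hypothetical method: nothing in the signature says what comes back.
def lookup(key)
  table = { "count" => 3, "name" => "spider", "tags" => %w[a b] }
  table[key] # may be an Integer, a String, an Array, or nil
end

%w[count name tags missing].each do |key|
  result = lookup(key)
  # Ask the object what it is and what it can do, rather than reading a declaration.
  puts "#{key}: #{result.class}, responds to each? #{result.respond_to?(:each)}"
end
```

Well-behaved libraries rarely mix types like this, which is the consistency referred to above.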

While you may not like the IRB option, in practice it is your fastest
path to understanding.
However you should really take a look at awesome_print:

awesome_print: A New Pretty Printer for your Ruby Objects
http://www.rubyinside.com/awesome_print-a-new-pretty-printer-for-your-ruby-objects-3208.html

It’s a Ruby gem that puts pp (pretty print) on steroids, giving you a lot
of information about the objects, including their inheritance hierarchy,
and the tree structure of any Collections.

There isn’t really a shortcut to learning Ruby libraries. Most don’t
play mind games with the return types. However HTML inspection APIs like
Nokogiri will be returning objects that represent complex HTML
structures. I remember Hpricot being much the same.

It’s worth taking the library through IRB (and ap) for 30 minutes or so
with some test data, to get a feel for it.
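
If awesome_print isn't installed yet, a rough stand-in can be sketched with the standard library's pp plus core reflection. The `describe` helper below is hypothetical (not part of any gem) and far less polished than ap, but it shows the same kind of information:

```ruby
require 'pp'

# Hypothetical helper: summarize an object using only core reflection.
def describe(obj)
  {
    class: obj.class,
    # First few entries of the inheritance chain, mixins included.
    ancestors: obj.class.ancestors.take(5),
    # A sample of the public methods the class itself defines.
    own_methods: obj.class.public_instance_methods(false).sort.take(10)
  }
end

pp describe([1, 2, 3])
```

Calling `describe` on whatever a library hands back during an IRB session gives a quick picture of what you are holding.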

regards,
Richard.

Hi Richard,

Thanks a lot for your informative reply.

You missed the one about coming from Java --not a big fan-- but I do
appreciate its documentation.

My background is in C and PHP.
PHP is dynamically typed too but the documentation is pretty clear about
types and examples.

First day of Ruby today so I guess a working spider and half working
analyzer isn’t too bad for progress.

I just have to get used to doing things the Ruby way, which I will
eventually…unfortunately patience is not one of my qualities.

What might have sounded like hostility is actually eagerness. :)

Funny you should mention pretty_print and the like, I was just reading
about it.
That should be a great help for me.

Time to set up awesome_print and get cracking with IRB.

Thanks again. :)

On Tue, Jun 22, 2010 at 7:03 PM, Tasos L. wrote:

Hi Richard,

Thanks a lot for your informative reply.

You missed the one about coming from Java --not a big fan-- but I do
appreciate its documentation.

Good API docs were always a quality of Java, even from the early days,
but Java needs them. Java libraries are very class heavy. The dynamic
nature of Ruby, coupled with powerful language features, means that you
can do a lot more with fewer classes and methods. It’s a big leap to make
though.

My background is in C and PHP.
PHP is dynamically typed too but the documentation is pretty clear about
types and examples.

I don’t have much experience with PHP, but from the opinions of those
who do, I think a lot of PHP development is isolated within its standard
library. In Ruby a lot of development uses a lot of 3rd party
libraries/gems. I would say the usage of non-standard libraries in Ruby
is above the median across all programming languages. Rails, which is
about as conventional as Ruby development gets, relies a lot on its
extension mechanism. It would be pretty unusual to see Rails projects
that scorned all plugins. For non-Rails projects, nearly anything goes,
and you can expect to see real world projects assembled almost entirely
out of 3rd party libraries that were not designed with each other in
mind (Sinatra + Haml + DataMapper is actually a common enough combo).

This fact doesn’t help the documentation situation much. And the APIs of
the best stuff do tend to be a bit of a moving target. There is also a
bit of competition in significant problem domains (like web development,
testing libraries, HTTP clients etc.). All of this means that
documentation is not as verbose or complete as you would like it to be.
Understanding an API is usually best accomplished by running it through
IRB, studying the source, or running the unit tests. All of which sounds
cavalier if you are used to working in languages/platforms where there
are significant vendors.
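
On the "studying source" part, one thing that may help is asking Ruby where a method is defined. A minimal sketch follows; the `Spider` class is hypothetical and stands in for a class from whatever gem you are exploring:

```ruby
# Hypothetical class standing in for something from a third-party gem.
class Spider
  def crawl(url)
    url
  end
end

# For methods written in Ruby, source_location gives [file, line].
location = Spider.instance_method(:crawl).source_location

# owner tells you which class or module actually defines the method.
owner = Spider.new.method(:crawl).owner

# Methods implemented in C (much of the core library) typically report nil.
builtin = String.instance_method(:upcase).source_location

puts "crawl is defined at #{location.inspect} by #{owner}"
puts "String#upcase location: #{builtin.inspect}"
```

With an installed gem the reported file is usually somewhere under the gem's own directory, so you can open it and read the method directly.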

First day of Ruby today so I guess a working spider and half working
analyzer isn’t too bad for progress.

Yeah. Not too shabby.

I just have to get used to doing things the Ruby way, which I will
eventually…unfortunately patience is not one of my qualities.

Well, discipline is a more important virtue. There are a lot of very
cool libraries out there, and it’s easy to get sidetracked into them.
Personally I have to be very strict about the number of libraries I wish
to learn at once (ideally just one) and ignore a lot of cool things.
