Question about ruby syntax

richardchaven · July 10, 2009, 8:03am

Hi

I’ve been reading about Ruby and have started learning it to replace the
work I was doing with Perl. I have a question about the code from this link.

http://www.adaruby.com/2008/01/11/scraping-gmail-with-mechanize-and-hpricot/

I don’t understand this code. I can see that it is a block that returns
an array of entries containing the HTML for every tr with a white
background.

#################################

page.search("//tr[@bgcolor='#ffffff']") do |row|

  from, subject = *row.search("//b/text()")
  url = page.uri.to_s.sub(/ui.*$/,
    row.search("//a").first.attributes["href"])
  puts "From: #{from}\nSubject: #{subject}\nLink: #{url}\n\n"

  email = agent.get url

##################################

But what does the from, subject = *row.search("//b/text()") do?

How is *row different from row?

Finally what does this do? I can see a regex but don’t understand the
line.

url = page.uri.to_s.sub(/ui.*$/,
row.search("//a").first.attributes["href"])

Thanks in advance for your help.

Regards Richard

#################################### FULL CODE #############################

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new

page = agent.get 'http://www.gmail.com'
form = page.forms.first
form.Email = 'your gmail account'
form.Passwd = 'your password'
page = agent.submit form

page = agent.get page.search("//meta").first.attributes['href'].gsub(/'/,'')
page = agent.get page.uri.to_s.sub(/\?.*$/, "?ui=html&zy=n")
page.search("//tr[@bgcolor='#ffffff']") do |row|
  from, subject = *row.search("//b/text()")
  url = page.uri.to_s.sub(/ui.*$/,
    row.search("//a").first.attributes["href"])
  puts "From: #{from}\nSubject: #{subject}\nLink: #{url}\n\n"

  email = agent.get url

  …

end

richardchaven · July 10, 2009, 12:57pm

But what does the from, subject = *row.search("//b/text()") do?

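The * is Ruby’s splat operator. It expands the array returned by
row.search("//b/text()") into a list of values, and the multiple
assignment on the left then assigns one member to each variable. This is
often called destructuring. Note that the * applies to the result of the
whole expression row.search(...), not to row by itself. It’s equivalent
to: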

temp = row.search("//b/text()")
from = temp[0]
subject = temp[1]
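
To try the same thing without Hpricot, here is a minimal sketch using a
plain array. The cells array and its values are made up for
illustration; they only stand in for whatever row.search returns.

```ruby
# Hypothetical stand-in for the list of nodes that row.search returns.
cells = ["alice@example.com", "Lunch on Friday?", "extra"]

# Multiple assignment: one element per variable, surplus elements are dropped.
from, subject = *cells

# With fewer elements than variables, the remaining variables become nil.
only, missing = *["just one"]

puts from      # alice@example.com
puts subject   # Lunch on Friday?
p missing      # nil
```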

Finally what does this do? I can see a regex but don’t understand the
line.

url = page.uri.to_s.sub(/ui.*$/,
row.search("//a").first.attributes["href"])

The expression row.search("//a").first.attributes["href"] finds the
first <a> element in the row and gets its href attribute, which I
imagine will be a string containing a URL. So this is basically just:

url = page.uri.to_s.sub(/ui.*$/, "SOME_URL")

So it takes the page’s URI, converts it to a string (to_s) and replaces
the first match of the pattern /ui.*$/, i.e. everything from "ui" to the
end of the line, with the href from the anchor tag.
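
A small sketch of that substitution with plain strings. Both values
below are hypothetical; they are not what Gmail actually returned, just
something with the same general shape.

```ruby
# Hypothetical values standing in for page.uri.to_s and the anchor's href.
page_url = "http://mail.example.com/mail/?ui=html&zy=n"
href     = "v=c&th=123abc"

# sub replaces only the first match; /ui.*$/ matches from "ui" to the
# end of the line, so the old query string is swapped for the href.
url = page_url.sub(/ui.*$/, href)

puts url   # http://mail.example.com/mail/?v=c&th=123abc
```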