Mechanize out of buffer space

I am trying to scrape a site, and then its child pages, to collect data that I
store in related tables. The only problem is that I keep getting an “OUT OF
BUFFER SPACE” error. Is there a way to clear the buffer after each iteration,
or am I doing something wrong?

Here’s the code:
require 'rubygems'
require 'mechanize'
require 'active_record'

ActiveRecord::Base.establish_connection(
  # connection goes here
)

class Major < ActiveRecord::Base
  has_many :courses
end

class Course < ActiveRecord::Base
  belongs_to :major
end

class Sections
  def scrape(url)
    agent = WWW::Mechanize.new
    page = agent.get(url)
    table = (page/'//table')[6]
    (table/'tr').each do |major|
      @newMajor = Major.new
      @newMajor.title = (major/'//td').first.inner_html
      @newMajor.abbrev = (major/'acronym').inner_html
      @newMajor.link_to = (major/'a').to_s.split('"')[1]
      # title, abbrev and link_to are attributes of @newMajor, not local variables
      puts @newMajor.title, @newMajor.abbrev, @newMajor.link_to
    end
  end
end

class Classes
  attr_writer :major_id

  def scrape(url)
    agent = WWW::Mechanize.new
    page = agent.get("http://courses.tamu.edu/" + url.to_s)
    (page/"//td[@class='sectionheading']").each do |course|
      # Split the heading into words and drop the last one
      course = course.inner_html.strip.split(' ')
      course.pop
      @newCourse = Course.new
      @newCourse.major_id = @major_id
      @newCourse.course_no = course[1]
      @newCourse.name = course.slice!(3, course.length).join(' ')
      @newCourse.save
    end
  end
end

AllMajors = Major.find(:all)
AllMajors.each do |major|
  start = Time.now
  newClass = Classes.new
  newClass.major_id = major.id
  newClass.scrape(major.link_to)
  puts "Added courses for #{major.title}"
  finish = Time.now
  puts "Took #{finish - start} seconds"
end
puts "Finished scraping courses"

After digging into the Hpricot source, it turns out the parser uses a
predefined buffer size, and you can't change it without editing the source
and recompiling.
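Since the buffer can't be cleared or resized from Ruby, one practical workaround is to keep a single oversized page from aborting the whole run: rescue the error per major, record which link failed, and carry on. The sketch below is a minimal illustration under stated assumptions. It assumes the buffer error reaches your code as an ordinary Ruby exception (a `StandardError` subclass), and the helper name `scrape_each` and the sample links are hypothetical stand-ins, not part of Mechanize or Hpricot. Pages that overflow the buffer still won't be parsed, but every other page will be.

```ruby
# Hypothetical helper: run the block for each item and collect failures,
# so one page that overflows the parser buffer doesn't stop the loop.
def scrape_each(items)
  failures = {}
  items.each do |item|
    begin
      yield item
    rescue StandardError => e
      # Assumption: the parser's buffer error is a StandardError subclass.
      failures[item] = e.message
    end
  end
  failures
end

# Stand-in data; in the script above the block would instead call
# newClass.scrape(major.link_to) for each Major.
links = ["a.html", "huge.html", "b.html"]
scraped = []
failures = scrape_each(links) do |link|
  # Simulate the parser failing on one oversized page.
  raise "out of buffer space" if link == "huge.html"
  scraped << link
end

puts "Scraped: #{scraped.join(', ')}"
failures.each { |link, msg| puts "Skipped #{link}: #{msg}" }
```

The failed links can then be logged or retried later with a different parser.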
