Forum: Ruby on Rails Parsing html files => putting them in fixtures for testing

Constantin G. (Guest)
on 2009-03-07 14:32
(Received via mailing list)
I'm using the Hpricot parser to scrape web pages. I saved two of these
pages for a test and, for lack of a better way, put the HTML files in the
fixtures like this:

dl_found_tickets:
  html: "<%= File.read('test/fixtures/html/search_dl_found_tickets.html').gsub('"', '\"') %>"
  [...]


Even though the crawl class works fine, the test fails, so something
must be wrong with the fixture. I wrote a test that compares the HTML
string loaded from the fixture with the one read directly from the file,
and they're not the same. From what I can tell, it's only a whitespace
difference, from some end-of-line conversions, I guess.

This test fails:
  def test_html_fixtures
    assert_equal File.read('test/fixtures/html/search_plate_found_ticket.html').slice(0, 250),
                 crawls(:dl_found_tickets).html.slice(0, 250)
  end

  1) Failure:
test_html_fixtures(CrawlTest)
    [test/unit/crawl_test.rb:16:in `test_html_fixtures'
     /usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/testing/setup_and_teardown.rb:60:in `__send__'
     /usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/testing/setup_and_teardown.rb:60:in `run']:
<"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\">\r\n<!--Bean tags
and additional tags for use in this page.-->\r\n\r\n\r\n\r\n\r\n\r\n\r
\n\r\n\r\n<!--End of bean tags and additional tags for use in this
page.-->\r\n<html>\r\n<head>\r\n<link rel=\"stylesheet\" type=\"text/
css\" h"> expected but was
<"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\"> <!--Bean tags
and additional tags for use in this page.-->\n\n\n\n\n\n\n\n<!--End of
bean tags and additional tags for use in this page.--> <html> <head>
<link rel=\"stylesheet\" type=\"text/css\" href=\"css/style">.
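
The difference in the failure above is consistent with how YAML treats line breaks inside a double-quoted scalar: a single line break is folded into a space, and in a run of line breaks the first one is dropped. A minimal sketch, assuming a current Ruby with the Psych YAML library (the thread ran Ruby 1.8, whose parser may differ in detail); the sample string is a made-up stand-in for the saved page:

```ruby
require 'yaml'

# Made-up stand-in for the saved page: CRLF line endings and a run of
# blank lines. It contains no double quotes, so no escaping is needed.
html = "<!DOCTYPE HTML>\r\n<!--tags-->\r\n\r\n\r\n<html>"

# Same idea as the fixture: the raw text, line breaks included, is placed
# between double quotes in the YAML source.
fixture_yaml = "html: \"" + html + "\"\n"

loaded = YAML.load(fixture_yaml)['html']

# The single CRLF becomes a space; the run of three CRLFs becomes two "\n".
p loaded  # => "<!DOCTYPE HTML> <!--tags-->\n\n<html>"
p loaded == html
```

If this is what happens in the fixture, gsub on the quotes alone cannot help, because the line breaks are already rewritten by the YAML parser.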
MaD (Guest)
on 2009-03-07 15:40
(Received via mailing list)
you are right. it seems like an error that occurs after reading your
yml-file. obviously there are some additional whitespaces/carriage
returns added to it. but you could just gsub them.

but other than that, one question: why do you want to save an html
string in a yml-file (and not just read your html file whenever you
want to)?
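
Reading the file directly, as suggested here, could be wrapped in a small test helper. A sketch, assuming Ruby 1.9 or later for File.binread; html_fixture is a hypothetical name, not something from the thread or from Rails:

```ruby
# Hypothetical helper: return a saved page byte for byte, with no
# end-of-line conversion and no YAML parsing in between.
# The default directory mirrors the path used in the thread.
def html_fixture(name, dir = 'test/fixtures/html')
  File.binread(File.join(dir, "#{name}.html"))
end
```

In a test this would be called as html_fixture('search_dl_found_tickets') and the result passed straight to the parser.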
Constantin G. (Guest)
on 2009-03-07 18:45
(Received via mailing list)
On 7 Mar, 07:39, MaD wrote:
> you are right. it seems like an error that occurs after reading your
> yml-file. obviously there are some additional whitespaces/carriage
> returns added to it. but you could just gsub them.

It may not be just the whitespace, because the parser gives different
results on the YML string and the File.read string. Isn't there a
function to quote YML strings? I searched for it, and could not find
it.
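
One candidate for such a quoting function is String#to_json: a JSON string literal keeps every line break as an escape sequence on a single line, and YAML reads that form as a double-quoted scalar. A sketch, assuming a current Ruby with the json and yaml standard libraries; it has not been checked against the Rails 2.2 setup in the thread, and the sample string is made up:

```ruby
require 'json'
require 'yaml'

# Made-up sample with double quotes, CRLF line endings and blank lines.
html = "<a href=\"x\">\r\n\r\n\r\nlink</a>\r\n"

# In the fixture this would replace the hand-written quotes and the gsub:
#   html: <%= File.read('test/fixtures/html/search_dl_found_tickets.html').to_json %>
quoted = html.to_json

# Parse it back the way the fixture loader would parse the YAML value.
round_trip = YAML.load("html: #{quoted}\n")['html']

p round_trip == html
```

Because the escaped form sits on one line, there are no literal line breaks left for the YAML parser to fold.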

> but other than that, one question: why do you want to save an html
> string in a yml-file (and not just read your html file whenever you
> want to)?

I like it this way because I just load the object from the fixture in
my tests.