Parsing html files => putting them in fixtures for testing


#1

I’m using the Hpricot parser to scrape web pages. I saved two of these
pages for a test and, for lack of a better way, put the HTML files in
the fixtures like this:

dl_found_tickets:
  html: "<%= File.read( 'test/fixtures/html/search_dl_found_tickets.html' ).gsub('"', '\"') %>"
[…]

Even though the crawl class works fine, the test fails, so something
must be wrong with the fixture. I wrote a test that compares the HTML
string loaded from the fixture with the one read directly from the file,
and they’re not the same. From what I can tell, it’s only a whitespace
difference, from some end-of-line conversion, I guess.

This test fails:
  def test_html_fixtures
    assert_equal File.read( 'test/fixtures/html/search_plate_found_ticket.html' ).slice(0, 250),
                 crawls(:dl_found_tickets).html.slice(0, 250)
  end

  1. Failure:
    test_html_fixtures(CrawlTest)
    [test/unit/crawl_test.rb:16:in `test_html_fixtures'
     /usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/testing/setup_and_teardown.rb:60:in `__send__'
     /usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/testing/setup_and_teardown.rb:60:in `run']:
    <"\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n<link rel="stylesheet" type="text/css" h"> expected but was
    <" \n\n\n\n\n\n\n\n
.
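
The leading space and the bare \n characters in the actual value are consistent with how YAML folds line breaks inside a double-quoted scalar: a single line break becomes a space, and each following blank line becomes one "\n". A minimal sketch of that effect, assuming Ruby's standard yaml library (the thread's Ruby 1.8 setup used a different YAML engine, so details may differ) and a made-up HTML snippet:

```ruby
require "yaml"

# Hypothetical stand-in for the saved page: four lines, one of them blank.
original = "<p>one</p>\n<p>two</p>\n\n<p>three</p>"

# The same text pasted between double quotes in a YAML document, similar to
# what the ERB fixture produces (continuation lines are indented here only
# to keep the YAML strictly valid).
fixture_yaml = <<~EOS
  html: "<p>one</p>
    <p>two</p>

    <p>three</p>"
EOS

loaded = YAML.load(fixture_yaml)["html"]

p loaded                 # line breaks were folded by the YAML parser
puts loaded == original  # false: the text no longer matches the file
```

If this is what happens to the fixture, the difference is introduced by the YAML parser itself and not by File.read.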

#2

You are right. It seems like an error that occurs after reading your
yml-file. Apparently some additional whitespace/carriage returns get
added to it. But you could just gsub them.
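
A minimal sketch of the gsub idea, assuming it is acceptable to ignore all whitespace differences when comparing. The helper name squish_whitespace and both sample strings are made up for illustration:

```ruby
# Collapse every run of whitespace (spaces, tabs, \r, \n) into one space
# and trim the ends, so only the non-whitespace content is compared.
def squish_whitespace(text)
  text.gsub(/\s+/, " ").strip
end

# Hypothetical samples shaped like the two values in the failure output.
from_file    = "\r\n\r\n<link rel=\"stylesheet\">\r\n"
from_fixture = " \n\n<link rel=\"stylesheet\">\n"

puts squish_whitespace(from_file) == squish_whitespace(from_fixture)  # true
```

This only makes the comparison pass. It does not restore the original bytes, so a parser that is sensitive to whitespace could still behave differently on the fixture string.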

But other than that, one question: why do you want to save an HTML
string in a yml-file (and not just read your HTML file whenever you
want to)?
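
Reading the file directly could look like the sketch below. The helper html_fixture is hypothetical, and the temporary directory only stands in for test/fixtures/html so that the example runs on its own:

```ruby
require "tmpdir"

# Hypothetical helper: read a saved page straight from disk, byte for byte,
# so no YAML parsing or end-of-line conversion gets in the way.
def html_fixture(name, dir)
  File.binread(File.join(dir, "#{name}.html"))
end

Dir.mktmpdir do |dir|
  # Made-up page content with Windows line endings.
  page = "<html>\r\n<body>tickets</body>\r\n</html>\r\n"
  File.binwrite(File.join(dir, "search_dl_found_tickets.html"), page)

  puts html_fixture("search_dl_found_tickets", dir) == page  # true
end
```

In a test, the helper would be called with the real fixture directory in place of the temporary one.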


#3

On 7 mar, 07:39, MaD wrote:

> You are right. It seems like an error that occurs after reading your
> yml-file. Apparently some additional whitespace/carriage returns get
> added to it. But you could just gsub them.

It may not be just the whitespace, because the parser gives different
results on the YML string and the File.read string. Isn’t there a
function to quote YML strings? I searched for it, and could not find
it.

> But other than that, one question: why do you want to save an HTML
> string in a yml-file (and not just read your HTML file whenever you
> want to)?

I like it this way because I just load the object from the fixture in
my tests.
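
On the quoting question above, one possible approach (not confirmed anywhere in this thread, and not checked against Ruby 1.8 or Rails 2.2.2) is to let to_json do the quoting: it emits a single-line, double-quoted string with \r, \n and " escaped, and those escapes are also valid in a YAML double-quoted scalar. A minimal sketch with Ruby's standard json and yaml libraries and a made-up HTML snippet:

```ruby
require "json"
require "yaml"

# Hypothetical stand-in for the saved page, with Windows line endings.
html = "\r\n\r\n<link rel=\"stylesheet\" type=\"text/css\">\r\n"

# to_json already includes the surrounding double quotes. In the fixture
# this would correspond to a line such as
#   html: <%= File.read(path).to_json %>
# where path is whatever file the fixture reads.
fixture_yaml = "html: #{html.to_json}\n"

loaded = YAML.load(fixture_yaml)["html"]
puts loaded == html  # true: the string survives the round trip
```

Because the value stays on one line, the YAML parser has no line breaks to fold, and the loaded string is identical to the file content.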