Nokogiri/Ruby and troublesome characters in a URL

I’m very new to Ruby, and I can’t seem to figure something out
(that is probably quite basic). Any help is much appreciated!

When using Nokogiri and open-uri in Ruby, I define a constant containing
a partial URL (INITIAL_URL =
https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten”)
so that I can append to it later for repeated requests (I have added
the full code below).

However, I keep running into these errors:

  • “syntax error, unexpected tLABEL”
  • “unknown regexp options - zk”
  • “syntax error, unexpected ‘?’”

How can I fix this?

Here’s the full code:

require 'Nokogiri'
require 'open-uri'

def get_search_result_links(n_page)
  links = n_page.css('.linker-kolom li a')
  puts "** There were #{links.length} links found"
  links.each do |link|
    href = link['href']
    inner_url = 'https://zoek.officielebekendmakingen.nl' + href
    puts "\n\n\nFetching page at #{File.basename(inner_url).split('?')[0]}"

    datalezer = open(inner_url).read
    lokalenieuwefilenaam = href + ".html"
    lokalenieuwefile = open(lokalenieuwefilenaam, "w")
    lokalenieuwefile.write(datalezer)
    lokalenieuwefile.close
  end
end

INITIAL_URL =
https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten
initial_page = Nokogiri::HTML(open(INITIAL_URL))
pagination_links = initial_page.css('.paginering.beneden a')
last_page_link = pagination_links[-2]
last_page_number = last_page_link.text.to_i
(5...last_page_number).each do |page_num|
  puts "\n\n\n***** Getting page #{page_num}"
  results_page_url = "#{INITIAL_URL}&_page=#{page_num}"
  results_page = Nokogiri::HTML(open(results_page_url))
  get_search_result_links(results_page)
end

(In my setup) the line…

pagination_links = initial_page.css('.paginering.beneden a')

returns an empty Nokogiri::XML::NodeSet => []

Which part of your HTML are you trying to select?

Something googled… http://ruby.bastardsbook.com/chapters/html-parsing/

Abinoam Jr.

Thanks for the reply Abinoam.

With pagination_links = initial_page.css('.paginering.beneden a') I’m
trying to recover

and then , which refer to all the page-links. So apparently something
is going wrong here as well?

The bigger problem I’m dealing with is that Ruby believes that the
characters following the question mark (in INITIAL_URL =
https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten”)
should be interpreted as code instead of as part of the string.
So I get an error when simply trying to define INITIAL_URL with a
URL string, because some of the characters in the URL are interpreted as
code.

Dear Sybren,

I’ve indented and fixed some quotes on your code.

https://gist.github.com/0d83e2487a0e955411d6

It runs, but there’s no “paginering beneden” in the HTML it
retrieves.
So the code fails at
pagination_links = initial_page.css('.paginering.beneden a')

Look:

initial_page.css('.paginering') => []
initial_page.css('.beneden') => []

But, as an example…
initial_page.css('.tekst-kleiner')
initial_page.css('a.tekst-kleiner')
initial_page.css('header').css('a.tekst-kleiner')

all return…
=> [#<Nokogiri::XML::Element:0x109e3d0 name="a"
attributes=[#<Nokogiri::XML::Attr:0x109e358 name="href"
value="https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten&grootte=2">,
#<Nokogiri::XML::Attr:0x109e344 name="class" value="tekst-kleiner">,
#<Nokogiri::XML::Attr:0x109e308 name="title" value="Schermteksten verkleinen">]
children=[#<Nokogiri::XML::Text:0x10a2854 "">]>]

Look at the HTML source of your URL and you will see it.

Best regards,
Abinoam Jr.

Are you testing your code by inserting an href by hand, something like
this:

inner_url = 'https://zoek.officielebekendmakingen.nl' +
  /something/zk?x=10&y=5

That produces the error:

unknown regexp options - zk

The reason for that error is that /something/ is the syntax for a regex
literal, so the letters right after the closing slash are read as
regexp options.
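
A minimal sketch of the quoting fix, assuming the URL in the question was meant to be a single string. Without quotes, Ruby appears to read https: as a label and //zoek as an empty regex literal followed by the option letters z, o, e and k, of which z and k are not valid options; that would explain “unknown regexp options - zk”. The name results_page_url is a hypothetical helper, and the _page parameter is taken from the code in the question.

```ruby
require 'uri'

# Quoted, so Ruby treats the whole URL (including /, ? and &) as one String.
# A constant such as INITIAL_URL would work the same way.
initial_url = 'https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten'

# Hypothetical helper: append the page number the way the question's loop does.
def results_page_url(base_url, page_num)
  "#{base_url}&_page=#{page_num}"
end

# Parsing needs no network access; it only shows the string is a usable URL.
uri = URI.parse(initial_url)
puts uri.host                    # zoek.officielebekendmakingen.nl
p URI.decode_www_form(uri.query) # [["zkt", "Uitgebreid"], ["pst", "ParlementaireDocumenten"]]
puts results_page_url(initial_url, 5)
```

The same applies to any href typed in by hand: put it in quotes before concatenating it onto the base URL.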
