Nokogiri/Ruby and troublesome characters in a URL

I’m very new to Ruby, and I can’t seem to figure something out
(that is probably quite basic). Any help is much appreciated!

When using Nokogiri and open-uri in Ruby, I define a variable containing
a partial URL (INITIAL_URL) so that I can append to it for repeated use
(I have added the full code below).

However, I keep running into errors:

  • “syntax error, unexpected tLABEL”
  • “unknown regexp options - zk” + “syntax error, unexpected ‘?’”

How can I fix this?

Here’s the full code:

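require 'nokogiri'
require 'open-uri'

# INITIAL_URL (the partial URL described above) must be defined as a quoted
# string before the code below runs.

def get_search_result_links(n_page)
  links = n_page.css('.linker-kolom li a')
  puts "** There were #{links.length} links found"
  links.each do |link|
    href = link['href']
    inner_url = '' + href
    puts "\n\n\nFetching page at #{File.basename(inner_url).split('?')[0]}"

    # URI.open is the open-uri entry point on current Ruby versions
    datalezer = URI.open(inner_url).read
    lokalenieuwefilenaam = href + ".html"
    lokalenieuwefile = File.open(lokalenieuwefilenaam, "w")
    # The listing was cut off here; writing and closing the file is the assumed intent
    lokalenieuwefile.write(datalezer)
    lokalenieuwefile.close
  end
end

initial_page = Nokogiri::HTML(URI.open(INITIAL_URL))
pagination_links = initial_page.css('.paginering.beneden a')
last_page_link = pagination_links[-2]
last_page_number = last_page_link.text.to_i
(5...last_page_number).each do |page_num|
  puts "\n\n\n***** Getting page #{page_num}"
  results_page_url = "#{INITIAL_URL}&_page=#{page_num}"
  results_page = Nokogiri::HTML(URI.open(results_page_url))
  # Assumed intent: process the links on each results page
  get_search_result_links(results_page)
end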

(In my setup) the line…

pagination_links = initial_page.css('.paginering.beneden a')

returns an empty Nokogiri::XML::NodeSet => []

Which part of your HTML are you trying to select?

Something googled…

Abinoam Jr.

Thanks for the reply Abinoam.

With pagination_links = initial_page.css('.paginering.beneden a') I’m
trying to select the element with the classes paginering and beneden,
and then the a elements inside it, which refer to all the page links.
So apparently something is going wrong here as well?

The bigger problem I’m dealing with is that Ruby believes that the letters
following the question mark in INITIAL_URL should be interpreted as
commands instead of as part of the string. So I get an error when simply
trying to define INITIAL_URL with a URL string, because some of the
characters in the URL are not treated as part of the string.
Dear Sybren,

I’ve indented and fixed some quotes on your code.

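It runs, but there is no element with the classes “paginering beneden” in
the HTML retrieved from the URL, so the code fails at
pagination_links = initial_page.css('.paginering.beneden a').
Because that NodeSet is empty, pagination_links[-2] is nil and the
following call to .text raises a NoMethodError.

initial_page.css('.paginering') # => []
initial_page.css('.beneden') # => []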

But, as an example…

all return…
=> [#<Nokogiri::XML::Element:0x109e3d0 name="a"
attributes=[#<Nokogiri::XML::Attr:0x109e358 name="href"
#<Nokogiri::XML::Attr:0x109e344 name="class" value="tekst-kleiner">,
#<Nokogiri::XML::Attr:0x109e308 name="title" value="Schermteksten verkleinen">]
children=[#<Nokogiri::XML::Text:0x10a2854 "">]>]

Look at the HTML source of your URL and you will see it.
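One quick way to make that check without opening a browser is to list the class names that actually occur in the downloaded source. The sketch below is only an illustration: class_names_in is a hypothetical helper, the sample HTML is invented (apart from the tekst-kleiner link shown above), and the regex is a crude scan, not a real HTML parser.

```ruby
# Hypothetical helper: collects every class name found in class="..."
# attributes of a raw HTML string. Crude on purpose; not a real parser.
def class_names_in(html)
  html.scan(/class\s*=\s*["']([^"']*)["']/i) # one capture per attribute
      .flatten                               # [["a b"], ["c"]] -> ["a b", "c"]
      .flat_map(&:split)                     # "a b" -> "a", "b"
      .uniq
end

# Invented sample standing in for the page source (an assumption).
sample_html = <<~HTML
  <div class="hulpmiddelen">
    <a href="#" class="tekst-kleiner" title="Schermteksten verkleinen"></a>
  </div>
HTML

found = class_names_in(sample_html)
puts found.inspect
puts "paginering present? #{found.include?('paginering')}"
puts "beneden present? #{found.include?('beneden')}"
```

On the real page you would pass in the string you downloaded with open-uri instead of sample_html. If paginering and beneden are not in the list, the selector has nothing to match.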

Best regards,
Abinoam Jr.

Are you testing your code by inserting an href by hand, something like

inner_url = '' + 


That produces the error:

unknown regexp options - zk

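The reason for that error is that /something/ is the syntax for a regex
literal in Ruby, and any letters directly after the closing slash are read
as regex options. An unquoted path such as /something/zk is therefore parsed
as a regex followed by the unknown options “zk”. Putting straight quotes
around the whole href makes it an ordinary string.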
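To see the difference for yourself, here is a minimal sketch. The path /zoeken/zk?pagina=2 is a made-up stand-in for the real href, and syntax_error_for is a hypothetical helper that exists only for this demonstration: it evaluates a snippet and returns the parser's complaint, if any. The exact wording of the message can differ between Ruby versions.

```ruby
# Hypothetical helper: evaluates a snippet of Ruby source and returns the
# SyntaxError message, or nil when the snippet parses and runs fine.
def syntax_error_for(source)
  eval(source)
  nil
rescue SyntaxError => e
  e.message
end

# Made-up href standing in for the real one (an assumption for illustration).
unquoted = "inner_url = '' + /zoeken/zk?pagina=2"
quoted   = "inner_url = '' + '/zoeken/zk?pagina=2'"

# Unquoted: Ruby reads /zoeken/ as a regex literal and "zk" as its options.
puts syntax_error_for(unquoted).inspect

# Quoted: the whole href is an ordinary String, "?" included.
puts syntax_error_for(quoted).inspect # nil, no syntax error

inner_url = '' + '/zoeken/zk?pagina=2'
puts inner_url
```

The same applies to INITIAL_URL: as long as the whole URL sits between straight quotes, the question mark and everything after it are just characters in the string.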
