I’m pretty new to Ruby, and programming in general, and am having
massive trouble parsing some HTML pages I scraped from Yellow Pages.
So far, I’ve been using the link below as my template:
I am trying to compile a list of restaurants in San Francisco, with the
price, ambiance and neighbourhood attributes. I want to import this list
into Excel. Does anyone have an idea of how to adapt the script in the
template for YP?
I have successfully scraped the source code, but when it comes to
parsing, I’m having trouble inputting the right parameters.
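For the Excel part of the goal, one common route (an assumption; the thread doesn't say which format to use) is to write the scraped records to a CSV file, which Excel opens directly. This is a minimal sketch using Ruby's `csv` library. The column names, the `write_restaurants_csv` helper, and the sample records are all hypothetical placeholders.

```ruby
require 'csv'

# Hypothetical column names matching the attributes mentioned in the question.
HEADERS = %w[name price ambiance neighbourhood].freeze

# Write one header row, then one row per restaurant hash.
# A missing attribute becomes an empty cell instead of raising an error.
def write_restaurants_csv(path, restaurants)
  CSV.open(path, 'w') do |csv|
    csv << HEADERS
    restaurants.each do |r|
      csv << HEADERS.map { |h| r[h.to_sym] }
    end
  end
end

# Made-up sample data; in practice these hashes would come from the parser.
restaurants = [
  { name: 'Cafe A', price: '$$', ambiance: 'Casual', neighbourhood: 'Mission' },
  { name: 'Bistro B', price: '$$$', neighbourhood: 'Nob Hill' }
]
write_restaurants_csv('restaurants.csv', restaurants)
```

Building each row from `HEADERS` keeps the columns in a fixed order even when some listings lack an attribute.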
Parsing HTML requires a good understanding of HTML structure (e.g.
parents, children and siblings) as well as CSS (e.g. classes and ids). As
a beginner, it is better to take baby steps than to jump in at the deep end of
the pool, so this project may be too hard for you.
It's not at all clear to me which specific things you want to
extract from the website.
In any case, you need to click on View/Source in your browser and
examine the raw HTML to figure out which tags (or tag attributes) you
need to extract and how to identify them. You can examine the web page
in your browser, then use Find or Search to locate the same text in the
raw HTML.
Then read some basic XPath tutorials, starting here:
Here is an example of how to get the names of the restaurants: