Hi folks,
sorry to bother you with such a mundane question, but I have tried for 3 hours
and don't know how.
I need to extract the strings, that is, everything between “” or ‘’.
So, what approach or approaches did you try? What was the solution that
got closest to what you were trying, and what did it output?
There are a whole host of ways you might be approaching this. For
example, you might be using the Hpricot (HTML parsing) library, or REXML
(XML parsing), or simple regular expression matching.
If all you want is the bits between ‘single quotes’ then a regular
expression match is probably easiest. Try using String#scan, and give it
a regular expression which matches a single quote followed by any number
of non-single-quote characters followed by a single quote.
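
A minimal sketch of that String#scan approach, assuming plain ASCII quote characters and no escaped or nested quotes inside the strings. The method names and the sample input are made up for illustration; the original post did not include any code.

```ruby
# Hypothetical helpers, not taken from the original post.

# Matches a single quote, any number of non-single-quote characters,
# then a closing single quote. The capture group makes scan return
# only the text between the quotes.
def single_quoted_strings(text)
  text.scan(/'([^']*)'/).flatten
end

# The same idea for double quotes.
def double_quoted_strings(text)
  text.scan(/"([^"]*)"/).flatten
end

sample = %q{puts 'hello' + "world" + 'foo bar'}
p single_quoted_strings(sample)
p double_quoted_strings(sample)
```

With a capture group in the pattern, scan returns an array of one-element arrays, which is why the result is flattened.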
Or you could use String#split("'") and keep only the odd-indexed
elements (counting from zero) of the returned array.
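
A rough sketch of the split variant, assuming every single quote in the input belongs to a properly closed pair. The method name and sample text are hypothetical.

```ruby
# Hypothetical helper, not taken from the original post.
# Splitting on the quote character alternates between text outside the
# quotes (even indexes) and text inside them (odd indexes).
def quoted_by_split(text)
  parts = text.split("'")
  parts.each_with_index.select { |_, index| index.odd? }.map(&:first)
end

p quoted_by_split("say 'hi' and 'bye' now")
```

A stray apostrophe, as in "it's", shifts the alternation, so this only holds up on input where the quotes are balanced.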
I’m sure if you post your actual code and what it does, someone will
help you tweak it to work.
Disclaimer: If there are other ' characters somewhere in the document
(comments, CDATA sections, text elements), this will break badly and you
will need Hpricot or another HTML parser.
If, however, your data is simple enough and you can fulfill that
prerequisite, it becomes very easy…