Hi everybody! I'm sorry for asking a silly question... I hope you'll
find some time to assist.
While searching through the HTML body I get the string:
How can I modify it to get just a plain link without tags and params, like this:
https://sc.omniture.com/sc13_5/reports/chart.php?id=CPRIY_6NvJ_kj7s&s=www461.sj2&type=GIF
Greg W. wrote:
You’ll have to expand this to take care of all possible scenarios, but
here’s an example:
x = '<img src="http://some_url_to_scrape">'
y = x.scan(/src="([\S\s]+?)"/)
That will return an array, and you’ll have to fish the string out of the
array with y[0][0].
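Because the pattern has one capture group, `scan` returns an array of one-element arrays, one per match. Here is a minimal sketch in core Ruby only, showing that shape and two common ways to get at the string directly. `flatten` gives a plain list of every captured URL. `String#[]` with a regexp and a group index returns group 1 of the first match, or `nil` if nothing matches.

```ruby
x = '<img src="http://some_url_to_scrape">'

# scan returns one sub-array per match, each holding that match's capture groups
y = x.scan(/src="([\S\s]+?)"/)
p y       # => [["http://some_url_to_scrape"]]

# flatten collapses the nested arrays into a plain list of captured URLs
urls = y.flatten
p urls    # => ["http://some_url_to_scrape"]

# String#[] with a capture index returns group 1 of the first match only
first = x[/src="([\S\s]+?)"/, 1]
p first   # => "http://some_url_to_scrape"
```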
That works specifically with proper HTML using double quotes, whereas your
examples above used malformed single quotes. You can either use
multiple expressions or build a more complex one to cover the various
cases: single or double quotes, href, src, other attribute names, etc.
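One way to build that more complex expression is to capture the opening quote and use a backreference (`\2` here) so the value must close with the same character. An alternation can also accept either `src` or `href`. The sketch below makes some assumptions. The constant `LINK_ATTR` and the method `extract_links` are hypothetical names. Unquoted attribute values (e.g. `width=624`) are deliberately not matched.

```ruby
# Hypothetical pattern: group 1 = attribute name, group 2 = opening quote,
# group 3 = the value; \2 requires the closing quote to match the opening one.
LINK_ATTR = /\b(src|href)\s*=\s*(["'])(.*?)\2/i

# Hypothetical helper: returns every quoted src/href value, in document order.
def extract_links(html)
  html.scan(LINK_ATTR).map { |_attr, _quote, value| value }
end

# One single-quoted tag (from this thread) and one double-quoted tag
html = %q{<img src='https://sc.omniture.com/sc13_5/reports/chart.php?id=CPRIY_6NvJ_kj7s&s=www461.sj2&type=GIF' width=624 height=280 border=0>
<img src="http://some_url_to_scrape">}

links = extract_links(html)
links.each { |link| puts link }
```

This still shares the usual weaknesses of regex-based HTML scraping: comments, `<script>` contents, and odd markup can all fool it.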
There are other ways to do it; this is just a small example to give you
some ideas.
– gw
On Thu, Jul 02, 2009 at 05:29:04AM +0900, Vlad Smith wrote:
Thanks! That worked!
I also accidentally noticed a great feature taken from Perl that worked
too:
x = "<img src='https://sc.omniture.com/sc13_5/reports/chart.php?id=CPRIY_6NvJ_kj7s&s=www461.sj2&type=GIF' width=624 height=280 border=0 align=absmiddle usemap=#imIY_6NvJ_kj7s>"
x = $1 if x =~ /.*(https.*GIF).*/
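For reference, `=~` returns the match offset (or `nil`). As in Perl, it also sets `$~` and the numbered globals such as `$1`. `Regexp.last_match(1)` and `String#match` give the same capture without reading the globals directly. The sketch below assumes the pattern `/(https.*GIF)/`. It also assumes a hypothetical non-matching tag, to show that the `if` modifier form leaves the variable untouched rather than setting it to `nil`.

```ruby
tag = "<img src='https://sc.omniture.com/sc13_5/reports/chart.php?id=CPRIY_6NvJ_kj7s&s=www461.sj2&type=GIF' width=624 height=280 border=0>"

# Perl-style: =~ returns the match position (or nil) and sets $~ and $1
if tag =~ /(https.*GIF)/
  via_global = $1
  via_last_match = Regexp.last_match(1)  # same value, read from $~
end

# Same capture via the MatchData object, without the globals
via_match = tag.match(/(https.*GIF)/)[1]

# Hypothetical tag with no GIF URL: the match fails, so the assignment is skipped
other = "<img src='http://example.com/pic.png'>"
other = $1 if other =~ /(https.*GIF)/

puts via_global
puts other
```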
Please don’t do this. Every time you parse HTML with a regular
expression, a kitten dies.
Instead, try using an HTML parsing library:
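require 'nokogiri'
x = <<-eohtml
<img src='https://sc.omniture.com/sc13_5/reports/chart.php?id=CPRIY_6NvJ_kj7s&s=www461.sj2&type=GIF' width=624 height=280 border=0 align=absmiddle usemap=#imIY_6NvJ_kj7s>
eohtml
puts Nokogiri::HTML(x).at('img')['src']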