I have a question about string search and replacement. Let's say I have
this string that contains 2 links for embedding YouTube videos amongst
some other random text.
string = "We've got some junk text here" +
  '<object width="425" height="350">' +
  '<param name="movie" value="http://www.youtube.com/v/qqsyXdj_p_I">' +
  '<param name="wmode" value="transparent">' +
  '<embed src="http://www.youtube.com/v/qqsyXdj_p_I" ' +
  'type="application/x-shockwave-flash" ' +
  'wmode="transparent" width="425" height="350">' +
  "And we've got some more junk text right here" +
  '<object width="425" height="350">' +
  '<param name="movie" value="http://www.youtube.com/v/yYqACgndOQA">' +
  '<param name="wmode" value="transparent">' +
  '<embed src="http://www.youtube.com/v/yYqACgndOQA" ' +
  'type="application/x-shockwave-flash" ' +
  'wmode="transparent" width="425" height="350">' +
  "more garbage text here"
What would be the best library to use for parsing and replacing certain
values in a string? I've done simple .gsub'ing before, but this seems to
be a little more complicated.
Well, maybe you can assume valid XML and parse the page with REXML. If
it's not valid XML, you can regex through for (off-the-cuff, probably
wrong)
/<param.*value="(.*)"/
and use the result. Someone more knowledgeable in regexps can help you
out if it comes to this.
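For what it's worth, here is a minimal sketch of the regex route using only core String methods. The sample string is a shortened stand-in for the one in the question, and the names html, param_value and new_id are made up for illustration. The pattern is a tightened variant, not the one suggested above: it assumes double-quoted attributes and matches one tag at a time.

```ruby
# Hypothetical sample input: a shortened version of the string in the question.
html = "junk text" +
  '<object width="425" height="350">' +
  '<param name="movie" value="http://www.youtube.com/v/qqsyXdj_p_I">' +
  '<param name="wmode" value="transparent">' +
  '<embed src="http://www.youtube.com/v/qqsyXdj_p_I" wmode="transparent">' +
  "more junk text"

# [^>]* cannot run past the end of the tag, and [^"]* cannot run past
# the closing quote, so each match stays inside a single <param> tag.
param_value = /<param[^>]*\bvalue="([^"]*)"/

# scan returns one array of captures per match; flatten gives a plain list.
values = html.scan(param_value).flatten

# gsub with a block rewrites every YouTube URL in place.
# new_id is a placeholder, not a real video id.
new_id = "NEW_VIDEO_ID"
replaced = html.gsub(%r{(http://www\.youtube\.com/v/)[\w-]+}) { "#{$1}#{new_id}" }

puts values
puts replaced
```

This is only reasonable for markup you control or know the shape of; anything messier is a job for a real parser.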
I haven’t used it yet, but I hear really good things about Hpricot.
I've used it[1]. I needed to pull statistical data out of a bunch of HTML
pages that differed slightly from one another. Combined with Firebug[2]'s
ability to generate an XPath expression by simply pointing at an
element[3], and recent Hpricot's support for XPath indices, it should
take only minutes to automatically extract anything you want from any
HTML page.
Well, it might be necessary to use a non-greedy match
/<param.*value="(.*?)"/
in order not to consume a potentially following key="…" pair.
A more explicit and thus more readable way might be to write it like
this, which also avoids any potential backtracking issues if the regexp
evolves later:
/<param.*value="([^"]*)"/
This is all just for the quick hack, though; definitely go with REXML or
Hpricot if they can do the job for you.
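To make the difference concrete, here is a small sketch that runs the greedy, non-greedy, and negated-class patterns against one tag where another attribute follows value. The tag string and variable names are invented for the demonstration.

```ruby
# Hypothetical tag in which a name="..." pair follows the value attribute.
tag = '<param value="http://www.youtube.com/v/qqsyXdj_p_I" name="movie">'

greedy   = /<param.*value="(.*)"/      # (.*) runs on to the last quote on the line
lazy     = /<param.*value="(.*?)"/     # (.*?) stops at the first closing quote
explicit = /<param.*value="([^"]*)"/   # [^"]* can never cross a quote

# String#[] with a regexp and a group number returns that capture.
greedy_value   = tag[greedy, 1]
lazy_value     = tag[lazy, 1]
explicit_value = tag[explicit, 1]

puts greedy_value    # swallows the following name="movie pair as well
puts lazy_value      # just the URL
puts explicit_value  # just the URL
```

Note that the leading .* in all three patterns is still greedy, so with several param tags on one line each pattern jumps ahead to the last value= on that line. Replacing it with something like [^>]* is one way to keep a match inside a single tag.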
Your advice to use a non-greedy match is good, but the example using a
greedy match is not:
your_re =~ ' … bla bla bla … value="ha!"'
puts $1
Greetings.