Gsub + loop

This question actually pertains to a Rails app but it’s more of a
general Ruby question so I’ll ask it here.

Within a body of text I’m trying to match URLs for
youtube/google/myspace/etc videos and replace them with their associated
embed codes. For each different site there is a different regular
expression and a different embed code. So I made a hash with each value
being an array containing the regular expression and replacement like:
{:videosite => [regexp, replacement]}. From there I figured I should
loop through the hash and check the text against each expression and
replace the URLs if necessary. Unfortunately I am having a problem:

http://pastie.caboo.se/64785

class Embedment < ActiveRecord::Base
  def self.capture_embedments(text)
    embedments = []
    regexps.each do |k,v|
      text.gsub!(v[0]) do |match|
        embedments << embedment = self.new
        embedment.html = v[1]
      end
    end

    return embedments
  end

  def self.regexps
    return {
      :youtube => [/(youtube:.?(?:v=)?([\w|-]{11}).)/, "<object
width="425" height="350"><param name="movie"
value="YouTube"><param
name="wmode" value="transparent"><embed
src="YouTube"
type="application/x-shockwave-flash" wmode="transparent"
width="425" height="350">"]
    }
  end
end

I think the flaw in my plan is that the second member of the array is
being parsed as soon as it’s referenced, so this raises an exception
(undefined local variable or method `match’ for Embedment:Class) since
the object ‘match’ does not exist yet. I’m guessing my approach to this
problem is very very wrong, but I have yet to see past my poor solution.
The reason why I separated the regular expressions from the method is
because there’s going to be a few tens of them and I wanted to
consolidate them. Any help would be appreciated.

On 26.05.2007 09:44, Eleo wrote:

the object ‘match’ does not exist yet.
You’re right on!
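
To make that concrete, here is a minimal sketch of why the exception shows up. The method name replacement_template is made up for illustration; the point is only that #{...} inside a double-quoted string is evaluated when the literal itself is evaluated, long before gsub ever yields a match.

```ruby
# Interpolation runs when the string literal is evaluated,
# not later when the string is used inside a gsub block.
# `replacement_template` is a hypothetical name for illustration.
def replacement_template
  "<embed src=\"#{match[1]}\">"   # `match` is not defined in this scope
end

error = begin
  replacement_template
rescue NameError => e
  e
end

puts error.class   # NameError
```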

I’m guessing my approach to this
problem is very very wrong, but I have yet to see past my poor solution.
The reason why I separated the regular expressions from the method is
because there’s going to be a few tens of them and I wanted to
consolidate them. Any help would be appreciated.

You are not too far away. Just use regexp backreferences in the
replacement string instead of interpolation, and apply the replacement a
second time to each match. So #{match[0]} becomes \& and #{match[1]}
becomes \1, etc.

class Embedment < ActiveRecord::Base
  def self.capture_embedments(text)
    embedments = []
    REGEXPS.each do |k,v|
      text.gsub!(v[0]) do |match|
        embedment = new
        embedments << embedment
        embedment.html = match.sub(*v)
      end
    end

    return embedments
  end
end
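
As a rough illustration of the backreference escapes on their own, outside Rails: the pattern, the example.com URL and the constant names below are simplified assumptions, not the ones from the original post. Single quotes are used so that the backslashes reach gsub untouched.

```ruby
# Hypothetical, simplified pattern: group 1 captures an 11-character video id.
PATTERN = /\(youtube:([\w-]{11})\)/

# Single quotes keep the backslashes literal, so gsub sees \1 and \&.
# \1 is the first capture group, \& is the whole match.
TEMPLATE = '<embed src="http://example.com/v/\1" title="\&">'

text = "Watch (youtube:abcdefghijk) now"
result = text.gsub(PATTERN, TEMPLATE)
puts result
```

In a double-quoted string the same escapes would have to be written as \\1 and \\&.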

Note also that it’s better to make the replacements a constant in the
class. And if you only iterate, you don’t need symbols as keys; just do

REGEXPS = {
  /(youtube:.?(?:v=)?([\w|-]{11}).)/ => "…",
}

And then

 REGEXPS.each do |rx,repl|
   text.gsub!(rx) do |match|
     embedment = new
     embedments << embedment
     embedment.html = match.sub(rx,repl)
   end
 end
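
For anyone who wants to try the loop without a Rails app, here is a self-contained sketch. The Struct is only a stand-in for the ActiveRecord model, and the pattern and the example.com URL are assumptions made up for the example.

```ruby
# Stand-in for the ActiveRecord model; only the html attribute is modelled.
Embedment = Struct.new(:html)

# Hypothetical pattern and embed URL, for illustration only.
REGEXPS = {
  /\(youtube:([\w-]{11})\)/ => '<embed src="http://example.com/v/\1">'
}

def capture_embedments(text)
  embedments = []
  REGEXPS.each do |rx, repl|
    text.gsub!(rx) do |match|
      # Second application: expand the backreferences for this one match.
      embedment = Embedment.new(match.sub(rx, repl))
      embedments << embedment
      embedment.html   # the block's value becomes the replacement text
    end
  end
  embedments
end

text = "a (youtube:abcdefghijk) b (youtube:ABCDEFGHIJK) c".dup
found = capture_embedments(text)
puts text
puts found.size
```

Note that gsub! modifies the string passed in, so the caller sees the embed codes in text while also getting the list of embedments back.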

If you need more flexibility you can replace the replacement string with
a block. Then you can do

class Embedment
  REGEXPS = {
    /(youtube:.?(?:v=)?([\w|-]{11}).)/ =>
      lambda {|match| "" }
  }
end

Then pass the lambda to sub as a block:

 REGEXPS.each do |rx,repl|
   text.gsub!(rx) do |match|
     embedment = new
     embedments << embedment
     embedment.html = match.sub(rx,&repl)
   end
 end
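
A small runnable sketch of the lambda variant, again with an assumed pattern and an example.com URL, and with made-up names RENDERERS and render_embeds. One thing to watch: $1 is not reliable inside a lambda that was defined elsewhere, so this version pulls the id out of the matched string with String#[] instead.

```ruby
# Hypothetical pattern paired with a lambda that builds the embed code.
RENDERERS = {
  /\(youtube:([\w-]{11})\)/ => lambda { |match|
    # match is the matched substring; extract capture group 1 from it.
    id = match[/youtube:([\w-]{11})/, 1]
    %(<embed src="http://example.com/v/#{id}" width="425" height="350">)
  }
}

# Returns a new string with every known markup replaced by its embed code.
def render_embeds(text)
  RENDERERS.inject(text) do |result, (rx, renderer)|
    result.gsub(rx, &renderer)
  end
end

output = render_embeds("see (youtube:abcdefghijk)")
puts output
```

Here the lambda is handed straight to gsub, which is enough when you only need the rewritten text; the loop above is still the way to go if you also want to collect the embedments.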

Kind regards

robert

While not related to using gsub/regular expressions, consider an
option like hpricot?

Robert K. wrote:

some helpful crap
Thanks, this seems to work out fine. I used hash keys just in case. It
is easier to type a site name than it is to type out a regular
expression, and although I don’t need to access the hash directly yet,
well, you nevva know.

As for lambda, that’s something I don’t fully understand yet. While I
was pondering the problem I kind of had this weird feeling that it might
be the solution, though. I’ll look into it.

Paul S. wrote:

While not related to using gsub/regular expressions, consider an
option like hpricot?

I just glanced at it, but I’m not sure why it would be a better option.
In my case I wanted users to be able to embed videos in their comments,
but they have no access to html, so I instead created this generic
markup like (youtube:link) to accomplish the same ends. I figured
allowing them the option of using real HTML code would be too
risky. I don’t know for sure whether or not there are malicious uses
for it, but I imagined so.