Detect any "<a href=mailto:...>...</a>" string in a string?

josh · October 7, 2009, 10:57pm

Hi all

I have a long, long string of HTML tags. There might be some unprotected
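Email links in there like this:

<a href="mailto:[email protected]">Some Email</a>

or

<a href="mailto:[email protected]?subject=Something">Some Email</a>

or…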

I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

or something like that. Sadly I have no idea how to find the needed
string parts. I stumbled upon the TMail GEM and guess it could help me a
lot… But I don’t get any further now.

Any help is appreciated! Thanks!
Josh

josh · October 7, 2009, 11:24pm

On Wed, Oct 7, 2009 at 3:57 PM, Joshua M. [email protected] wrote:

I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

or something like that. Sadly I have no idea how to find the needed
string parts. I stumbled upon the TMail GEM and guess it could help me a
lot… But I don’t get any further now.

Here’s how I do it in PHP, if you wanna rework it into Ruby:

$GLOBALS[ 'EMAIL_LINK_REGEX' ] = "#<a[^>]*mailto:([^'\" ]*)['\" ]*>([^<]*)</a>#i";

$html = preg_replace_callback( $GLOBALS[ 'EMAIL_LINK_REGEX' ], 'fubarEmail', $html );

// $matches[1] is the address, $matches[2] is the link text.
// replaceEntities() is a helper that is not shown in this post.
function fubarEmail( $matches )
{
    $strNewAddress = replaceEntities( $matches[ 1 ] );

    $strText = replaceEntities( $matches[ 2 ] );

    // Split "user@domain" into its two halves.
    $arrEmail = explode( '@', $strNewAddress );

    $strTag = "$arrEmail[0] at \r";
    $strTag .= str_replace( '.', ' dot ', $arrEmail[ 1 ] ) . '';

    return $strTag;
}
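
For anyone who wants to try the same idea in Ruby, here is a minimal sketch, not a drop-in port. The names EMAIL_LINK_REGEX and spell_out_email_links are made up for illustration, the pattern is a slightly stricter variant of the PHP one (it stops the address at a "?"), and the replaceEntities step is left out because that helper is not shown above. Like the PHP callback, it drops the link text and keeps only the spelled-out address.

```ruby
# Rough Ruby counterpart of the PHP pattern (illustrative name, not from the post).
# Group 1 is the address up to any quote, space, "?" or ">"; group 2 is the link text.
EMAIL_LINK_REGEX = /<a[^>]*mailto:([^'" ?>]*)[^>]*>([^<]*)<\/a>/i

# Hypothetical helper: replaces each mailto link with "user at domain dot tld".
def spell_out_email_links(html)
  html.gsub(EMAIL_LINK_REGEX) do
    address = Regexp.last_match(1)
    user, domain = address.split("@", 2)
    # Leave the link untouched if the address has no "@" part.
    next Regexp.last_match(0) if domain.nil?
    "#{user} at #{domain.gsub('.', ' dot ')}"
  end
end

puts spell_out_email_links('<p><a href="mailto:some@email.xx">Some Email</a></p>')
# prints: <p>some at email dot xx</p>
```

The address some@email.xx is only an assumption taken from the obfuscate(...) call in the question.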

josh · October 8, 2009, 7:30am

Joshua M. wrote:

Hi all

I have a long, long string of HTML tags. There might be some unprotected
Email links in there like this:

<a href="mailto:[email protected]">Some Email</a>

or

<a href="mailto:[email protected]?subject=Something">Some Email</a>

or…

I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

or something like that. Sadly I have no idea how to find the needed
string parts.

1) regexes
2) gsub()
3) split()

html = <<ENDOFHTML
html page <a href="mailto:[email protected]">Some Email</a>

hello

world

goodbye

<a href="mailto:[email protected]?subject=Something&cost=10">Some Email</a>
ENDOFHTML

# Capture the href value in $1 and the link text in $2.
new_html = html.gsub(/<a href="(.+?)">(.+?)<\/a>/) do |match|
  p match
  addy = $1
  link = $2
  p addy, link

  # Anything after "?" is a query string of name=value pairs.
  pieces = addy.split("?")
  if pieces.length == 2
    puts "there is a query string to parse"
    name_vals = pieces[1].split("&")
    p name_vals
  end

  puts

  # The block's return value replaces the matched link.
  "the replacement string cobbled together from the pieces above"
end

puts new_html

--output:--
"<a href=\"mailto:[email protected]\">Some Email</a>"
"mailto:[email protected]"
"Some Email"

"<a href=\"mailto:[email protected]?subject=Something&cost=10\">Some Email</a>"
"mailto:[email protected]?subject=Something&cost=10"
"Some Email"
there is a query string to parse
["subject=Something", "cost=10"]

html page the replacement string cobbled together from the pieces above

hello

world

goodbye

the replacement string cobbled together from the pieces above
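
To go one step further, the replacement string itself could be built from those pieces. The following is only a sketch under a few assumptions: obfuscate_call is a hypothetical helper name, obfuscate() is the JavaScript function from the question and is assumed to be defined on the page, its four arguments are assumed to be user, domain, subject and link text, and the address some@email.xx is inferred from that call. The call is wrapped in a script element because a bare function call in the middle of HTML would just show up as text.

```ruby
require "json"
require "uri"

# Hypothetical helper: turns a captured href and link text into a script call.
def obfuscate_call(href, link_text)
  # Strip the scheme, then separate the address from an optional query string.
  address, query = href.sub(/\Amailto:/i, "").split("?", 2)
  user, domain = address.to_s.split("@", 2)
  # Parse name=value pairs; an absent query string yields an empty hash.
  params = URI.decode_www_form(query.to_s).to_h
  subject = params.fetch("subject", "")
  # to_json gives a double-quoted, escaped string that JavaScript can read.
  args = [user, domain, subject, link_text].map { |arg| arg.to_s.to_json }
  %(<script type="text/javascript">obfuscate(#{args.join(',')})</script>)
end

html = '<a href="mailto:some@email.xx?subject=Something&cost=10">Some Email</a>'
new_html = html.gsub(/<a href="(.+?)">(.+?)<\/a>/) { obfuscate_call($1, $2) }
puts new_html
# prints: <script type="text/javascript">obfuscate("some","email.xx","Something","Some Email")</script>
```

The pattern still assumes that href is the only attribute and that it is double-quoted, so real-world markup may need a looser regex or a proper parser.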

josh · October 8, 2009, 8:07am

Joshua M. wrote:

I need to detect these email links and substitute them with something
different, like a JavaScript function to obfuscate them:

obfuscate("some","email.xx","Something","Some Email")

By the way, you can’t substitute js functions for tags.

josh · October 11, 2009, 12:16pm

Daniel Danopia wrote:

On Oct 7, 5:19 pm, Greg D. [email protected] wrote:

or…
Here’s how I do it in PHP, if you wanna rework it into Ruby:

$strTag .= "document.write('$arrEmail[1]">');\r";
Greg D. http://destiney.com/
You could also use a library such as hpricot or nokogiri to search and
replace all the tags.

And you should have \r\n, not \r, if you are writing HTML.

Thank you guys. Nokogiri looks really useful, I will take a look at it.

josh · October 8, 2009, 1:01pm

On Oct 7, 5:19 pm, Greg D. [email protected] wrote:

or…
Here’s how I do it in PHP, if you wanna rework it into Ruby:

$strTag .= "document.write('$arrEmail[1]">');\r";
Greg D. http://destiney.com/
You could also use a library such as hpricot or nokogiri to search and
replace all the tags.

And you should have \r\n, not \r, if you are writing HTML.
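
As a rough idea of what the parser route might look like with Nokogiri, here is a small sketch. It assumes the nokogiri gem is installed, replace_mailto_links is a hypothetical helper name, and the "at"/"dot" replacement text is just a placeholder for whatever obfuscation is actually wanted. The address some@email.xx is taken from the obfuscate(...) call in the question.

```ruby
begin
  require "nokogiri" # third-party gem (gem install nokogiri), not part of Ruby itself
rescue LoadError
  warn "nokogiri is not installed; replace_mailto_links needs it"
end

# Hypothetical helper: swaps every <a href="mailto:..."> for plain text.
def replace_mailto_links(html)
  fragment = Nokogiri::HTML::DocumentFragment.parse(html)
  # CSS attribute selector: anchors whose href starts with "mailto:".
  fragment.css('a[href^="mailto:"]').each do |link|
    address = link["href"].sub(/\Amailto:/i, "").split("?").first.to_s
    spelled = address.sub("@", " at ").gsub(".", " dot ")
    text = "#{link.text} (#{spelled})"
    # Replace the whole element with a text node so nothing is parsed as markup.
    link.replace(Nokogiri::XML::Text.new(text, fragment.document))
  end
  fragment.to_html
end

if defined?(Nokogiri)
  puts replace_mailto_links('<p>Write to <a href="mailto:some@email.xx?subject=Something">Some Email</a></p>')
end
```

Compared with the regex versions, this does not care about attribute order or quoting style, since the parser handles those details.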