How do I parse a string to find a URL?

Is there a command in Ruby that will accept a string, and spit out a
URL that is contained in the string? I think I remember reading about
something that would do this, but I can't recall.

On 9/17/07, Jayson W. wrote:

> Is there a command in Ruby that will accept a string, and spit out a
> URL that is contained in the string? I think I remember reading about
> something that would do this, but I can't recall.

URI::extract

http://ruby-doc.org/core/classes/URI.html#M004839
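A minimal sketch of calling it, including the optional second argument that restricts matches to the listed schemes. It assumes a recent Ruby, where the module-level URI.extract is marked obsolete, so it calls extract on a URI::RFC2396_Parser instance instead; treating that parser as the modern stand-in for URI.extract is an assumption worth checking against your Ruby version.

```ruby
require "uri"

text = "behold: www.abc.com and http://www.xyz.com."

# Assumption: on current Rubies, URI::RFC2396_Parser#extract(str, schemes = nil)
# is the non-deprecated equivalent of the old URI.extract.
parser = URI::RFC2396_Parser.new

# Without a scheme list, anything shaped like "scheme:..." is returned.
all_matches = parser.extract(text)

# Restricting to web schemes drops candidates such as "behold:".
web_urls = parser.extract(text, %w[http https])

p all_matches
p web_urls
```

Note that the trailing period is still part of the http match.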

Outstanding!
Thanks

Jano S. wrote:

> URI::extract
>
> http://ruby-doc.org/core/classes/URI.html#M004839

Wow, I didn’t know about that, very nice. But it has a few weaknesses:

URI.extract("behold: www.abc.com and http://www.xyz.com.")
=> ["behold:", "http://www.xyz.com."]
(notice the period at the end of xyz.com)

Daniel
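One possible workaround, not proposed in the thread itself, is to restrict the schemes and then strip trailing sentence punctuation from each match. The punctuation set and the helper name extract_web_urls are assumptions, and URI::RFC2396_Parser#extract is assumed as the modern stand-in for URI.extract. The trade-off is that a period that genuinely belongs to the URL gets removed too.

```ruby
require "uri"

# Characters that usually end a sentence rather than a URL.
# This set is an assumption; tune it for your input.
TRAILING_PUNCTUATION = /[.,;:!?]+\z/

# Hypothetical helper: http/https matches with trailing punctuation removed.
def extract_web_urls(text)
  URI::RFC2396_Parser.new
    .extract(text, %w[http https])
    .map { |url| url.sub(TRAILING_PUNCTUATION, "") }
end

p extract_web_urls("behold: www.abc.com and http://www.xyz.com.")
```

The bare www.abc.com is still not found, because it has no scheme.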

On Sep 17, 7:46 pm, Daniel DeLorme wrote:

> Wow, I didn’t know about that, very nice. But it has a few weaknesses:
>
> URI.extract("behold: www.abc.com and http://www.xyz.com.")
> => ["behold:", "http://www.xyz.com."]
> (notice the period at the end of xyz.com)
>
> Daniel

Not a weakness: in that string, 'behold:' is a valid URI; it has a
scheme with a scheme delimiter (":"). "www.abc.com" is not an
unambiguous URI, since no scheme is present.

franco wrote:

> Not a weakness: in that string, 'behold:' is a valid URI; it has a
> scheme with a scheme delimiter (":"). "www.abc.com" is not an
> unambiguous URI, since no scheme is present.

Is it a valid URI if nothing is present after the scheme? Anyway, I know
that the results are technically valid, but they are less than useful if
you want, say, to extract and "linkify" URLs that users might have
written inside a message (which is what I assumed the OP wanted, but I
might have been mistaken).

Daniel
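For the linkify use case, a sketch that does not rely on URI.extract at all: it swaps in a deliberately simple hand-written pattern (an assumption, far looser than any RFC) that also accepts bare www. hosts, trims trailing punctuation, and HTML-escapes everything else in the message. The names URL_PATTERN, TRAILING and linkify are hypothetical.

```ruby
require "erb"

# Hypothetical, deliberately simple pattern: an explicit http(s):// URL or a
# bare "www." host, running until whitespace, a double quote, or an angle bracket.
URL_PATTERN = %r{\b(?:https?://|www\.)[^\s<>"]+}i

# Punctuation that most likely ends the sentence rather than the URL.
TRAILING = /[.,;:!?)]+\z/

def linkify(message)
  parts = []
  pos = 0
  while (m = URL_PATTERN.match(message, pos))
    url = m[0].sub(TRAILING, "")
    # Escape the plain text between the previous link and this one.
    parts << ERB::Util.html_escape(message[pos...m.begin(0)])
    # Bare "www." hosts get an http:// prefix in the href only.
    href = url.match?(%r{\Ahttps?://}i) ? url : "http://#{url}"
    parts << %(<a href="#{ERB::Util.html_escape(href)}">#{ERB::Util.html_escape(url)}</a>)
    # Resume right after the kept part, so trimmed punctuation stays as text.
    pos = m.begin(0) + url.length
  end
  parts << ERB::Util.html_escape(message[pos..-1])
  parts.join
end

puts linkify("behold: www.abc.com and http://www.xyz.com.")
```

Escaping the surrounding text matters here, since the message comes from users.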

On Sep 17, 7:46 pm, Daniel DeLorme wrote:

> URI.extract("behold: www.abc.com and http://www.xyz.com.")
> => ["behold:", "http://www.xyz.com."]
> (notice the period at the end of xyz.com)

Also, the period is legal.
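A quick way to check this: Ruby's own URI.parse accepts a host with a trailing period, which in DNS marks a fully qualified domain name (RFC 3986, section 3.2.2, allows it). So the period can legitimately be read as part of the URL; whether it belongs to the URL or to the sentence is something only the application can decide.

```ruby
require "uri"

# A trailing dot after the last label is a legal (fully qualified) host name.
uri = URI.parse("http://www.xyz.com.")

p uri.host  # the trailing dot is kept as part of the host
```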

On Sep 17, 10:06 pm, Daniel DeLorme wrote:

> Daniel

You could just select the ones with a scheme-specific part, or, if the
input is HTML, screen-scrape for //a/@href to get the target of every
anchor (link).
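The first suggestion can be sketched with the standard library alone: extract every candidate, then keep only those with a non-empty part after the first colon. The helper name is hypothetical, and URI::RFC2396_Parser#extract is assumed as the modern stand-in for URI.extract.

```ruby
require "uri"

# Hypothetical helper: keep only candidates with a non-empty
# scheme-specific part, i.e. something after "scheme:".
def extract_with_scheme_specific_part(text)
  URI::RFC2396_Parser.new.extract(text).select do |candidate|
    _scheme, rest = candidate.split(":", 2)
    !rest.to_s.empty?
  end
end

p extract_with_scheme_specific_part("behold: www.abc.com and http://www.xyz.com.")
```

This drops "behold:" but, on its own, still keeps the trailing period on the http match.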