Forum: Ruby Regular expression to parse out "host" part of URL string

Wes G. (Guest)
on 2006-04-17 19:39
All,

I am trying to write a regex to parse out the "host" part of a potential
URL.

So if presented with

www.cnn.com

I want to get

www.cnn.com

If presented with

www.cnn.com/

I want to get

www.cnn.com

If presented with

www.cnn.com/some/other/stuff

I want to get

www.cnn.com

This regex:  /(.*)?\/*/  matches everything
This regex:  /(.*)?\// matches everything including the "/"
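A minimal check of the two patterns above. It uses String#[] with a Regexp, which returns the whole matched text (or nil), on the longest sample string from this post:

```ruby
url = "www.cnn.com/some/other/stuff"

# (.*) is greedy and \/* may match zero slashes, so nothing forces the
# match to stop early: the whole string is matched.
whole = url[/(.*)?\/*/]
p whole      # prints "www.cnn.com/some/other/stuff"

# A literal slash is required after the group, so (.*) backtracks only as
# far as the LAST slash, and that slash is part of the match.
to_last = url[/(.*)?\//]
p to_last    # prints "www.cnn.com/some/other/"
```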

How can I just get the part of the string before the slash?

Thanks,
Wes
unknown (Guest)
on 2006-04-17 19:42
(Received via mailing list)
Hi --

On Tue, 18 Apr 2006, Wes G. wrote:

>
> If presented with
> How can I just get the part of the string before the slash?
How about:

   /[^\/]+/   # match one or more non-slash characters
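Applied to the three inputs from the original question, as a small sketch (String#[] with a Regexp returns the first match, or nil):

```ruby
inputs = ["www.cnn.com", "www.cnn.com/", "www.cnn.com/some/other/stuff"]

# [^\/]+ grabs the first run of non-slash characters, i.e. everything
# before the first slash (or the whole string if there is no slash).
hosts = inputs.map { |s| s[/[^\/]+/] }
p hosts   # prints ["www.cnn.com", "www.cnn.com", "www.cnn.com"]
```

The pattern is unanchored, so a leading slash is simply skipped ("/foo" gives "foo"), and an input that starts with a scheme, such as "http://www.cnn.com/", would give "http:". It suits bare, host-first strings like the ones above.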


David

--
David A. Black (removed_email_address@domain.invalid)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" PDF now on sale!  http://www.manning.com/black
Paper version coming in early May!
Wes G. (Guest)
on 2006-04-17 19:50
That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the "/" AND the "/" though?

Tom M. (Guest)
on 2006-04-17 20:41
(Received via mailing list)
Wes G. wrote:
> That works well.
>
> Can you explain to me why
>
> /(.*)?\//
>
> matched the text in front of the "/" AND the "/"  ?

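The entire regexp matches everything up to the slash *and* the slash
itself.  (Since .* is greedy, if the string contains several slashes the
match runs to the *last* one, not the first.)  Recall, *this* is the
regexp, and it matches a trailing slash: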

     (.*)?\/

The first parenthesized group of the regex, however, matches only up to,
but not including, that slash.  So, if the entire regex matches, you want
the portion of the match that corresponds to the first parenthesized
group, which will be stored in $1:

     ("example.com/" =~ /(.*)?\//) && $1
     # => "example.com"
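One caveat worth checking, as a sketch using the longer sample string from the original question: with more than one slash, the greedy group keeps part of the path, while a lazy (.*?) stops at the first slash.

```ruby
url = "www.cnn.com/some/other/stuff"

# Greedy (.*): backtracks only to the LAST slash, so $1 keeps the path.
greedy = (url =~ /(.*)?\//) && $1
p greedy   # prints "www.cnn.com/some/other"

# Lazy (.*?): stops at the FIRST slash, which is the host part here.
lazy = (url =~ /(.*?)\//) && $1
p lazy     # prints "www.cnn.com"

# Both patterns require a slash, so a bare "www.cnn.com" gives nil.
bare = ("www.cnn.com" =~ /(.*?)\//) && $1
p bare     # prints nil
```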

Or, you can capture the regex-matching operation's result as a MatchData
object and query it to retrieve the desired portion:

     if matchdata = /(.*)?\//.match("example.com/")
       matchdata[1]
     end
     # => "example.com"

Or, you can combine a lazy .*? with the zero-width positive lookahead
regexp extension, (?=\/), which requires a slash to follow without
including it in the match.  Then you can use the entire match as your
result:

     if matchdata = /.*?(?=\/)/.match("example.com/")
       matchdata.to_s
     end
     # => "example.com"
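The lookahead pattern checked against the three inputs from the question, as a sketch. The second pattern, which also accepts end of string via \z, is a hypothetical variation added here for comparison, not something proposed in the thread.

```ruby
inputs = ["www.cnn.com", "www.cnn.com/", "www.cnn.com/some/other/stuff"]

results = inputs.map do |s|
  # Lazy .*? followed by (?=\/): stops just before the first slash.
  # The lookahead demands a slash, so input without one yields nil.
  slash_only = s[/.*?(?=\/)/]

  # Hypothetical variation: also stop at end of string (\z), so a bare
  # host with no slash is returned whole.
  slash_or_end = s[/\A.*?(?=\/|\z)/]

  [slash_only, slash_or_end]
end

results.each { |pair| p pair }
# prints [nil, "www.cnn.com"]
# prints ["www.cnn.com", "www.cnn.com"]
# prints ["www.cnn.com", "www.cnn.com"]
```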

Cheers,
Tom
unknown (Guest)
on 2006-04-23 01:00
(Received via mailing list)
Hi --

On Tue, 18 Apr 2006, Wes G. wrote:

>>    /[^\/]+/   # match one or more non-slash characters
>
> ?

Because you told it to :-)  You've got a \/ (a literal slash) in the pattern, outside the parentheses.

(Keep in mind that the *whole* pattern matches, not just the part in
parentheses.  The parentheses are just for capturing submatches.)
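The same point seen through MatchData, as a small sketch reusing the example string from earlier in the thread: index 0 is the text matched by the whole pattern, index 1 is the first parenthesized group.

```ruby
m = /(.*)?\//.match("example.com/")

p m[0]   # prints "example.com/"  (whole pattern, slash included)
p m[1]   # prints "example.com"   (just the parenthesized part)
```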


David
