Regular expression to parse out "host" part of URL string


#1

All,

I am trying to write a regex to parse out the “host” part of a potential
URL.

So if presented with

I want to get

If presented with

I want to get

If presented with

www.cnn.com/some/other/stuff

I want to get

www.cnn.com

This regex: /(.)?\// matches everything
This regex: /(.*)?\// matches everything including the “/”

How can I just get the part of the string before the slash?

Thanks,
Wes


#2

Hi –

On Tue, 18 Apr 2006, Wes G. wrote:

If presented with
How can I just get the part of the string before the slash?

How about:

/[^\/]+/ # match one or more non-slash characters
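A minimal sketch of that pattern in use, with the slash escaped inside the character class as a Ruby regex literal requires. The helper name host_part is made up for illustration, and the sketch assumes the input carries no scheme prefix such as http:// (with one, the first run of non-slash characters would be "http:").

```ruby
# Hypothetical helper, not from the thread: returns the first run of one or
# more non-slash characters, or nil if the string contains none.
def host_part(str)
  str[/[^\/]+/]
end

puts host_part("www.cnn.com/some/other/stuff")  # prints www.cnn.com
puts host_part("example.com/")                  # prints example.com
puts host_part("example.com")                   # prints example.com
```

String#[] with a regexp returns the matched text directly, so no capture group is needed here.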

David


David A. Black (removed_email_address@domain.invalid)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

“Ruby for Rails” PDF now on sale! http://www.manning.com/black
Paper version coming in early May!


#3

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the “/” AND the “/”, though?


#4

Wes G. wrote:

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the “/” AND the “/” ?

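The entire regexp matches everything up to a slash, plus the slash
itself. (Since .* is greedy, that is the last slash in the string; with
only one slash, as in example.com/, it makes no difference.) Recall, this
is the regexp, and it matches a trailing slash: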

 (.*)?\/

The first parenthesized group of the regex, however, matches only up to,
but not including, the trailing slash. So, if the entire regex matches,
you want the portion of the match that corresponds to the first
parenthesized group, which will be stored in $1:

irb(main):009:0* ("example.com/" =~ /(.*)?\//) && $1
=> "example.com"

Or, you can capture the regex-matching operation’s result as a MatchData
object and query it to retrieve the desired portion:

irb(main):010:0> if matchdata = /(.*)?\//.match("example.com/")
irb(main):011:1> matchdata[1]
irb(main):012:1> end
=> "example.com"
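One thing worth checking with a longer path: .* is greedy, so when the string has several slashes the group runs up to the last one. A small sketch using the www.cnn.com string from the first post; the lazy variant (.*?) is an addition for comparison, not part of the original pattern.

```ruby
url = "www.cnn.com/some/other/stuff"

# Greedy: .* grabs as much as it can, so the group stops before the LAST slash.
greedy = /(.*)?\//.match(url)
puts greedy[1]   # prints www.cnn.com/some/other

# Lazy: .*? grabs as little as it can, so the group stops before the FIRST slash.
lazy = /(.*?)\//.match(url)
puts lazy[1]     # prints www.cnn.com
```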

Or, you can use the zero-width positive lookahead regexp extension to
make sure the entire regexp matches only what you want. Then you can
use the entire match as your result:

irb(main):024:0* if matchdata = /.*?(?=\/)/.match("example.com/")
irb(main):025:1> matchdata.to_s
irb(main):026:1> end
=> "example.com"
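As a rough sketch of how the lookahead version behaves on other inputs (the sample strings are illustrative): the slash is checked but not consumed, so the whole match is the host. It does need a slash to be present somewhere in the string.

```ruby
pattern = /.*?(?=\/)/

# The lookahead (?=\/) requires a slash next, but leaves it out of the match.
puts pattern.match("www.cnn.com/some/other/stuff")[0]  # prints www.cnn.com

# With no slash anywhere, the lookahead can never succeed and match returns nil.
p pattern.match("example.com")  # prints nil
```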

Cheers,
Tom


#5

Hi –

On Tue, 18 Apr 2006, Wes G. wrote:

/[^\/]+/ # match one or more non-slash characters

?

Because you told it to :-) You’ve got a / in the pattern.

(Keep in mind that the whole pattern matches, not just the part in
parentheses. The parentheses are just for capturing submatches.)
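To make the distinction concrete, a small sketch (the variable name m is just for illustration): index 0 of the MatchData is the whole match, and index 1 is the first parenthesized group.

```ruby
m = /(.*)?\//.match("example.com/")

puts m[0]  # whole match, slash included: prints example.com/
puts m[1]  # first group only: prints example.com

# String#[] accepts a capture index too, which skips the MatchData step.
puts "example.com/"[/(.*)?\//, 1]  # prints example.com
```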

David

