Getting a valid URL from a command line

Hunt_J · September 28, 2009, 5:48am

Hi - I’m working on the script below, which reads user input and
tries to validate that the input is formed like a URL.
If the input is not valid, it should ask again.

require 'uri'
puts "Type a URL"
begin
  url = gets.chomp
  URI.parse(url) # should raise if url is malformed
rescue URI::InvalidURIError
  puts "That is not a valid URL. Try again."
  retry
end

I expected URI.parse to raise an error when the input is malformed,
but that doesn’t happen.

Can anybody help me on this one?

Jon

Hunt_J · September 28, 2009, 6:11am

require 'uri'
print "Type a URL: "
begin
  url = gets.chomp
  puts "You said: #{url.inspect}"
  uri = URI.parse(url) # should raise if url is malformed
  puts uri.inspect
rescue URI::InvalidURIError
  puts "That is not a valid URL. Try again."
  retry
end

Try getting a little more information out (and post the input you
are trying that you expect to be malformed).

Note that some URIs are HTTP and some might be Generic. There are a
lot more types of URI than just those that start with http://. Have
you ever seen a jdbc resource string?
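
To make that concrete, here is a small sketch of how URI.parse might
classify a few inputs. The sample strings (including the JDBC-style
one) and the describe_uri helper are made up for illustration; the
exact class returned for less common schemes may depend on your Ruby
version.

```ruby
require 'uri'

# Hypothetical helper: returns the class and scheme URI.parse assigns.
def describe_uri(text)
  uri = URI.parse(text)
  [uri.class, uri.scheme]
end

# Illustrative inputs only; none of these come from the original script.
samples = [
  'http://example.com/',
  'https://example.com/',
  'ftp://example.com/file.txt',
  'mailto:someone@example.com',
  'jdbc:mysql://localhost/test', # made-up JDBC-style resource string
  'example.com'                  # no scheme at all
]

samples.each do |text|
  klass, scheme = describe_uri(text)
  puts "#{text.inspect} -> #{klass} (scheme: #{scheme.inspect})"
end
```

None of these inputs should raise, which is why the rescue branch in
the script never fires for something like example.com.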

-Rob

Rob B. http://agileconsultingllc.com

Hunt_J · September 28, 2009, 6:46am

I expect the user to input an HTTP or HTTPS URL, e.g., http://abcdef.gov
After some research, URI seems too generic for this, since ‘uri’
covers many different protocols, not just http/https.

I’ll look into it. Perhaps a Regexp match would be better.

Jon

Hunt_J · September 28, 2009, 7:02am

You can see what the scheme is determined to be:

irb> require 'uri'
=> true
irb> u = URI.parse('http://example.com/')
=> #<URI::HTTP:0x395b34 URL:http://example.com/>
irb> u.scheme
=> "http"
irb> x = URI.parse('example.com')
=> #<URI::Generic:0x392f24 URL:example.com>
irb> x.scheme
=> nil

You probably don’t want to jump down the Regexp rabbit-hole if you
know that you want a valid URI. Let the library do the heavy lifting.
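
One way this could look in practice is sketched below. It is only a
rough sketch, assuming that "valid" here means "parses as http or
https and has a host". The method names parse_http_url and
prompt_for_url are invented for this example; they are not part of
the uri library.

```ruby
require 'uri'

# Hypothetical helper: returns a URI::HTTP (or URI::HTTPS) object when
# the input parses as an http/https URL with a host, otherwise nil.
def parse_http_url(input)
  uri = URI.parse(input)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  return nil unless uri.is_a?(URI::HTTP)
  return nil if uri.host.nil? || uri.host.empty?
  uri
rescue URI::InvalidURIError
  # Raised for input the parser rejects outright, e.g. text with spaces.
  nil
end

# Hypothetical prompt loop: keeps asking until parse_http_url accepts
# the input. Returns nil if the input stream ends first.
def prompt_for_url(input: $stdin, output: $stdout)
  loop do
    output.print 'Type a URL: '
    line = input.gets
    return nil if line.nil?
    uri = parse_http_url(line.chomp)
    return uri if uri
    output.puts 'That is not a valid URL. Try again.'
  end
end
```

Calling prompt_for_url with no arguments reads from the terminal.
Checking the class and host after parsing, rather than relying only
on InvalidURIError, is what catches inputs like example.com that
parse fine as a Generic URI.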

-Rob

Rob B. http://agileconsultingllc.com

Hunt_J · September 28, 2009, 3:40pm

On Sunday 27 September 2009 11:45:40 pm Hunt J. wrote:

I expect a user to input a HTTP or HTTPS URL. e.g., http://abcdef.gov
Maybe using URI seems too generic after the research as ‘uri’ means
different protocols, not just http/https.

Well, a URI isn’t even required to work. Just a clarification:

A URL is meant to actually refer to a resource. For example,

http://ruby-lang.org/

actually refers to a working website, and is thus a URL; thus, the
protocol must be something that actually exists, and as a practical
matter, you’ll want it to be something you (or your browser) know how
to handle.

A URI only needs to be globally unique. For example:

http://www.w3.org/1999/xhtml

It doesn’t matter AT ALL whether this points to a working resource. The
Web will continue to work, even if w3.org completely implodes. As a
matter of courtesy, the W3C has actually made this a valid URL, which
points to a description of what that namespace is and the
specifications that use it. But when your browser sees that URI at the
top of a web page, it doesn’t actually talk to w3.org at all. It just
knows internally that this namespace is where HTML elements go in an
XHTML document.

On a completely unrelated note, if you know how XML namespaces work,
technically, the following would probably work on browsers that
understand XHTML:

<foobar:html xmlns:foobar='http://www.w3.org/1999/xhtml'>
<foobar:head>
…
</foobar:head>
<foobar:body>
…
</foobar:body>
</foobar:html>

I suspect that the spec explicitly disallows this, at least in the
“transitional” mode, because it’s not backwards compatible with HTML
4.0. But the point is, internally, the browser is looking for an html
element associated with that URI, which is why it’s not a valid xhtml
document if you don’t include that xmlns in some form.