Finding filename from a URL

samf · January 4, 2009, 5:30pm

Hi all,

This is just a basic parsing question, really. I’m trying to work out
how I would process a URL such as
“http://www.example.com/x/y/z/myfile.txt” and get back the filename
“myfile”. Basically the pattern is to get the last part of the string
after the final /, and then strip off the filetype.

Any help would be much appreciated,
Thanks!

Sam

samf · January 4, 2009, 5:32pm

Sam Fent wrote:

Sam
File.basename("http://www.example.com/x/y/z/myfile.txt")
works perfectly for urls

samf · January 4, 2009, 5:38pm

Sam Fent wrote:

Sam

$ irb
irb(main):001:0> x = "http://www.example.com/x/y/z/myfile.txt"
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> File.basename(x)
=> "myfile.txt"
irb(main):003:0> File.basename(x, '.txt')
=> "myfile"
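
When the file type is not known in advance, File.basename also accepts ".*" as the suffix, which strips whatever extension is present. A minimal sketch; the helper name name_without_extension and the second URL are made up for illustration:

```ruby
# Hypothetical helper: strip the directory part and any extension.
def name_without_extension(url)
  # ".*" as the suffix removes whatever extension is present, not just ".txt".
  File.basename(url, ".*")
end

puts name_without_extension("http://www.example.com/x/y/z/myfile.txt")  # prints "myfile"
puts name_without_extension("http://www.example.com/x/archive.tar.gz")  # prints "archive.tar"
```

Only the last extension is removed, so a name with two extensions keeps the first one.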

samf · January 4, 2009, 8:00pm

Jan-Erik R. wrote:

Sam Fent wrote:

Sam
File.basename("http://www.example.com/x/y/z/myfile.txt")
works perfectly for urls

Thanks a lot! I added “.txt” to the arguments of File.basename to get
rid of the filetype, but besides that, that was what I was looking for.

Thanks!

samf · January 4, 2009, 9:48pm

On Jan 4, 2009, at 1:04 PM, Robert K. wrote:

irb(main):003:0> File.basename 'http://test.com/aaa\\bbb.txt'
Kind regards

robert

Rather than jump to a Regexp, just use the right tool for the job.

irb> require 'uri'
=> true
irb> u=URI.parse 'http://www.example.com/x/y/z/myfile.txt'
=> #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfile.txt>
irb> u.path
=> "/x/y/z/myfile.txt"
irb> File.basename u.path, '.txt'
=> "myfile"

-Rob

Rob B. http://agileconsultingllc.com

samf · January 4, 2009, 10:45pm

On 04.01.2009 21:46, Rob B. wrote:

Rather than jump to a Regexp, just use the right tool for the job.

irb> require 'uri'
=> true
irb> u=URI.parse 'http://www.example.com/x/y/z/myfile.txt'
=> #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfile.txt>
irb> u.path
=> "/x/y/z/myfile.txt"
irb> File.basename u.path, '.txt'
=> "myfile"

I considered URI as well but what makes your code the “right tool for
the job”? Basically you use URI only to extract the path and then use
File.basename to get the last bit of the path. But while the URI path
consists of elements separated by “/”, File.basename also treats “\”
as a delimiter (at least on Windows). So IMHO it is by no means “the
right tool”, at least not more than a regular expression which extracts
exactly the part needed from the string at hand (and is likely faster as well).
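
Whether File.basename splits on a backslash depends on the platform: Ruby exposes the alternate separator as File::ALT_SEPARATOR, which is "\\" on Windows and nil on most other systems. A small sketch for checking this locally; the helper name last_segment is made up for illustration:

```ruby
# Hypothetical helper: take everything after the final "/" and nothing else.
def last_segment(url)
  url[%r{[^/]+\z}]
end

url = 'http://test.com/aaa\\bbb.txt'  # single quotes: one literal backslash

puts File::ALT_SEPARATOR.inspect  # "\\" on Windows, nil on most other platforms
puts File.basename(url)           # result depends on the platform
puts last_segment(url)            # splits on "/" only, on every platform
```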

The situation would be different if URI provided a method which returns
the last path element but as far as I can see this does not exist.

Kind regards

robert

samf · January 4, 2009, 7:05pm

On 04.01.2009 17:29, Sam Fent wrote:

This is just a basic parsing question, really. I’m trying to work out
how I would process a URL such as
“http://www.example.com/x/y/z/myfile.txt” and get back the filename
“myfile”. Basically the pattern is to get the last part of the string
after the final /, and then strip off the filetype.

IMHO it is not a good idea to use a File method for URLs because
File.basename has different criteria:

irb(main):003:0> File.basename 'http://test.com/aaa\\bbb.txt'
=> "bbb.txt"

Although I am not sure whether a backslash is allowed there, this is
what I’d do:

irb(main):001:0> url = 'http://www.example.com/x/y/z/myfile.txt'
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> name = url[%r{[^/]+\z}]
=> "myfile.txt"
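
The pattern [^/]+\z matches the run of non-slash characters that ends at the end of the string. To also drop the file type, as the original question asks, a second step can remove the final dot and what follows it. A rough sketch; the method name filename_stem is made up for illustration:

```ruby
# Hypothetical helper: last path element of a URL string, minus its extension.
def filename_stem(url)
  name = url[%r{[^/]+\z}]  # nil when the string ends in "/"
  return nil if name.nil?
  # Remove the last ".something" at the end of the name, if there is one.
  name.sub(/\.[^.]+\z/, "")
end

puts filename_stem("http://www.example.com/x/y/z/myfile.txt")  # prints "myfile"
puts filename_stem("http://www.example.com/x/y/z/").inspect     # prints nil
```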

Kind regards

robert

samf · January 4, 2009, 10:59pm

On Jan 4, 2009, at 4:44 PM, Robert K. wrote:

IMHO it is not a good idea to use a File method for URL’s because
irb(main):002:0> name = url[%r{[^/]+\z}]

returns the last path element but as far as I can see this does not
exist.

Kind regards

robert

I guess it depends on what your url might look like. For example, if
it contains a query string:

irb> str = 'http://a.b.c/root/sub/dir/file?param=a'
=> "http://a.b.c/root/sub/dir/file?param=a"
irb> File.basename str
=> "file?param=a"

Oops! File.basename just doesn’t fit.

irb> require 'uri'
=> true
irb> url = URI.parse(str)
=> #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?param=a>
irb> url.path
=> "/root/sub/dir/file"
irb> File.basename url.path
=> "file"

The OP will have to make the final tool selection, but there may be
lurkers that have similar problems who find URI a better fit than File.
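
The same applies to fragments: anything after "?" or "#" ends up in the result when the raw string is handed to File.basename, while URI.parse keeps those parts separate from the path. A short sketch with a made-up URL; the helper name url_parts is hypothetical:

```ruby
require "uri"

# Hypothetical helper: split a URL into the parts relevant here.
def url_parts(str)
  uri = URI.parse(str)
  { path: uri.path, query: uri.query, fragment: uri.fragment }
end

parts = url_parts("http://a.b.c/root/sub/dir/file.txt?param=a#top")
puts parts[:path]      # prints "/root/sub/dir/file.txt"
puts parts[:query]     # prints "param=a"
puts parts[:fragment]  # prints "top"
```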

-Rob

Rob B. http://agileconsultingllc.com

samf · January 5, 2009, 6:52pm

On 04.01.2009 22:58, Rob B. wrote:

=> #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?param=a>
irb> url.path
=> "/root/sub/dir/file"
irb> File.basename url.path
=> "file"

The OP will have to make the final tool selection, but there may be
lurkers that have similar problems who find URI a better fit than File.

Certainly. I do have to say that I get the impression we are talking a
bit past each other. I wasn’t advocating the use of File.basename at all,
neither alone nor in combination with URI!

For the URL with query part I would still rather do

name = URI.parse(str).path[%r{[^/]+\z}]
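
Wrapped in a method, this might look as follows. A sketch only: the name url_filename and the decision to return nil for an unparsable URL or a path ending in "/" are assumptions, not something settled in this thread.

```ruby
require "uri"

# Hypothetical helper: last element of the URL's path, ignoring query and fragment.
def url_filename(str)
  path = URI.parse(str).path
  return nil if path.nil?
  path[%r{[^/]+\z}]  # nil when the path is empty or ends in "/"
rescue URI::InvalidURIError
  nil
end

puts url_filename("http://a.b.c/root/sub/dir/file?param=a")   # prints "file"
puts url_filename("http://www.example.com/x/y/z/myfile.txt")  # prints "myfile.txt"
puts url_filename("http://a.b.c/root/sub/dir/").inspect       # prints nil
```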

Kind regards

robert