Forum: Ruby Finding filename from a URL

Sam F. (Guest)
on 2009-01-04 18:30
Hi all,

This is just a basic parsing question, really. I'm trying to work out
how I would process a URL such as
"http://www.example.com/x/y/z/myfile.txt" and get back the filename
"myfile". Basically the pattern is to get the last part of the string
after the final /, and then strip off the filetype.

Any help would be much appreciated,
Thanks!

Sam
Jan-Erik R. (Guest)
on 2009-01-04 18:32
(Received via mailing list)
Sam Fent wrote:
>
> Sam
File.basename("http://www.example.com/x/y/z/myfile.txt")
works perfectly for urls ;)
Tim H. (Guest)
on 2009-01-04 18:38
(Received via mailing list)
Sam Fent wrote:
>
> Sam


$ irb
irb(main):001:0> x = "http://www.example.com/x/y/z/myfile.txt"
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> File.basename(x)
=> "myfile.txt"
irb(main):003:0> File.basename(x, '.txt')
=> "myfile"
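
If the extension is not known in advance, File.basename also accepts ".*" as the suffix, which drops whatever extension is present. A minimal sketch, assuming the URL has no query string or fragment:

```ruby
url = "http://www.example.com/x/y/z/myfile.txt"

# ".*" makes File.basename strip any trailing extension, not only ".txt".
name = File.basename(url, ".*")

# File.extname returns the extension that was stripped, including the dot.
ext = File.extname(url)

puts name  # myfile
puts ext   # .txt
```

Only the last extension is removed, so "archive.tar.gz" would come back as "archive.tar".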
Robert K. (Guest)
on 2009-01-04 20:05
(Received via mailing list)
On 04.01.2009 17:29, Sam Fent wrote:
> This is just a basic parsing question, really. I'm trying to work out
> how I would process a URL such as
> "http://www.example.com/x/y/z/myfile.txt" and get back the filename
> "myfile". Basically the pattern is to get the last part of the string
> after the final /, and then strip off the filetype.

IMHO it is not a good idea to use a File method for URLs because
File.basename has different criteria:

irb(main):003:0> File.basename 'http://test.com/aaa\\bbb.txt'
=> "bbb.txt"

Although I am not sure whether a backslash is allowed there, this is
what I'd do:

irb(main):001:0> url = 'http://www.example.com/x/y/z/myfile.txt'
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> name = url[%r{[^/]+\z}]
=> "myfile.txt"
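
To also drop the filetype without going through File, a second pattern can strip the last extension. This is only a sketch; the method names last_segment and strip_extension are made up for illustration, and it assumes the string has no query part:

```ruby
# Last run of non-slash characters at the end of the string.
# Returns nil when the string ends in "/".
def last_segment(url)
  url[%r{[^/]+\z}]
end

# Remove a trailing ".ext". The lookbehind requires at least one
# character before the dot, so a name like ".bashrc" is left alone.
def strip_extension(name)
  name.sub(/(?<=.)\.[^.]+\z/, '')
end

url = 'http://www.example.com/x/y/z/myfile.txt'
puts strip_extension(last_segment(url))                    # myfile
puts last_segment('http://test.com/aaa\\bbb.txt')          # aaa\bbb.txt
puts last_segment('http://www.example.com/x/y/z/').inspect # nil
```

Because only "/" is treated as a separator, the backslash stays part of the name on every platform.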

Kind regards

  robert
Sam F. (Guest)
on 2009-01-04 21:00
Jan-Erik R. wrote:
> Sam Fent wrote:
>>
>> Sam
> File.basename("http://www.example.com/x/y/z/myfile.txt")
> works perfectly for urls ;)

Thanks a lot! I added ".txt" to the arguments of File.basename to get
rid of the filetype, but besides that, that was what I was looking for.

Thanks!
Rob B. (Guest)
on 2009-01-04 22:48
(Received via mailing list)
On Jan 4, 2009, at 1:04 PM, Robert K. wrote:

> irb(main):003:0> File.basename 'http://test.com/aaa\\bbb.txt'
> Kind regards
>
>   robert
>


Rather than jump to a Regexp, just use the right tool for the job.

irb> require 'uri'
=> true
irb> u=URI.parse 'http://www.example.com/x/y/z/myfile.txt'
=> #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfile.txt>
irb> u.path
=> "/x/y/z/myfile.txt"
irb> File.basename u.path, '.txt'
=> "myfile"

-Rob

Rob B.    http://agileconsultingllc.com
Robert K. (Guest)
on 2009-01-04 23:45
(Received via mailing list)
On 04.01.2009 21:46, Rob B. wrote:
>>
>
> Rather than jump to a Regexp, just use the right tool for the job.
>
> irb> require 'uri'
> => true
> irb> u=URI.parse 'http://www.example.com/x/y/z/myfile.txt'
> => #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfile.txt>
> irb> u.path
> => "/x/y/z/myfile.txt"
> irb> File.basename u.path, '.txt'
> => "myfile"

I considered URI as well but what makes your code the "right tool for
the job"?  Basically you use URI only to extract the path and then use
File.basename to get the last bit of the path.  But: while the URI path
consists of elements separated by "/", File.basename also treats "\\" as
a delimiter (at least on Windows).  So IMHO it is by no means "the right tool" - at least not
more than using a regular expression which extracts exactly the part
needed from the string at hand (and is likely faster as well).

The situation would be different if URI provided a method which returns
the last path element but as far as I can see this does not exist.
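
Such a method is easy to sketch by hand. The helper below is hypothetical (last_path_element is not part of the URI library); it splits the parsed path on "/" only, so the result does not depend on File::ALT_SEPARATOR, which is "\\" on Windows and nil elsewhere:

```ruby
require 'uri'

# Hypothetical helper, not provided by URI itself.
# Returns the last non-empty path element, or nil if the path has none.
def last_path_element(url)
  URI.parse(url).path.split('/').last
end

puts last_path_element('http://www.example.com/x/y/z/myfile.txt')  # myfile.txt
puts last_path_element('http://www.example.com/x/y/z/').inspect    # "z"
puts last_path_element('http://www.example.com').inspect           # nil
```

Note that String#split drops trailing empty strings, so a URL ending in "/" yields the last directory name rather than an empty string.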

Kind regards

  robert
Rob B. (Guest)
on 2009-01-04 23:59
(Received via mailing list)
On Jan 4, 2009, at 4:44 PM, Robert K. wrote:

>>> IMHO it is not a good idea to use a File method for URL's because
>>> irb(main):002:0> name = url[%r{[^/]+\z}]
>
> returns the last path element but as far as I can see this does not
> exist.
>
> Kind regards
>
>   robert


I guess it depends on what your URL might look like. For example, if
it contains a query string:

irb> str = 'http://a.b.c/root/sub/dir/file?param=a'
=> "http://a.b.c/root/sub/dir/file?param=a"
irb> File.basename str
=> "file?param=a"

Oops!  File.basename just doesn't fit.

irb> require 'uri'
=> true
irb> url = URI.parse(str)
=> #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?param=a>
irb> url.path
=> "/root/sub/dir/file"
irb> File.basename url.path
=> "file"

The OP will have to make the final tool selection, but there may be
lurkers that have similar problems who find URI a better fit than File.
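
For those lurkers, the two steps can be wrapped up as follows. This is a sketch, not a definitive implementation; url_basename is a name invented here, and the default suffix ".*" strips whatever extension is present:

```ruby
require 'uri'

# Hypothetical helper: parse first so the query string and fragment
# are discarded, then take the basename of the path alone.
def url_basename(url, suffix = '.*')
  File.basename(URI.parse(url).path, suffix)
end

puts url_basename('http://a.b.c/root/sub/dir/file?param=a')           # file
puts url_basename('http://www.example.com/x/y/z/myfile.txt?v=2#top')  # myfile
```

URI.parse raises URI::InvalidURIError for strings it cannot parse, so callers handling untrusted input may want to rescue that.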

-Rob

Rob B.    http://agileconsultingllc.com
Robert K. (Guest)
on 2009-01-05 19:52
(Received via mailing list)
On 04.01.2009 22:58, Rob B. wrote:
> => #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?param=a>
> irb> url.path
> => "/root/sub/dir/file"
> irb> File.basename url.path
> => "file"
>
> The OP will have to make the final tool selection, but there may be
> lurkers that have similar problems who find URI a better fit than File.

Certainly.  I do have to say that I get the impression we are talking
past each other a bit.  I wasn't advocating using File.basename at all -
neither alone nor in combination with URI!

For the URL with query part I would still rather do

name = URI.parse(str).path[%r{[^/]+\z}]
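
Spelled out with a couple of guards, that could look like the sketch below. The method name last_segment_of and the choice to return nil for unparsable input are assumptions made for this example, and to_s is added in case the parsed URI has no path:

```ruby
require 'uri'

# Hypothetical wrapper around the one-liner above.
# Returns nil when the path ends in "/" or the string is not a valid URI.
def last_segment_of(str)
  URI.parse(str).path.to_s[%r{[^/]+\z}]
rescue URI::InvalidURIError
  nil
end

puts last_segment_of('http://a.b.c/root/sub/dir/file?param=a')       # file
puts last_segment_of('http://a.b.c/root/sub/dir/').inspect           # nil
puts last_segment_of('http://www.example.com/x/y/z/myfile.txt')      # myfile.txt
```

Unlike File.basename, the pattern treats only "/" as a separator.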

Kind regards

  robert