My level of ruby-experience has just slightly passed hello-world, I’m
quite new to this programming language; I hope I don’t bother all of you
too much by posting my low-level question here.
I would like to make a program that iterates recursively through my music
directory, and selects the top-10 of most recently played tracks.
I try to use File.atime(path) to check if the current file is accessed
recently enough to get into my top-10, but my problem is that I don’t
know how to (correctly) compare a date to the return value of
File.atime.
I’m sure you can give me some simple hints, or a small push in the right
direction. Thanks in advance.
This is what I have so far:
require 'find'
require 'date'
dir = "/share/music"
Find.find(dir) do |path|
  if File.directory?(path)
    next
  else
    if path.match(".*mp3") != nil
      print File.atime(path), " : "
      print path, "\n"
    end
  end
end
File.atime returns a Time object, which you can compare to another Time
object, for example:
if File.atime(path) > (Time.now - 60*60)
  # file has been accessed in the last hour
end
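Since the original question was about comparing a date with File.atime, here is a minimal sketch of both comparisons. The helper names accessed_within? and accessed_since? are made up for illustration; the only library calls assumed are File.atime, Time arithmetic, and Date#to_time from the date library.

```ruby
require 'date'

# Hypothetical helper: true if +path+ was accessed within the last
# +seconds+ seconds. Time - Integer gives another Time, and Time objects
# compare with <, >, == and <=>.
def accessed_within?(path, seconds, now = Time.now)
  File.atime(path) > now - seconds
end

# Hypothetical helper: true if +path+ was accessed on or after +date+.
# Date#to_time converts the Date to local midnight, so both sides of the
# comparison are Time objects.
def accessed_since?(path, date)
  File.atime(path) >= date.to_time
end
```

With these, accessed_within?(path, 60 * 60) is the same check as in the reply above, and accessed_since?(path, Date.today - 7) would ask whether the file was touched during the last week.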
But you can also solve the whole problem much easier:
Hmm… it just occurred to me that many of the solutions presented here
have the flaw of potentially calling File.atime() multiple times for
the same file which would require unnecessary calls to the operating
system to get the access time of the file.
Better to compute atime only once and then sort on the result. In the
case below, I then extract the file name back out of the resulting
array, but in actuality, there may be some value in providing that to
the caller in case they could make use of the access time values.
require 'pp'
require 'find'
def n_recent_files n = 1, path = '.', exts = nil
  paths = []
  Find.find(path) do |p|
    if File.file?(p) &&
       ( exts.nil? ||
         ( (ext = File.extname(p)) &&
           !ext.empty? &&
           exts.include?(ext) ) )
      paths << [ p, File.atime(p).to_i ]
    end
  end
  paths.sort! {|a,b| b[1] <=> a[1] }[0,n].map {|x| x[0]}
end
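The post above mentions that the access times could be handed back to the caller instead of being thrown away. A rough sketch of that variant follows; the name n_recent_files_with_atime is hypothetical, and wrapping exts in Array() is my own assumption so that a single string and an array behave the same way.

```ruby
require 'find'

# Hypothetical variant of n_recent_files: returns [path, atime] pairs,
# most recently accessed first, instead of bare paths.
def n_recent_files_with_atime(n = 1, path = '.', exts = nil)
  wanted = exts.nil? ? nil : Array(exts)   # accept one string or a list
  pairs = []
  Find.find(path) do |p|
    next unless File.file?(p)
    next unless wanted.nil? || wanted.include?(File.extname(p))
    pairs << [p, File.atime(p)]            # File.atime is called once per file
  end
  # Reversed comparison, so no separate reverse step is needed.
  pairs.sort { |a, b| b[1] <=> a[1] }.first(n)
end
```

A caller could then print both values, for example with n_recent_files_with_atime(10, '/share/music', '.mp3').each { |file, atime| puts "#{atime} : #{file}" }.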
First of all: thank you all for your amazing number of replies (in one
night!) and your warm welcome! You’ve definitely helped me solve my
problem, and made me view the problem from a different angle.
Top to bottom:
@Andreas S:
Your solution would indeed be quite a lot simpler than mine! Although
it might not be the most efficient, I will also implement your code,
just because it looks so darn easy! Thanks again!
@Spring Flowers:
Thanks for your concern. I don’t know anything about Vista behaviour. My
OS is Ubuntu, and when Rhythmbox runs/opens a file, its atime is set
=).
@Justin:
Thanks a lot for your string interpolation and puts idea! That really
makes life a lot easier!
@Brian:
Wow, you really made my day. That was exactly what I was looking for!
Thanks a lot for the effort, code, and useful explanation! I don’t
really know what to say. Thanks!
Ah, you’re right! I missed that also (even though the ** line is
highlighted in the pickaxe now that I bother to read Dir#glob ).
Before reading your explanation, I put the following together. It
might be a little friendlier for handling multiple extensions I
suppose, especially since the glob pattern isn’t a real regex.
require 'pp'
require 'find'
def n_recent_files n = 1, path = '.', exts = nil
  paths = []
  Find.find(path) do |p|
    if File.file?(p) &&
       ( exts.nil? ||
         ( (ext = File.extname(p)) &&
           !ext.empty? &&
           exts.include?(ext) ) )
      paths << p
    end
  end
  paths.sort! {|a,b| File.atime(b) <=> File.atime(a) }[0,n]
end
Enumerable#sort_by can be slow in some cases, and by using sort (or
sort! in this case), the comparison can be reversed to avoid calling
Array#reverse on the entire array later.
It exhibits a benefit of ‘duck typing’: since both String and Array
define include?, the caller can pass in a single string or an array of
strings for the extension parameter with no extra effort in the
function.
Although the if expression is complicated, it has the advantage of
only computing the file extension when necessary.
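To make the duck-typing point concrete, here is a small sketch that pulls the extension test out into its own method. The names wanted_ext? and strict_ext? are hypothetical. One thing worth keeping in mind is that String#include? is a substring test, so a string argument matches a little more loosely than an array does.

```ruby
# Hypothetical helper isolating the extension test from n_recent_files.
def wanted_ext?(path, exts)
  return true if exts.nil?
  ext = File.extname(path)
  !ext.empty? && exts.include?(ext)   # String#include? or Array#include?
end

# Hypothetical stricter variant: Array() wraps a lone string in an array,
# so only exact extension matches count.
def strict_ext?(path, exts)
  exts.nil? || Array(exts).include?(File.extname(path))
end

wanted_ext?('song.mp3', '.mp3')            # true, via String#include?
wanted_ext?('song.mp3', ['.mp3', '.m4p'])  # true, via Array#include?
wanted_ext?('song.mp', '.mp3')             # also true: '.mp' is a substring of '.mp3'
strict_ext?('song.mp', '.mp3')             # false
```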
> Before reading your explanation, I put the following together. It
> might be a little friendlier for handling multiple extensions I
> suppose
glob can do that more easily, too:
Dir['/share/music/**/*.{mp3,m4p}'].sort_by{|f| File.atime f}.reverse[0,10]
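Wrapped in a method, the glob approach might look like the sketch below. The name recent_by_glob and its parameters are hypothetical; the pattern is built with the same {a,b} brace syntax as the one-liner above, and the sketch assumes the root path itself contains no glob metacharacters.

```ruby
# Hypothetical wrapper around the glob one-liner: returns the n most
# recently accessed files under root whose extension is listed in exts
# (given without the leading dot, e.g. %w[mp3 m4p]).
def recent_by_glob(root, exts, n = 10)
  pattern = File.join(root, '**', "*.{#{exts.join(',')}}")
  Dir.glob(pattern).sort_by { |f| File.atime(f) }.reverse.first(n)
end
```

For the music directory in this thread the call would be recent_by_glob('/share/music', %w[mp3 m4p]).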
I’d be careful with all these optimizations you are suggesting. By far
the slowest part is the recursive traversal of the directory, and you
can’t speed that up. Array#reverse is in a completely different league
and not worth optimizing if you have to sacrifice readability. The
File.atime calls are pretty fast, too (100,000 per second on my old
PowerBook).
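Claims like these are easy to check on your own machine. The following is only a rough measuring sketch, not the method anyone in this thread used: timed and time_recent_steps are hypothetical names, and the numbers will depend heavily on the file system and on whether the directory entries are already cached.

```ruby
# Hypothetical helper: runs the block and returns [result, seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Times the three steps separately: directory traversal, the File.atime
# calls, and the sort plus reverse.
def time_recent_steps(root, n = 10)
  timings = {}
  files, timings[:traverse] = timed do
    Dir.glob(File.join(root, '**', '*')).select { |f| File.file?(f) }
  end
  keyed, timings[:atime] = timed { files.map { |f| [File.atime(f), f] } }
  _top, timings[:sort] = timed { keyed.sort_by(&:first).reverse.first(n) }
  timings
end
```

Calling time_recent_steps('/share/music') returns a hash of seconds per step, which should show fairly quickly which part dominates for a given collection.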
> Hmm… it just occurred to me that many of the solutions presented here
> have the flaw of potentially calling File.atime() multiple times for
> the same file which would require unnecessary calls to the operating
> system to get the access time of the file.
Really? Which ones? You do realize that #sort_by is explicitly
designed to call the comparison method exactly once for each object,
right?
From the ri docs themselves:
“A more efficient technique is to cache the sort keys (modification
times in this case) before the sort. Perl users often call this
approach a Schwartzian Transform, after Randal Schwartz. We construct
a temporary array, where each element is an array containing our sort
key along with the filename. We sort this array, and then extract the
filename from the result.”
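The difference the ri text describes can be observed by counting how often the sort key is computed. This is a small sketch with a made-up stand-in key (negating an integer) in place of File.atime; count_key_calls is a hypothetical name.

```ruby
# Counts how often the sort key is computed by three approaches.
# expensive_key stands in for something like File.atime.
def count_key_calls(items)
  calls = 0
  expensive_key = lambda { |x| calls += 1; -x }

  # Plain sort with a block: the key is computed twice per comparison.
  items.sort { |a, b| expensive_key.call(a) <=> expensive_key.call(b) }
  with_sort = calls

  # sort_by: the key is computed once per element and cached.
  calls = 0
  items.sort_by { |x| expensive_key.call(x) }
  with_sort_by = calls

  # Schwartzian Transform by hand: decorate, sort, undecorate.
  calls = 0
  items.map { |x| [expensive_key.call(x), x] }.sort.map { |_, x| x }
  by_hand = calls

  { sort: with_sort, sort_by: with_sort_by, by_hand: by_hand }
end
```

For n items, sort_by and the hand-written transform each compute n keys, while the plain sort computes two keys for every comparison it makes, which is at least 2(n - 1).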
My understanding is that with a directory containing 5,000 MP3s, this
solution:
The above has a flaw.
Dir['/share/music/**/*.mp3'] already includes the *.mp3 files in
'/share/music/' itself, so when you add '/share/music/*.mp3' in Dir.glob
you add the *.mp3 files from that directory again, with the result that
you have them twice in your array:
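A minimal sketch of that overlap, using a throwaway temporary directory instead of /share/music (the method name glob_overlap_demo and the file names are made up for illustration):

```ruby
require 'tmpdir'
require 'fileutils'

# Builds a tiny directory tree and compares the recursive pattern alone
# with the recursive pattern plus an extra top-level pattern.
def glob_overlap_demo
  Dir.mktmpdir do |dir|
    FileUtils.mkdir_p(File.join(dir, 'album'))
    FileUtils.touch(File.join(dir, 'top.mp3'))
    FileUtils.touch(File.join(dir, 'album', 'nested.mp3'))

    recursive = Dir[File.join(dir, '**', '*.mp3')]
    combined  = Dir[File.join(dir, '**', '*.mp3'), File.join(dir, '*.mp3')]

    { recursive: recursive.size, combined: combined.size, unique: combined.uniq.size }
  end
end

glob_overlap_demo  # => {recursive: 2, combined: 3, unique: 2}
```

The recursive pattern already finds top.mp3, so the second pattern only adds a duplicate. Dropping the extra pattern, or calling uniq on the result, avoids it.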
On Oct 13, 8:35 pm, “Andreas S.” wrote:
> I’d be careful with all these optimizations you are suggesting. By far
> the slowest part is the recursive traversal of the directory, and you
> can’t speed that up. Array#reverse is in a completely different league
> and not worth optimizing if you have to sacrifice readability. The
> File.atime calls are pretty fast, too (100,000 per second on my old
> PowerBook).
You’re correct, my bad. Thanks for bringing that to my attention.
The speed difference between Dir (good) and Find (terrible) totally
dominates. Never underestimate the slowness of Ruby code compared to C
code running in the interpreter.
Phrogz is correct regarding Enumerable#sort_by. Since it builds an
array of tuples first, File.atime is only called once per path. I was
influenced by misapplying the warning in the pickaxe, but in this
case, sort_by seems warranted since I was basically doing the same
thing (building an array of tuples with the sort value) manually - but
in Ruby instead of C! In fact, if I had bothered to turn the page, the
example they give is strangely relevant!
Array#reverse is just noise in the profile below, so I should think
twice before going out of my way to avoid it.
Sorry Robin, your praise was premature.
One minor point: I think you may be mistaken regarding the slowest part
being the directory traversal (at least in your code). Both the
sorting and time comparison are much greater:
> > Hmm… it just occurred to me that many of the solutions presented here
> > have the flaw of potentially calling File.atime() multiple times for
> > the same file which would require unnecessary calls to the operating
> > system to get the access time of the file.
> Really? Which ones?
Uh, those would be mine. Moot point though (see other post).
> You do realize that #sort_by is explicitly
> designed to call the comparison method exactly once for each object,
> right?
Actually, I had missed that. Thanks for pointing it out. This seems to
be a case where sort_by is certainly warranted.