Top 10 last played mp3's

robintw · October 13, 2007, 11:32pm

Hello everyone,

My level of ruby-experience has just slightly passed hello-world, I’m
quite new to this programming language; I hope I don’t bother all of you
too much by posting my low-level question here.

I would like to make a program that iterates recursively through my music
directory and selects the top 10 most recently played tracks.

I am trying to use File.atime(path) to check whether the current file was
accessed recently enough to get into my top 10, but my problem is that I
don’t know how to (correctly) compare a date to the return value of
File.atime.

I’m sure you can give me some simple hints, or a small push in the right
direction. Thanks in advance.

This is what I have so far:

require 'find'
require 'date'

dir = "/share/music"

Find.find(dir) do |path|
  if File.directory?(path)
    next
  else
    if path.match(".*mp3") != nil
      print File.atime(path), " : "
      print path, "\n"
    end
  end
end

robintw · October 13, 2007, 11:48pm

Robin Wagenaar wrote:

My level of ruby-experience has just slightly passed hello-world, I’m
quite new to this programming language; I hope I don’t bother all of you
too much by posting my low-level question here.

I would like to make a program that iterates recursively through my music
directory and selects the top 10 most recently played tracks.

I am trying to use File.atime(path) to check whether the current file was
accessed recently enough to get into my top 10, but my problem is that I
don’t know how to (correctly) compare a date to the return value of
File.atime.

File.atime returns a Time object, you can compare it to another Time
object, for example:

if File.atime(path) > (Time.now - 60*60)
  # file has been accessed in the last hour
end
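
A minimal sketch of that comparison, for anyone who wants to try it in isolation. The helper name accessed_within? is made up for illustration, and File.utime is used only to fake an access time on a temporary file so the result does not depend on how the filesystem is mounted.

```ruby
require 'tempfile'

# Hypothetical helper: true if +path+ was accessed within the last +seconds+.
# File.atime returns a Time, so it compares directly with another Time.
def accessed_within?(path, seconds, now = Time.now)
  File.atime(path) > now - seconds
end

Tempfile.create('track') do |f|
  now = Time.now
  # Pretend the file was last accessed two hours ago.
  File.utime(now - 2 * 60 * 60, File.mtime(f.path), f.path)
  puts accessed_within?(f.path, 60 * 60, now)       # within the last hour?
  puts accessed_within?(f.path, 3 * 60 * 60, now)   # within the last three hours?
end
```

Subtracting a number from a Time gives another Time that many seconds earlier, which is why no Date conversion is needed.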

But you can also solve the whole problem much more easily:

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{|f| File.atime(f)}.reverse[0,10]
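
To try the glob approach without touching a real music folder, something like the following sketch should do. The helper name recent_tracks and the temporary directory layout are assumptions made for the demo, and File.utime is only there to fake the access times.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: the +n+ most recently accessed .mp3 files under +dir+.
def recent_tracks(dir, n = 10)
  Dir.glob(File.join(dir, '**', '*.mp3'))
     .sort_by { |f| File.atime(f) }
     .reverse
     .first(n)
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'album'))
  now = Time.now
  # file name => seconds since it was "last played"
  { 'old.mp3' => 300, 'album/new.mp3' => 10, 'album/mid.mp3' => 100 }.each do |name, age|
    path = File.join(dir, name)
    FileUtils.touch(path)
    File.utime(now - age, now, path)   # fake the access time for the demo
  end
  puts recent_tracks(dir, 2).map { |f| File.basename(f) }
end
```

On a real collection the call would simply be recent_tracks('/share/music').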

robintw · October 14, 2007, 1:07am

Quoth Andreas S.:

know how to (correctly) compare a date to the return value of

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{|f| File.atime(f)}.reverse[0,10]

Or, if it’s deeper than two levels:

require 'find'

Find.find("/share/music/").sort_by { |f| f.atime }.reverse[0...10]

I’m glad you (the OP) are interested in learning more about ruby.

Regards,

robintw · October 14, 2007, 1:44am

Robin Wagenaar wrote:

recently enough to get into my top-10, but my problem is that I don’t
require ‘find’
print File.atime(path)," : "
print path,"\n"
end
end

end

Your original question has been answered already, but just a small note:
puts with string interpolation is generally a lot simpler to use for
output:

puts "#{File.atime(path)} : #{path}"

That will also add the "\n" or "\r\n" as appropriate, so you do not need
to.
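
A small side-by-side sketch of the two styles. The helper name track_line, the path and the fixed timestamp are all made up so the snippet runs without any real files.

```ruby
# Hypothetical helper: build one output line from a path and its access time.
def track_line(path, atime)
  "#{atime} : #{path}"
end

path  = '/share/music/example.mp3'   # made-up path, never read from disk
atime = Time.at(0).utc               # fixed time so the demo is reproducible

print atime, ' : ', path, "\n"       # print style: the newline is written by hand
puts track_line(path, atime)         # puts style: same text, newline supplied by puts
```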

-Justin

robintw · October 14, 2007, 1:26am

Andreas S. wrote:

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{|f| File.atime(f)}.reverse[0,10]

This one seems to work:

a = Dir['E:/Music/*.mp3'].sort_by{|f| File.atime(f)}.reverse[0,10]
a.each {|f| p f, File.atime(f)}

except that on Windows Vista I noticed two things:

1. Filenames with international characters come out as ???.mp3,
   so File.atime(f) fails on them afterwards.
2. The access time of an .mp3 or .txt file is unchanged even after the
   file is played, run (such as by ruby test_dir.rb), or opened in
   Notepad.

robintw · October 14, 2007, 2:35am

Konrad M. wrote:

Quoth Andreas S.:

know how to (correctly) compare a date to the return value of

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{|f| File.atime(f)}.reverse[0,10]

Or, if it’s deeper than two levels:

‘**’ takes care of that.

robintw · October 14, 2007, 2:35am

On Oct 13, 7:06 pm, Konrad M. wrote:

require 'find'

Find.find("/share/music/").sort_by { |f| f.atime }.reverse[0...10]

Doesn’t Find.find() require a block? And doesn’t it pass a string to
the block, not a File ?
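
A quick sketch to check this. The helper name files_under and the temporary directory are assumptions for the demo. As far as I know, current Ruby versions return an Enumerator when Find.find is called without a block, but either way the values it yields are plain String paths, so the access time has to come from File.atime(path) rather than path.atime.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: collect every regular file under +dir+ via Find.find.
def files_under(dir)
  found = []
  Find.find(dir) do |path|
    # path is a String, so ask File about it instead of calling path.atime
    found << path if File.file?(path)
  end
  found
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'a.mp3'))
  files = files_under(dir)
  puts files.first.class
  puts files.sort_by { |f| File.atime(f) }.reverse[0...10].length
end
```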

robintw · October 14, 2007, 3:55am

Hmm… it just occurred to me that many of the solutions presented here
have the flaw of potentially calling File.atime() multiple times for
the same file which would require unnecessary calls to the operating
system to get the access time of the file.

Better to compute atime only once and then sort on the result. In the
case below, I then extract the file name back out of the resulting
array, but in actuality, there may be some value in providing that to
the caller in case they could make use of the access time values.

require 'pp'
require 'find'

def n_recent_files n = 1, path = '.', exts = nil
  paths = []
  Find.find(path) do |p|
    if File.file?(p) &&
       ( exts.nil? ||
         ( (ext = File.extname(p)) &&
           !ext.empty? &&
           exts.include?(ext) ) )
      paths << [ p, File.atime(p).to_i ]
    end
  end
  paths.sort! {|a,b| b[1] <=> a[1] }[0,n].map {|x| x[0]}
end

pp n_recent_files(10, '/home/brian/temp', '.rb')
pp n_recent_files(10, '/home/brian/temp', ['.rb', '.txt'])
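
If the caller does want the access times back, a variant along these lines might work. It is only a sketch: the name n_recent_with_times is made up, the extension filter is left out for brevity, and Enumerable#max_by with a count needs a reasonably recent Ruby (2.2 or later, as far as I know).

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical variant: returns [path, atime] pairs, newest first.
def n_recent_with_times(n, dir)
  pairs = []
  Find.find(dir) do |p|
    pairs << [p, File.atime(p)] if File.file?(p)   # one atime call per file
  end
  pairs.max_by(n) { |_, atime| atime }
end

Dir.mktmpdir do |dir|
  now = Time.now
  %w[a.rb b.rb c.rb].each_with_index do |name, i|
    path = File.join(dir, name)
    FileUtils.touch(path)
    File.utime(now - i * 60, now, path)   # a.rb is the most recently accessed
  end
  n_recent_with_times(2, dir).each do |path, atime|
    puts "#{atime} : #{File.basename(path)}"
  end
end
```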

robintw · October 14, 2007, 5:47am

Quoth Brian A.:

On Oct 13, 7:06 pm, Konrad M. wrote:

require 'find'

Find.find("/share/music/").sort_by { |f| f.atime }.reverse[0...10]

Doesn’t Find.find() require a block? And doesn’t it pass a string to
the block, not a File ?

Sorry, I don’t know anything about it; I was just going by the usage put
forth by the guy in front of me.

robintw · October 14, 2007, 11:58am

First of all: thank you all for the amazing number of replies (in one
night!) and your warm welcome! You’ve definitely helped me solve my
problem, and made me view the problem from a different angle.

Top to bottom:

@Andreas S:
Your solution would indeed be quite a lot simpler than mine! Although
it might not be the most efficient, I will also implement your code,
just because it looks so darn easy! Thanks again!

@Spring Flowers:
Thanks for your concern, I don’t know anything about Vista’s behaviour. My
OS is Ubuntu, and when rhythmbox runs/opens a file, its atime is set
=).

@Justin:
Thanks a lot for your string interpolation and puts idea! That really
makes life a lot easier!

@Brian:
Wow, you really made my day. That was exactly what I was looking for!
Thanks a lot for the effort, code, and useful explanation! I don’t
really know what to say. Thanks!

robintw · October 14, 2007, 3:35am

On Oct 13, 8:35 pm, “Andreas S.” wrote:

Konrad M. wrote:

Quoth Andreas S.:

know how to (correctly) compare a date to the return value of

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{|f| File.atime(f)}.reverse[0,10]

Or, if it’s deeper than two levels:

‘**’ takes care of that.

Ah, you’re right! I missed that also (even though the ** line is
highlighted in the pickaxe now that I bother to read Dir#glob ).

Before reading your explanation, I put the following together. It
might be a little friendlier for handling multiple extensions I
suppose, especially since the glob pattern isn’t a real regex.

require 'pp'
require 'find'

def n_recent_files n = 1, path = '.', exts = nil
  paths = []
  Find.find(path) do |p|
    if File.file?(p) &&
       ( exts.nil? ||
         ( (ext = File.extname(p)) &&
           !ext.empty? &&
           exts.include?(ext) ) )
      paths << p
    end
  end
  paths.sort! {|a,b| File.atime(b) <=> File.atime(a) }[0,n]
end

pp n_recent_files(10, '/home/brian/temp', '.mp3')
pp n_recent_files(10, '/home/brian/temp', ['.mp3', '.ogg'])

Some notes for the OP:

Welcome to Ruby!

* Enumerable#sort_by can be slow in some cases, and by using sort (or
  sort! in this case), the comparison can be reversed to avoid calling
  Array#reverse on the entire array later.
* It exhibits a benefit of ‘duck typing’: since both String and Array
  define include?, the caller can pass in a single string or an array of
  strings for the extension parameter with no extra effort in the
  function.
* Although the if expression is complicated, it has the advantage of
  only computing the file extension when necessary.

Brian
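
To make the duck-typing note concrete, here is the extension check pulled out into a small sketch. The helper name wanted_ext? is hypothetical. One thing worth knowing is that String#include? does a substring match, so a single-string argument accepts slightly more than an array does.

```ruby
# Hypothetical helper mirroring the extension check in n_recent_files.
def wanted_ext?(path, exts)
  ext = File.extname(path)
  exts.nil? || (!ext.empty? && exts.include?(ext))
end

puts wanted_ext?('song.mp3', '.mp3')             # String#include?
puts wanted_ext?('song.ogg', ['.mp3', '.ogg'])   # Array#include?
puts wanted_ext?('song.mp', '.mp3')              # substring match: '.mp' is inside '.mp3'
puts wanted_ext?('song.mp', ['.mp3'])            # the array form compares whole elements
```

If the substring behaviour matters, wrapping the argument with Array(exts) before the check is one way to treat both forms the same.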

robintw · October 14, 2007, 1:03pm

Brian A. wrote:

On Oct 13, 8:35 pm, “Andreas S.” wrote:

Konrad M. wrote:

Quoth Andreas S.:

know how to (correctly) compare a date to the return value of

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{|f| File.atime(f)}.reverse[0,10]

Or, if it’s deeper than two levels:

‘**’ takes care of that.

Ah, you’re right! I missed that also (even though the ** line is
highlighted in the pickaxe now that I bother to read Dir#glob ).

Before reading your explanation, I put the following together. It
might be a little friendlier for handling multiple extensions I

glob can do that more easily, too:
Dir['/share/music/**/*.{mp3,m4p}'].sort_by{|f| File.atime f}.reverse[0,10]

I’d be careful with all these optimizations you are suggesting. By far
the slowest part is the recursive traversal of the directory, and you
can’t speed that up. Array#reverse is in a completely different league
and not worth optimizing if you have to sacrifice readability. The
File.atime calls are pretty fast, too (100,000 per second on my old
powerbook).
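
A sketch of the brace pattern wrapped in a helper, in case the list of extensions is not fixed. The name recent_by_exts and the temporary files are assumptions for the demo.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: newest-first list of files under +dir+ whose
# extension is one of +exts+ (given without the leading dot).
def recent_by_exts(dir, exts, n = 10)
  pattern = File.join(dir, '**', "*.{#{exts.join(',')}}")
  Dir.glob(pattern).sort_by { |f| File.atime(f) }.reverse.first(n)
end

Dir.mktmpdir do |dir|
  %w[a.mp3 b.m4p c.txt].each { |name| FileUtils.touch(File.join(dir, name)) }
  # Only the .mp3 and .m4p files should be listed.
  puts recent_by_exts(dir, %w[mp3 m4p]).map { |f| File.basename(f) }.sort
end
```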

robintw · October 14, 2007, 4:55pm

On Oct 13, 7:50 pm, Brian A. wrote:

Hmm… it just occurred to me that many of the solutions presented here
have the flaw of potentially calling File.atime() multiple times for
the same file which would require unnecessary calls to the operating
system to get the access time of the file.

Really? Which ones? You do realize that #sort_by is explicitly
designed to call the comparison method exactly once for each object,
right?

From the ri docs themselves:

“A more efficient technique is to cache the sort keys (modification
times in this case) before the sort. Perl users often call this
approach a Schwartzian Transform, after Randal Schwartz. We construct
a temporary array, where each element is an array containing our sort
key along with the filename. We sort this array, and then extract the
filename from the result.”

My understanding is that with a directory containing 5,000 MP3s, this
solution:

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{ |f|
  File.atime(f)
}.reverse[0,10]

will call File.atime exactly 5,000 times and create exactly 5,000 Time
instances.

robintw · October 14, 2007, 6:54pm

On 14 Oct 2007 at 23:55, Phrogz wrote:

Dir['/share/music/**/*.mp3', '/share/music/*.mp3']

The above has a flaw.
Dir['/share/music/**/*.mp3'] already includes the *.mp3 files in
'/share/music/' itself, so when you add '/share/music/*.mp3' to the glob,
you add the *.mp3 files from that directory again, with the result that
they appear twice in your array:

aaa/
aaa/x.mp3
aaa/bbb/
aaa/bbb/y.mp3
aaa/bbb/ccc
aaa/bbb/ccc/z.mp3

p Dir['aaa/**/*.mp3'].sort
#=> ["aaa/bbb/ccc/z.mp3", "aaa/bbb/y.mp3", "aaa/x.mp3"]

p Dir['aaa/**/*.mp3','aaa/*.mp3'].sort
#=> ["aaa/bbb/ccc/z.mp3", "aaa/bbb/y.mp3",
#    "aaa/x.mp3", "aaa/x.mp3"]

Dirk T.
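
The duplication is easy to reproduce in a throwaway directory. In this sketch the helper name glob_counts is made up; it builds the aaa tree from the example and counts the matches with both patterns, after uniq, and with the '**' pattern alone.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns [size with both patterns, size after uniq, size with '**' alone].
def glob_counts(root)
  deep    = File.join(root, '**', '*.mp3')
  shallow = File.join(root, '*.mp3')
  both = Dir[deep, shallow]
  [both.size, both.uniq.size, Dir[deep].size]
end

Dir.mktmpdir do |tmp|
  root = File.join(tmp, 'aaa')
  FileUtils.mkdir_p(File.join(root, 'bbb', 'ccc'))
  %w[x.mp3 bbb/y.mp3 bbb/ccc/z.mp3].each { |f| FileUtils.touch(File.join(root, f)) }
  p glob_counts(root)
end
```

So either drop the second pattern or call uniq on the result.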

robintw · October 14, 2007, 5:13pm

Gavin K. wrote:

My understanding is that with a directory containing 5,000 MP3s, this
solution:

Dir['/share/music/**/*.mp3', '/share/music/*.mp3'].sort_by{ |f|
  File.atime(f)
}.reverse[0,10]

will call File.atime exactly 5,000 times and create exactly 5,000 Time
instances.

Won’t it call File.atime(f) (c * n log n) times?
n log n is the big O, i.e. O(n log n), and c is the constant
depending on the sort algorithm.

robintw · October 14, 2007, 8:34pm

On 10/14/07, SpringFlowers AutumnMoon wrote:

instances.

won’t it call File.atime(f) (c * n log n) times?
n log n is the big O… O(n log n)… and then c is the constant
depending on the sort algorithm.

No, sort_by builds a parallel array with the value of each element in
the original collection and uses that array for the sort values.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
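
This is easy to check by counting calls. A small sketch follows; the two helper names are made up, and negating a number stands in for the File.atime call.

```ruby
# Count how often the sort key is computed under each approach.
def key_calls_with_sort_by(items)
  calls = 0
  items.sort_by { |x| calls += 1; -x }   # key computed in the sort_by block
  calls
end

def key_calls_with_sort(items)
  calls = 0
  items.sort { |a, b| calls += 2; -a <=> -b }   # two keys per comparison
  calls
end

items = (1..1000).to_a.shuffle
puts key_calls_with_sort_by(items)   # one key computation per element
puts key_calls_with_sort(items)      # depends on the shuffle, but much larger
```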

robintw · October 14, 2007, 10:35pm

On Oct 14, 7:03 am, “Andreas S.” wrote:

Brian A. wrote:

On Oct 13, 8:35 pm, “Andreas S.” wrote:
I’d be careful with all these optimizations you are suggesting. By far
the slowest part is the recursive traversal of the directory, and you
can’t speed that up. Array#reverse is in a completely different league
and not worth optimizing if you have to sacrifice readability. The
File.atime calls are pretty fast, too (100,000 per second on my old
powerbook).

You’re correct, my bad. Thanks for bringing that to my attention.

* The speed difference between Dir (good) and Find (terrible) totally
  dominates. Never underestimate the slowness of Ruby code compared to C
  code running in the interpreter.
* Phrogz is correct regarding Enumerable#sort_by. Since it builds an
  array of tuples first, File.atime is only called once per path. I was
  influenced by misapplying the warning in the pickaxe, but in this
  case, sort_by seems warranted since I was basically doing the same
  thing (building an array of tuples with the sort value) manually, but
  in Ruby instead of C! In fact, if I had bothered to turn the page, the
  example they give is strangely relevant!
* Array#reverse is just noise in the profile below, so I should be
  more careful about going out of my way to avoid it.
* Sorry Robin, your praise was premature.

One minor point: I think you may be mistaken regarding the slowest part
being the directory traversal (at least in your code). Both the
sorting and time comparison are much greater:

brian@imagine:~/sync/code/ruby$ ruby -r profile andreas.rb
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 43.42     1.32      1.32        2   660.00  1385.00  Enumerable.sort_by
 28.95     2.20      0.88    26958     0.03     0.03  Time#<=>
 12.50     2.58      0.38        4    95.00   142.50  Array#each
  7.89     2.82      0.24        2   120.00   120.00  Dir#[]
  6.25     3.01      0.19     5220     0.04     0.04  File#atime
  0.99     3.04      0.03        2    15.00    25.00  Kernel.require
  0.00     3.04      0.00       10     0.00     0.00  Module#class_eval
  0.00     3.04      0.00        9     0.00     0.00  Kernel.singleton_method_added
  0.00     3.04      0.00        3     0.00     0.00  Module#included
  0.00     3.04      0.00        1     0.00     0.00  Module#module_function
  0.00     3.04      0.00        8     0.00     0.00  Class#inherited
  0.00     3.04      0.00       77     0.00     0.00  Module#method_added
  0.00     3.04      0.00        3     0.00     0.00  Module#include
  0.00     3.04      0.00        1     0.00  3010.00  Integer#times
  0.00     3.04      0.00        1     0.00     0.00  Module#attr_accessor
  0.00     3.04      0.00        3     0.00     0.00  Module#append_features
  0.00     3.04      0.00        1     0.00     0.00  Module#private
  0.00     3.04      0.00        5     0.00     0.00  Module#attr_reader
  0.00     3.04      0.00        2     0.00     0.00  Array#[]
  0.00     3.04      0.00        2     0.00     0.00  String#==
  0.00     3.04      0.00        2     0.00     0.00  Array#reverse
 -0.00     3.04     -0.00        2    -0.00  1505.00  Object#n_recent_files
  0.00     3.04      0.00        1     0.00  3040.00  #toplevel

You were correct in my case though since I used the Find library:

  %   cumulative   self               self      total
 time   seconds   seconds    calls   ms/call   ms/call  name
 34.05    45.03     45.03    57436      0.78      2.08  Kernel.catch
 23.39    75.96     30.93    16019      1.93      2.79  Dir#each
  5.66    83.44      7.48    16019      0.47      0.47  Dir#open
  5.34    90.50      7.06   220364      0.03      0.03  String#==
  5.27    97.47      6.97        1   6970.00 128230.00  Find.find
  3.71   102.38      4.91    57437      0.09      0.13  Kernel.dup
  1.87   104.85      2.47        1   2470.00   3820.00  Array#sort!
  1.79   107.22      2.37    57435      0.04      0.04  File#join
  1.76   109.55      2.33    57435      0.04      0.04  Array#unshift
  1.72   111.83      2.28    57437      0.04      0.04  String#initialize_copy
  1.63   113.99      2.16    57436      0.04      0.04  File#exist?
  1.60   116.10      2.11    57436      0.04      0.04  File#file?
  1.54   118.14      2.04    57435      0.04      0.04  Kernel.untaint
  1.42   120.02      1.88    57431      0.03      0.03  File#lstat
  1.28   121.71      1.69    57431      0.03      0.03  File::Stat#directory?

robintw · October 14, 2007, 10:40pm

On Oct 14, 10:52 am, Phrogz wrote:

On Oct 13, 7:50 pm, Brian A. wrote:

Hmm… it just occurred to me that many of the solutions presented here
have the flaw of potentially calling File.atime() multiple times for
the same file which would require unnecessary calls to the operating
system to get the access time of the file.

Really? Which ones?

Uh, those would be mine. Moot point though (see other post).

You do realize that #sort_by is explicitly
designed to call the comparison method exactly once for each object,
right?

Actually, I had missed that. Thanks for pointing it out. This seems to
be a case where sort_by is certainly warranted.