Hi,
I’m looking for the fastest way to loop through directories.
At the moment I’m using Find, but when I test this, the first run is
really slow (about 20 seconds), while the second run is much quicker
(less than a second).
The problem is that I only need to run this once, not twice. Is it slow
because I run it from unit tests, or is it something else?
I only need the paths of all the files in the directory and its subdirectories.
What would be the fastest way to do this: Dir.foreach, Find, or something else?
Thanks for the help
To do a deep listing I use a recursive algorithm with Dir.foreach. Here
is some code that adds up the file sizes; you could change the return values
so that it returns a list of files and directories instead. It also stops if
the depth is greater than 24, which helps if it hits a hard link that makes
it loop on itself.
def deepsize(path, depth)
  # Stop descending past 24 levels so a looping link cannot recurse forever.
  return 0 if depth > 24
  return 0 unless File.exist?(path)

  size = 0
  Dir.foreach(path) do |sub|
    next if sub == "." || sub == ".."

    # lstat does not follow symlinks, so symlinked directories are skipped.
    st = File.lstat(File.join(path, sub))
    if st.file?
      size += st.size
    elsif !st.symlink? && st.directory?
      size += deepsize(File.join(path, sub), depth + 1)
    end
  end
  size
end
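As suggested above, the same walk can return paths instead of a size. The following is only a sketch of that variation, not the poster's code: the name `deep_list` and the `max_depth` parameter are made up for illustration, and it assumes Ruby 2.5 or later for `Dir.each_child`.

```ruby
# Hypothetical variation on deepsize: collects file paths instead of sizes.
# deep_list and max_depth are illustrative names, not from the thread.
def deep_list(path, depth = 0, max_depth = 24)
  return [] if depth > max_depth          # same guard against runaway recursion
  return [] unless File.directory?(path)

  files = []
  Dir.each_child(path) do |name|          # each_child already skips "." and ".."
    full = File.join(path, name)
    st = File.lstat(full)                 # lstat does not follow symlinks
    if st.file?
      files << full
    elsif st.directory?
      files.concat(deep_list(full, depth + 1, max_depth))
    end
  end
  files
end
```

Calling `deep_list("/some/dir")` returns an array of regular-file paths; symlinks are neither listed nor followed.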
On Mon, 10 Jul 2006, Thomas Coopman wrote:
> Hi,
> I’m looking for the fastest way for looping through directories.
> at the moment i’m using Find but when I test this, the first time it runs
> really slow (like 20 seconds) the second time it runs much much quicker
> (less than a second). The problem is that I don’t need to run this twice
> but only once. Is this because I use the unit tests that it runs slow or is
> it something else.
it’s because the underlying filesystem does aggressive caching - i’m assuming
you’re on linux? regardless, it has nothing to do with ruby at all.
> I only need the path of all the files in the directory + subdirectories.
> What would be the fastest for this? using Dir.foreach or Find or something
> else
this will be the fastest way
Dir.glob('**/*') do |entry|
  p entry
end
i use it on directories with 200k files all the time. basically you want to
avoid
list = Dir.glob('**/*') # creates huge array first
list = Find.find(d).to_a # creates huge array first
etc., i.e. avoid methods which scour the whole directory tree first, and use
methods which provide an iterator to handle each file as it’s encountered.
you’ll probably find this much faster or, at least, more responsive.
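A minimal sketch of the iterator style described above, using the block form of `Find.find` so each path is handled as it is found rather than collected first. The helper name `each_file` is hypothetical and is not something from the thread.

```ruby
require "find"

# Hypothetical helper: yields each regular file (or symlink to one) under
# root as soon as Find encounters it, so no full listing is built in memory.
def each_file(root)
  Find.find(root) do |path|
    yield path if File.file?(path)
  end
end
```

A call such as `each_file("/some/dir") { |path| puts path }` starts printing immediately, which is where the "more responsive" feel comes from; the total walk time is still dominated by disk I/O.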
regards.
-a
Yes, I’m on linux!
I tried find and glob and there are no real differences.
I think I’ll stick with Find.
Thanks for all the help
Thomas
“Thomas Coopman” writes:
> I’m looking for the fastest way for looping through directories.
> at the moment i’m using Find but when I test this, the first time it runs
> really slow (like 20 seconds) the second time it runs much much quicker
Find is about the fastest way. The bottleneck is the I/O, and
there is not much you can do about that besides getting a better
disk subsystem.
The reason it was faster the second time is because your OS had cached
the information.
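One way to see the cache effect for yourself is to time the same walk twice. This is only a sketch; `timed_walk` is a made-up helper name, and the actual numbers depend entirely on your disk and OS.

```ruby
require "benchmark"
require "find"

# Hypothetical helper: walks root once and returns [entry_count, seconds].
# The count includes root itself and every directory, not just files.
def timed_walk(root)
  count = 0
  seconds = Benchmark.realtime do
    Find.find(root) { |_path| count += 1 }
  end
  [count, seconds]
end
```

Running `timed_walk` twice in a row on a large tree will typically show the second call finishing much faster, because the directory metadata is already in memory.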
If the list of entries is fixed or does not change much, you can try
saving it in a file, so subsequent runs do not have to traverse
the directories again.
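A rough sketch of that idea, assuming one path per line in a plain text file. The names `cached_file_list` and `cache_path` are hypothetical, there is no invalidation (delete the cache file to force a rescan), and paths containing newlines would break this simple format.

```ruby
require "find"

# Hypothetical helper: returns the saved listing if cache_path exists,
# otherwise walks root once and saves one file path per line.
def cached_file_list(root, cache_path)
  return File.readlines(cache_path, chomp: true) if File.exist?(cache_path)

  paths = []
  Find.find(root) { |path| paths << path if File.file?(path) }
  File.write(cache_path, paths.map { |path| "#{path}\n" }.join)
  paths
end
```

Keep the cache file outside the tree being scanned so it does not end up in its own listing.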
YS.