Fastest way of looping through directories

Hi,

I’m looking for the fastest way to loop through directories.

At the moment I’m using Find, but when I test this, the first run is
really slow (around 20 seconds) and the second run is much quicker
(less than a second).
The problem is that I don’t need to run this twice, only once. Is it
slow because I run it from the unit tests, or is it something else?

I only need the paths of all the files in the directory and its subdirectories.
What would be the fastest way to get them: Dir.foreach, Find, or something else?

Thanks for the help

To do a deep listing I use a recursive algorithm built on Dir#read
(Dir.foreach would work the same way). Here is some code that adds up
the file sizes; you could change the return values so that it returns a
list of files and directories instead. It also stops if the depth is
greater than 24, which keeps it from running forever if it hits a hard
link that makes it loop on itself.

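def deepsize(path, depth)
  # Give up beyond 24 levels as a guard against runaway recursion.
  return 0 if depth > 24
  # File.exists? was removed in Ruby 3.2; File.exist? works everywhere.
  return 0 unless File.exist?(path)

  size = 0
  d = Dir.new(path)
  while (sub = d.read)
    next if sub == "." || sub == ".."

    st = File.lstat(path + "/" + sub)
    if st.file?
      size += st.size
    elsif !st.symlink? && st.directory?
      # lstat does not follow symlinks, so linked directories are skipped.
      size += deepsize(path + "/" + sub, depth + 1)
    end
  end
  d.close
  size
end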
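
As a rough sketch of the change mentioned above (returning paths instead of sizes), something like the following might do. The names deep_list, max_depth and found are made up for this example, and it does not handle entries that disappear or are unreadable while the walk is running.

```ruby
# Hypothetical variant of deepsize: collects file paths instead of summing sizes.
# deep_list, max_depth and found are illustrative names, not from the original post.
def deep_list(path, depth = 0, max_depth = 24, found = [])
  return found if depth > max_depth
  return found unless File.directory?(path)

  Dir.foreach(path) do |sub|
    next if sub == "." || sub == ".."

    full = File.join(path, sub)
    st = File.lstat(full)   # lstat reports on a symlink itself, not its target
    if st.file?
      found << full
    elsif st.directory?     # false for symlinks, so linked directories are skipped
      deep_list(full, depth + 1, max_depth, found)
    end
  end
  found
end
```

Calling deep_list("/some/dir") would return an array of file paths; directories themselves are not included in this version.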

On Mon, 10 Jul 2006, Thomas Coopman wrote:

Hi,

I’m looking for the fastest way for looping through directories.

at the moment i’m using Find but when I test this, the first time it runs
really slow (like 20 seconds) the second time it runs much much quicker
(less than a second). The problem is that I don’t need to run this twice
but only once. Is this because I use the unit tests that it runs slow or is
it something else.

It’s because the operating system caches filesystem data aggressively:
after the first run, the directory entries and file metadata are already
in memory. I’m assuming you’re on Linux? Regardless, it has nothing to
do with Ruby at all.
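
One way to see the caching effect for yourself is to time two walks back to back. This is only a sketch: time_two_walks is a made-up helper name, and the numbers depend entirely on the machine and on what is already cached.

```ruby
require 'benchmark'
require 'find'

# Hypothetical helper: walks root twice and returns [first_seconds, second_seconds].
def time_two_walks(root)
  2.times.map do
    Benchmark.realtime do
      Find.find(root) { |path| path }   # visit every entry, do nothing with it
    end
  end
end
```

If the tree has not been read recently, the first number will usually be much larger than the second. On Linux the caches can be emptied as root with `sync; echo 3 > /proc/sys/vm/drop_caches` to repeat the cold measurement.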

I only need the path of all the files in the directory + subdirectories.
What would be the fastest for this? using Dir.foreach or Find or something
else

this will be the fastest way

Dir.glob('**/*') do |entry|
  p entry
end

I use it on directories with 200k files all the time. Basically you
want to avoid

list = Dir.glob('**/*') # creates huge array first

list = Find.find(d).to_a # creates huge array first

etc. I.e. avoid methods which scour the whole directory tree first, and
use methods which provide an iterator to handle each file as it’s
encountered. You’ll probably find this much faster or, at least, more
responsive.
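
To illustrate the iterator style with Find as well, here is a minimal sketch. each_file is a hypothetical helper name; it hands each regular file to the block as soon as Find reaches it, and returns an Enumerator when no block is given.

```ruby
require 'find'

# Hypothetical helper: yields each file under root as it is found,
# without collecting the full listing into an array first.
def each_file(root)
  return enum_for(:each_file, root) unless block_given?

  Find.find(root) do |path|
    # File.file? follows symlinks, so a link to a regular file is yielded too.
    yield path if File.file?(path)
  end
end
```

With this, `each_file("/some/dir") { |path| puts path }` prints paths while the walk is still in progress, and `each_file("/some/dir").first(10)` stops after ten files instead of walking the whole tree.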

regards.

-a

Yes, I’m on Linux!

I tried Find and glob and there is no real difference between them.
I think I’ll stick with Find.

Thanks for all the help

Thomas

“Thomas Coopman” writes:

I’m looking for the fastest way for looping through directories.

at the moment i’m using Find but when I test this, the first time it runs
really slow (like 20 seconds) the second time it runs much much quicker

Find is about the fastest way. The bottleneck is in the I/O.
There is not much you can do about that besides getting a better
disk subsystem.

The reason it was faster the second time is because your OS had cached
the information.

If the list of entries is fixed or does not change much, you can try
saving it in a file, so subsequent runs do not have to traverse
the directories again.
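
A minimal sketch of that idea, assuming one path per line in a plain text file. cached_listing and cache_file are names invented for this example; the cache is never refreshed automatically, so delete the file whenever the tree changes, and paths containing newlines would break the format.

```ruby
require 'find'

# Hypothetical helper: returns the file paths under root, reading them from
# cache_file if it exists, otherwise walking the tree and writing the cache.
def cached_listing(root, cache_file)
  return File.readlines(cache_file, chomp: true) if File.exist?(cache_file)

  paths = []
  Find.find(root) { |path| paths << path if File.file?(path) }
  File.write(cache_file, paths.map { |path| "#{path}\n" }.join)
  paths
end
```

The first call pays the full traversal cost; later calls only read one file, whether or not the OS still has the directory data cached.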

YS.
