Better performance than native Unix commands?

I have an issue with NFS and listing a large number of files that I am
wondering if I can solve with Ruby. I know that Ruby has better
performance in some instances than a bash shell script, so I am wondering
if anyone has experience or some examples of how this might work…

I have a directory with a large number of files that is mounted over
NFS. The network is gigabit and so are the NICs. The servers are very
high end, so the hardware / network performance is not the bottleneck.
However, it can sometimes take a very long time to list the files
contained within a directory.

I will do an ls and it might take 5 minutes to retrieve a response.

Would this be faster if I were to do the same ls using the Ruby API?

Ma Sa wrote:

I have an issue with NFS and listing a large number of files that I am
wondering if I can solve with Ruby. I know that Ruby has better
performance in some instances than a bash shell script, so I am wondering
if anyone has experience or some examples of how this might work…

I have a directory with a large number of files that is mounted over
NFS. The network is gigabit and so are the NICs. The servers are very
high end, so the hardware / network performance is not the bottleneck.
However, it can sometimes take a very long time to list the files
contained within a directory.

I will do an ls and it might take 5 minutes to retrieve a response.

Would this be faster if I were to do the same ls using the Ruby API?

The only way to know for sure would be to make some simple tests. I just
ran these using the Dir class:

Macintosh:~ marike$ time ruby -e 'd = Dir.new("."); d.each {|x| puts x}' | wc
162 166 1761

real 0m0.027s
user 0m0.019s
sys 0m0.010s
Macintosh:~ marike$ time ls -A | wc
160 164 1756

real 0m0.008s
user 0m0.005s
sys 0m0.006s

ls does appear to be faster, but I'm sure others on this list know a
better Ruby way than iterating over a Dir object like that.

Markus

Hi,

On Tue, Mar 4, 2008 at 12:10 PM, Ma Sa [email protected] wrote:

I have an issue with NFS and listing a large number of files that I am
wondering if I can solve with Ruby. I know that Ruby has better
performance in some instances than a bash shell script, so I am wondering
if anyone has experience or some examples of how this might work…

You may want to check your mount options. In some cases, I found that
using TCP instead of UDP increased speed quite a bit - it really depends
on your network. It also fixed a problem where dropped packets over
wireless led to connections never finishing their job. Also check your
sync/async options.
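
On a Linux client those knobs are mount options; this is only an
illustrative /etc/fstab line, with the server name, export path and
buffer sizes as placeholders:

server:/export  /mnt/data  nfs  proto=tcp,rsize=32768,wsize=32768,hard,intr  0  0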

Arlen

Ma Sa wrote:

I will do an ls and it might take 5 minutes to retrieve a response.

Do you use any external name service (LDAP/NIS/…)? If so, your problem
could be uid-to-user-name resolution taking too much time on so many
files.

You can check this by running 'ls -n' (see the commands after the list
below); if it's much faster, you'll have to either:

  • speed up name resolution (for example with a cache like nscd, which
    should be installed or readily available on any Linux distribution), or
  • avoid resolving names when you don't need them.
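
A quick way to compare the two (timing numbers omitted, they will vary):

time ls -l > /dev/null
time ls -n > /dev/null

If the -n run is dramatically faster, uid/gid-to-name lookups are where
the time goes.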

Lionel

Ma Sa wrote:

I know that Ruby has better
performance…

lol.

Ma Sa wrote:

I will do an ls and it might take 5 minutes to retrieve a response.

Would this be faster if I were to do the same ls using the Ruby API?

What OS are you running? Something seems broken. I run FreeBSD 6.2, I
have almost 200,000 files in one folder on an NFS mount, and ls is
pretty much an instant response. I doubt Ruby will be any faster than a
compiled C app (ls). It could be a firewall issue (pf and NFS do not get
along too well, for example), a locking issue, network congestion, or
misconfiguration. You could try to see where the bottleneck is by doing
the following:

On the NFS server:

time ls > temp_file

Make sure the temp file is on the server too. This will tell you if the
bottleneck is with NFS/network or the server’s file system.

On the client (with temp_file on the server):

time ls
time cat temp_file

This will tell you whether the time goes to the ls command itself or to
NFS/the network sending the data over.

Copy the temp_file to the client and see what the speed is on the
client.

time cat temp_file

Here are my numbers for reference:

ls | wc -l
197621

time ls > /dev/null (local)
real 0m0.669s
user 0m0.550s
sys 0m0.114s

time ls > /dev/null (over nfs)
real 0m0.716s
user 0m0.606s
sys 0m0.070s

Dan

On 04.03.2008 02:10, Ma Sa wrote:

I will do an ls and it might take 5 minutes to retrieve a response.

Would this be faster if I were to do the same ls using the Ruby API?

Did you try it out? What happened? My guess is that it will be just as
slow, because the timing is likely dominated by IO.

Kind regards

robert

On Mon, Mar 3, 2008 at 10:21 PM, Markus A. [email protected]
wrote:

However, it can sometimes take a very long time to list the files
contained within a directory.

I will do an ls and it might take 5 minutes to retrieve a response.

Would this be faster if I were to do the same ls using the Ruby API?

The only way to know for sure would be to make some simple tests.

I must beg to differ. Ruby doesn’t stand a chance of being as fast as
your system’s ls.

ls does appear to be faster, but I'm sure others on this list know a
better Ruby way than iterating over a Dir object like that.

Markus

The method you're looking for is Dir.foreach('.')
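
For example, in the spirit of the earlier test (the directory is a
placeholder):

ruby -e 'Dir.foreach(".") {|f| puts f}' | wc

Dir.foreach yields each entry name to the block, so you don't have to
create and manage a Dir object yourself.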

Daniel Brumbaugh K.

If you just need to list the names of the files, du is faster than ls;
du -a should do it.

-r.

From experience:
Your reverse DNS isn't set up properly.

On Wed, Mar 5, 2008 at 2:32 PM, Rodrigo B.
[email protected] wrote:

If you just need to list the name of the files, du is faster than ls
du -a should make it.

-r.

First of all, that's using the wrong tool for the job. The job of ls is
to list files; du is for estimating file space usage. Second of all, on
my system at least, you're wrong about speed:

$ time ls -1 > /dev/null
real 0m0.013s
user 0m0.004s
sys 0m0.008s
$ time du -a > /dev/null
real 0m0.080s
user 0m0.004s
sys 0m0.020s

The -1 on ls forces one entry per line, preventing column formatting of
the results.

Daniel Brumbaugh K.

7stud – wrote:

Ma Sa wrote:

I know that Ruby has better
performance…

lol.

Hey 7stud, I don't know what you are laughing at. Depending on the
logic of the script, you can achieve much better performance with a Ruby
script than with a bash script.

Daniel Brumbaugh K. wrote:

On Wed, Mar 5, 2008 at 2:32 PM, Rodrigo B.
[email protected] wrote:

If you just need to list the names of the files, du is faster than ls;
du -a should do it.

-r.

First of all, that's using the wrong tool for the job. The job of ls is
to list files; du is for estimating file space usage. Second of all, on
my system at least, you're wrong about speed:

$ time ls -1 > /dev/null
real 0m0.013s
user 0m0.004s
sys 0m0.008s
$ time du -a > /dev/null
real 0m0.080s
user 0m0.004s
sys 0m0.020s

The -1 on ls forces one entry per line, preventing column formatting of
the results.

Daniel Brumbaugh K.

Sorry, I did not see that all your files were in the same directory. I
meant "du is 'better' for recursive listing (friendlier parsing
format)". I'm not sure, but I had the idea that du was faster for
recursive listing.

D:\Users>time du -a
The system cannot accept the time entered.
Enter the new time:

Hehe… you get the idea.

On Tue, Mar 04, 2008 at 10:10:54AM +0900, Ma Sa wrote:

I have a directory with a large number of files that is mounted over
NFS. The network is gigabit and so are the NICs. The servers are very
high end, so the hardware / network performance is not the bottleneck.
However, it can sometimes take a very long time to list the files
contained within a directory.

We've found on our Linux-based NFS server that increasing the number of
threads in the NFS daemon improves performance in some cases.
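
On Linux the thread count can be raised at runtime; the count of 16
below is an arbitrary illustration:

# run on the NFS server
rpc.nfsd 16

Most distributions also expose a persistent setting for this (e.g.
RPCNFSDCOUNT on Debian-style systems).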

We've also found that avoiding NFS altogether whenever possible is
preferable. Switching to wget with a lightweight HTTP server to transfer
large files, instead of reading them directly off the NFS mount, was a
big improvement. You could try starting a DRb server on the NFS server
to feed out file listings and see if that improves performance.
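
A minimal sketch of that idea - the host name, port and directory are
all placeholders:

# lister.rb, run on the NFS server
require 'drb/drb'

class Lister
  # Return the entry names of a directory as an array.
  def list(dir)
    Dir.entries(dir)
  end
end

DRb.start_service('druby://0.0.0.0:9000', Lister.new)
DRb.thread.join

# client.rb, run on the NFS client
require 'drb/drb'

DRb.start_service
lister = DRbObject.new_with_uri('druby://nfsserver:9000')
puts lister.list('/export/bigdir')

Since the listing is produced locally on the server and only the names
cross the wire, you avoid the per-entry NFS round trips.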

Paul

Vidar H. wrote:

If that's the problem, then switching filesystems or reorganizing the
file structure are your best bets - ext2fs is completely unsuitable for
solutions with large numbers of files per directory.

For ext2/3 filesystems created ages ago:

man tune2fs, see “-O” and “dir_index”

For ones created recently (probably less than 2 years ago), performance
should be good.
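
If dir_index turns out to be off, enabling it looks roughly like this -
the device name is a placeholder, and e2fsck must be run on the
unmounted filesystem to rebuild existing directories:

tune2fs -O dir_index /dev/sdXN
e2fsck -fD /dev/sdXN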

Last time I checked with big directory structures, the performance
difference for file listing between xfs, ext3, reiserfs and jfs was
negligible.

Lionel

On Mar 4, 1:10 am, Ma Sa [email protected] wrote:

I will do an ls and it might take 5 minutes to retrieve a response.

Have you tried doing the same ls locally on the server? How many files
are in the directory you are trying to run ls on?

If it's a huge number of files and it's slow on the server too, then
the problem is likely that you use a filesystem (such as ext2) that
handles large directories poorly. If so, using "ls -f" should start
returning results faster (see the example below), though it'll still be
slow. It's a good indicator of the source of the problem.
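
For instance, compare how quickly the first entries appear:

time ls | head
time ls -f | head

With -f, ls skips the sort, so it can print entries as readdir() returns
them instead of waiting until the whole directory has been read.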

If that’s the problem, then switching filesystem or reorganizing the
file structure are your best bests - ext2fs is completely unsuitable
for solutions with large number of files per directory.

Second, "strace -c" is your friend. It'll show you which syscalls take
up the time. If it's spending all its time in readdir(), then the
problem is on the remote server or the network.
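
For example:

strace -c ls -f > /dev/null

The -c summary (printed to stderr when ls exits) shows time and call
counts per syscall; redirecting ls's output keeps it readable.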

Third, "tcpdump" is your friend. Use it to see whether the network
traffic to the NFS server is flowing at reasonable rates.
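
Something along these lines - the interface name is a placeholder, and
2049 is the standard NFS port:

tcpdump -i eth0 port 2049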

Ruby is very clearly not going to be faster than “ls” with the right
options - with “ls -f” for example, ls will mostly just do readdir()
calls and spit out the results. There aren’t really many ways of
making it much faster other than connecting to the remote server and
running “ls” directly on it.

Vidar