One directory with 500,000 files (5-8 MB each)

Could you please tell me: I will have a server running nginx (CentOS + PHP) and one directory with 500,000 files (5-8 MB each). Nginx will serve these files to another web server. Will I have problems with this setup?

In my experience, ext3 does not handle 500,000 files in one directory very well. This is not an nginx limitation but a limitation of the filesystem, and I am not sure there is a filesystem that handles this case well. Regardless, I would recommend you break the files up into subdirectories. You can segment files by filename or, if the files are number-based, numerically. For instance, say you had stock symbols:
aapl.html
csco.html
etc.

Put each in a directory based on its first letter then second letter,
e.g.:

/a/a/aapl.html
/c/s/csco.html
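A minimal shell sketch of that bucketing (the two filenames are just the examples above; the `touch` stands in for files that would already exist):

```shell
# Bucket each file into a two-level subdirectory tree based on its
# first two characters, e.g. aapl.html -> a/a/aapl.html
for f in aapl.html csco.html; do
  touch "$f"                          # stand-in for the real file
  d1=$(printf '%s' "$f" | cut -c1)    # first letter
  d2=$(printf '%s' "$f" | cut -c2)    # second letter
  mkdir -p "$d1/$d2"
  mv "$f" "$d1/$d2/$f"
done
```

With lowercase names this gives 26*26 buckets, so 500,000 files work out to well under a thousand entries per directory, which any common filesystem handles comfortably.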

JD / wuputah

Jonathan D. wrote:

In my experience, ext3 does not handle 500,000 files in one directory very well. This is not an nginx limitation but a limitation of the filesystem, and I am not sure there is a filesystem that handles this case well.

What about ReiserFS? Does it handle 500,000 files in one directory well?

On Sat, Jun 21, 2008 at 22:06:52, Ss Ss said…

In my experience, ext3 does not handle 500,000 files in one directory
very well.

What about ReiserFS? Does it handle 500,000 files in one directory well?

I'm sure some filesystems will handle this better than others, but to me this sounds like poor application design rather than a filesystem limitation.

No matter which filesystem you use, with that many directory entries it's going to be slow. I'd split them up. I've used the method based on the first character of the filename, and it works very well.

On Sat, Jun 21, 2008 at 8:07 PM, Michael [email protected] wrote:

No matter which filesystem you use, with that many directory entries, it’s
going to be slow. I’d split them up. I’ve done the method using the first

How do you know it would be slow with any filesystem? Did you test
all filesystems with this special case to know for sure?

I am guessing ReiserFS could do quite well - it was designed to handle
huge numbers of files. What about others? XFS? Would be interesting to
know why so many files need to be in one directory as well…

On Sun 22.06.2008 01:08, Dan M wrote:

I am guessing ReiserFS could do quite well - it was designed to handle
huge numbers of files. What about others? XFS? Would be interesting to
know why so many files need to be in one directory as well…

Maybe this post can help:

http://tservice.net.ru/~s0mbre/old/?section=projects&item=fs_contest2

Aleks

Aleksandar L. wrote:

On Sun 22.06.2008 01:08, Dan M wrote:

I am guessing ReiserFS could do quite well - it was designed to handle
huge numbers of files. What about others? XFS? Would be interesting to
know why so many files need to be in one directory as well…

Maybe this post can help:

http://tservice.net.ru/~s0mbre/old/?section=projects&item=fs_contest2

Aleks

We can see that data reading performance doesn't depend on the number of folders (but it does depend on the number of files). Am I right?

But what if we need to open only 20% of all directories at a time? If we have 10 directories and only need to open two of them, the load will be lower:

  • the number of files touched will be lower. If we have one directory, open it once, and read all the files, the load will be higher.

On Sun, Jun 22, 2008 at 7:31 PM, Rob M. [email protected] wrote:

As you can see, I’m a big reiserfs defender, it’s worked really well for us,
and most people who think it sucks usually have one of the following
problems.

The biggest issues that I know of when running ReiserFS are: 1 - a misdesigned fsck. It is architected such that if you have to run fsck on Reiser for crash recovery, the odds are you will end up with garbage instead of restored files. Theodore Ts'o has a nice write-up about that issue. I myself have had files produced by Reiser fsck that contained garbage. Reiser fsck scans the disk for anything that looks like bits of the filesystem's B-tree, and discards the bits that it doesn't think belong to it.

2 - its reliability for e-mail systems is, AFAIK, uncertain. E-mail systems must run on filesystems that guarantee atomicity of metadata updates. BSD's UFS guarantees atomic metadata updates, and ext3 mounted with data=ordered guarantees them as well. If Reiser doesn't, then you are likely to lose email during a power failure or crash; also see above.

3 - ReiserFS's future is uncertain due to Hans' troubles.

Otherwise, I think it’s a great FS once you know what the weaknesses
are and use it with them in mind.

The biggest issues that I know of when running ReiserFS are 1 -
misdesigned fsck - it’s architected as such if it comes to having to
run the fsck on Reiser for crash recovery the odds are you will end
up with garbage instead of restored files. Theodore Ts’o has a nice

The biggest problem occurs if you store loopback filesystem dumps within a file. When fsck scans all the data, it will see what looks like filesystem metadata, even though it was really inside a file.

If you don't do that, though, fsck has worked fine the few times we've had to use it, albeit slowly. In fact, it has worked better in most cases than ext3's fsck, which often seems to lose track of which directory files were in and move them to lost+found. That's just our experience though, YMMV.

2 - Its reliability for e-mail systems is AFAIK uncertain. e-mail

And where do you get that belief from? reiserfs also has data=journal and data=ordered modes, which provide the same consistency guarantees as other filesystems with those modes. data=ordered has been the default mount mode for ages.

3 - ReiserFS's future is uncertain due to Hans' troubles.

reiser3 is stable, touched very little, and hasn't had direct input from Hans in years. reiser4 is a different matter though.

I think the biggest issue is that SUSE was using reiser3 as the default filesystem, but dropped it. reiser4 seems to have been crawling along for years without ever reaching "stable", and I'm not convinced it ever will. It also fails to solve the data integrity problem that's becoming so prevalent with large volumes these days. I think COW + checksummed filesystems like btrfs are the ones to keep an eye on for the future.

Rob

I am guessing ReiserFS could do quite well - it was designed to handle
huge numbers of files. What about others? XFS? Would be interesting to
know why so many files need to be in one directory as well…

maybe this post can help.

http://tservice.net.ru/~s0mbre/old/?section=projects&item=fs_contest2

The problem with benchmarks is when you compare different systems but only tune one of them and not the others.

They mounted reiserfs with only "noatime". If they had done any research at all, they would have found that something like "noatime,nodiratime,notail,data=ordered" or "noatime,nodiratime,notail,data=writeback" gives much, much (10x) better performance. The default "tail" packing implementation is designed to save space but trades off performance. I think that tradeoff is too large and they should have defaulted to "notail", but that's all history.
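For reference, those mount options would look something like this in /etc/fstab (the device and mount point here are hypothetical, not from this thread):

```
/dev/sda2  /var/mail  reiserfs  noatime,nodiratime,notail,data=ordered  0  2
```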

Reiserfs really does handle large directories well. We run an email system, and had a user with > 1,000,000 emails in a folder, which means > 1,000,000 files in a directory, and there were no problems accessing individual files in that directory at all. That user has trimmed down to 100,000 or so now, so I'll show that access is fine at that size.

So it is hot in the cache now…

$ time ls | wc -l
161865

real 0m0.596s
user 0m0.500s
sys 0m0.110s

Most time is user time, not system time there.

Accessing a random file the first time.

$ strace -tt -o /tmp/st perl -e 'open(my $F, "1005527."); print scalar <$F>; close($F);'


19:15:24.463622 open("1005527.", O_RDONLY|O_LARGEFILE) = 3
19:15:24.463693 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf8af7c8) = -1 ENOTTY (Inappropriate ioctl for device)
19:15:24.463755 _llseek(3, 0, [0], SEEK_CUR) = 0
19:15:24.463818 fstat64(3, {st_mode=S_IFREG|0600, st_size=3576, ...}) = 0
19:15:24.463919 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
19:15:24.463998 read(3, "Return-Path: <192.168.10.239@xyz"..., 4096) = 3576
19:15:24.477487 write(1, "Return-Path: <192.168.10.239@xyz"..., 40) = 40
19:15:24.477588 _llseek(3, 40, [40], SEEK_SET) = 0
19:15:24.477649 _llseek(3, 0, [40], SEEK_CUR) = 0
19:15:24.477703 close(3) = 0

All the seeks + stats + fcntls are just perl doing various housekeeping around the file. You can see there are no big pauses in accessing the file, just the 0.01 seconds on the first read, which seems reasonable on this already-loaded email server. Let's see a second run.


19:18:23.275604 open("1005527.", O_RDONLY|O_LARGEFILE) = 3
19:18:23.275681 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfc6ab88) = -1 ENOTTY (Inappropriate ioctl for device)
19:18:23.275744 _llseek(3, 0, [0], SEEK_CUR) = 0
19:18:23.275805 fstat64(3, {st_mode=S_IFREG|0600, st_size=3576, ...}) = 0
19:18:23.275908 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
19:18:23.275988 read(3, "Return-Path: <192.168.10.239@xyz"..., 4096) = 3576
19:18:23.276090 write(1, "Return-Path: <192.168.10.239@xyz"..., 40) = 40
19:18:23.276189 _llseek(3, 40, [40], SEEK_SET) = 0
19:18:23.276250 _llseek(3, 0, [40], SEEK_CUR) = 0
19:18:23.276305 close(3) = 0

The file is hot in the cache, so the first read is about 0.0001 seconds
there.

As you can see, I'm a big reiserfs defender; it's worked really well for us, and most people who think it sucks usually have one of the following problems:

  1. They use unreliable hardware. Reiserfs does not cope well in the face of unreliable hardware. If writes or reads return I/O errors at any time, or any data corruption occurs on disk, reiserfs is much more likely to crash because of its more complex B-tree structure and lack of checksums. I think that's why using it on user desktop/laptop machines is a big mistake. On reliable server hardware, though, it's great.
  2. They use LVM. From our testing, for some reason, there still seem to be strange LVM/reiserfs interactions. Use hardware RAID.
  3. They use some stupid mount options which cause shocking performance.
  4. They use some dumb filesystem speed test (e.g. untar + retar the Linux kernel), rather than long-term benchmarks that show how a real-world filesystem performs after years of reading/writing/creating/deleting files and fragmentation.

Just my experience over 8+ years of trying filesystems in a server
environment.

Rob

PS. I'm looking forward to btrfs becoming stable. Chris M. did a lot of work on reiserfs, and he knows the ins and outs of filesystem and Linux VM development. He also has a good relationship with the rest of the kernel team, so hopefully it won't suffer the Hans PR nightmare. Initial benchmarks of btrfs look very promising, and it's being developed pretty quickly. Definitely one to keep an eye on.

On Mon, Jun 23, 2008 at 7:03 PM, Rob M. [email protected] wrote:

The biggest problem occurs if you store loop back filesytem dumps within a

I didn't store loopback fs dumps in a file, and still ended up with garbage.

file. When it does a fsck it scans all the data, so it’ll see what looks
like filesystem metadata, even though it really was within a file.

This is why, again, that fsck is dumb.

2 - Its reliability for e-mail systems is AFAIK uncertain. e-mail

And where do you get that belief from? reiserfs also has data=journal and
data=ordered modes, which provide the same consistency guarantees as other
filesystems with those modes. data=ordered has been the default mount mode
for ages.

Just because it has a data=ordered mode does not mean that it has UFS metadata update semantics, i.e. atomicity. I simply don't know whether it does the right thing or not, so at least to me that's uncertain. If fsync returns early, before all data is safely written to disk, then you can lose email during a power failure or crash.
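To make the fsync point concrete, here is a rough shell sketch of a durable write; this is an illustration, not any MTA's actual delivery code, and it assumes GNU dd, whose conv=fsync flag makes dd call fsync() on the output before exiting:

```shell
# Write a message and force it to disk before reporting success.
# conv=fsync means the data has reached the disk (not just the page
# cache) by the time dd returns, so "delivered" is only printed once
# the message would survive a power failure.
printf 'test message' | dd of=msg.eml conv=fsync 2>/dev/null && echo delivered
```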

In any case, this is an MTA and filesystem discussion, not an nginx web discussion…