Forum: Ruby on Rails Avoiding ext3's 4K entry limit when caching

cool_screen_name90001 (Guest)
on 2005-11-16 02:18
(Received via mailing list)
I've just started checking out caching. I have
thousands of items with URLs like '/item/view/ID'. I
see caching puts them in public/item/view/ID.html. I'm
using Linux's ext3 filesystem and I've run into
problems before with caching and ext3's 4K entries per
directory limit. How can I avoid this?

thanks
csn



mixonic (Guest)
on 2005-11-16 02:27
(Received via mailing list)
Wow, this is an excellent question.

ext3's performance with very large directories can actually be pretty
decent with dir_index (it's depressingly bad without it).  You could
alter ext3 itself to allow more entries, but honestly, with that many
files in a directory you probably spend more time searching the directory
for the file than you would spend generating the page dynamically.  You
could prune the oldest files in the directory every few minutes, or
at whatever interval seems appropriate, with a scheduled job.
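
A minimal sketch of that scheduled pruning, assuming the cached pages are
*.html files in one directory such as public/item/view. The helper name
sweep_oldest and the keep threshold are hypothetical, not part of Rails.

```ruby
require "fileutils"

# Hypothetical helper: keep at most `keep` cached pages in `dir`,
# deleting the least recently modified ones first.
# Returns the list of paths that were removed.
def sweep_oldest(dir, keep)
  files = Dir.glob(File.join(dir, "*.html")).select { |f| File.file?(f) }
  excess = files.size - keep
  return [] if excess <= 0

  # Oldest modification time first, so the stalest pages go first.
  doomed = files.sort_by { |f| File.mtime(f) }.first(excess)
  FileUtils.rm_f(doomed)
  doomed
end

# Example call (path and threshold are assumptions):
#   sweep_oldest("public/item/view", 3000)
```

Run from cron every few minutes with a keep value comfortably below the
limit you are hitting. A pruned page should simply be regenerated and cached
again the next time it is requested.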

It'd be very interesting to see how other large sites have handled this.
I'd push for either the cron job, or rethinking what you cache (i.e., don't
cache whole pages, just cache parts that repeat).  With more than a few
hundred files in a directory you'll lose a lot of performance no matter
what filesystem you use.

Looking forward to hearing from some more experienced deployment folk,

-Matt B
blair (Guest)
on 2005-11-16 20:00
(Received via mailing list)
Matthew Beale wrote:
> Wow, this is an excellent question.
>
> ext3's performance with super large directories can actually be pretty
> decent with dir_index (it's depressingly bad without it).  You could
> alter ext3 itself to allow more entries, but honestly, with that many
> files in a directory you probably spend more time seeking the directory
> for the file than you would spend generating the page dynamically.  You
> could harvest the oldest files in the directory every few minutes, or
> whatever seems appropriate, with a scheduled job.

Matthew,

What are the performance characteristics of ext3 filesystems with and
without dir_index for small directories up to large ones?

How many files do you need in a directory before dir_index is worth it?

Right now none of my filesystems have dir_index enabled, so it would
require some downtime to enable it.

Regards,
Blair
mixonic (Guest)
on 2005-11-17 06:03
(Received via mailing list)
On Wed, 2005-11-16 at 11:00 -0800, Blair Zajac wrote:
>
> Matthew,
>
> What are the performance characteristics of ext3 filesystems with and without
> dir_index for small directories up to large ones?

I'm not sure of any benchmarks, but "stunning" wouldn't be the wrong word
for the difference.  Basically, instead of scanning a linear list of
entries it uses B-trees, the same tech that makes reiserfs directories so
damn fast.

dir_index -
Use hashed b-trees to speed up lookups in large directories.
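
To make the difference concrete, here is a toy model only. Real ext3 works
on disk blocks, not Ruby arrays, and the bucket count and function names
below are made up for illustration. It counts how many entries get compared
when a name is found by a plain scan versus by hashing the name first.

```ruby
require "zlib"

# Without an index: compare entries one by one until the name matches.
# Returns the number of comparisons made, or nil if the name is absent.
def linear_lookup(entries, name)
  steps = 0
  entries.each do |entry|
    steps += 1
    return steps if entry == name
  end
  nil
end

# Simplified stand-in for a hashed index: the hash of the name picks
# one small bucket, so only that bucket ever needs scanning.
def build_index(entries, buckets = 256)
  index = Array.new(buckets) { [] }
  entries.each { |e| index[Zlib.crc32(e) % buckets] << e }
  index
end

def indexed_lookup(index, name)
  linear_lookup(index[Zlib.crc32(name) % index.size], name)
end

names = (1..50_000).map { |id| "#{id}.html" }
index = build_index(names)
puts "linear:  #{linear_lookup(names, '50000.html')} comparisons"
puts "indexed: #{indexed_lookup(index, '50000.html')} comparisons"
```

The scan grows with the size of the directory, while the indexed lookup
only grows with the size of one bucket.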

> How many files do you need in a directory before dir_index is worth it?

I don't know.  But if you have 4000 I'd say that's a good place to
start :)

> Right now all my filesystems do not have dir_index enabled, so it would require
> some downtime to enable it.

Yeah, that's the crappy part.  In theory, you can enable it with
tune2fs, but in practice I've only gotten it working with mke2fs.  My tools
when implementing it were mostly from an older Debian distro though; this
was unheard-of stuff when they were written.
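
Before planning any downtime it may be worth checking whether the feature
is already on. `tune2fs -l` prints a "Filesystem features:" line, and the
sketch below just parses that text. The method names are hypothetical and
the device path in the comment is a placeholder.

```ruby
# Return the feature flags listed in the output of `tune2fs -l`.
def filesystem_features(tune2fs_output)
  line = tune2fs_output.each_line.find do |l|
    l.start_with?("Filesystem features:")
  end
  return [] unless line

  # Everything after the first colon is a whitespace-separated flag list.
  line.split(":", 2).last.split
end

# True if dir_index appears among the filesystem's feature flags.
def dir_index_enabled?(tune2fs_output)
  filesystem_features(tune2fs_output).include?("dir_index")
end

# Example call (needs root; /dev/sdXN is a placeholder device):
#   puts dir_index_enabled?(`tune2fs -l /dev/sdXN`)
```

If it reports false, the route described in the e2fsprogs man pages is
`tune2fs -O dir_index` on the device followed by `e2fsck -fD` on the
unmounted filesystem so existing directories get indexed. I have not
verified that against older tool versions, so test it on a scratch
partition first.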

gl! Let us know how you fare.

-Matthew Beale