Problems with large object sets?

I'm having trouble with an nginx setup built to serve search engines.

Based on the user agent, all bots are served only from cache. We
populate the cache with our own set of spiders so we can control the
overall load.

Total cache size is ~450 GB in ~12 million files.

The problem is that about 1/3 of the live requests coming in from the
bots are cache misses, even though our spider fetched the same page
only an hour earlier.

The configured limits should leave plenty of headroom:

    proxy_cache_path /var/www/cache levels=1:2 keys_zone=my-cache:2500m
                     max_size=800000m inactive=800h;

Where should I look to find out why these requests were misses?
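
One thing I could do to narrow this down is log the cache status of
every request. Below is a minimal sketch of that, not something I have
running. The format name "cache_debug" and the log file path are
placeholders I made up; $upstream_cache_status is the standard nginx
variable that reports MISS, EXPIRED, HIT and so on. Logging the scheme,
host and URI alongside it assumes the cache key is built from something
like those values, which may not match the real proxy_cache_key here.

```nginx
# Sketch only: "cache_debug" and the log path are placeholder names.
# Both directives are meant for the http {} block.
log_format cache_debug '[$time_local] $upstream_cache_status '
                       '$scheme $host "$request_uri" '
                       '$status "$http_user_agent"';

access_log /var/log/nginx/cache_debug.log cache_debug;
```

Comparing the line logged for a spider request with the line for the
later bot request to the same page should show whether the bot got a
MISS (entry never stored, evicted, or stored under a different key) or
an EXPIRED (entry stored but no longer considered valid).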

Thanks,

David