On Tue, 2006-08-01 at 18:59 +0900, David B. wrote:
> Hmmm. Sounds like an interesting application. One solution would be to
> cache the sort index on disk. The problem with this is that the cache
> would still need to be recalculated every time you add more documents
> to the index so you’ll still have the long wait occasionally. I’ll
> look into it anyway at a later stage.
For my application this wouldn’t really be a problem since data is only
loaded maybe once a week. But does the cache need to be recalculated
completely? Database indexes work incrementally.
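To illustrate what "incrementally" could mean here, a minimal Python sketch, not Ferret code: it assumes the sort index is just a sorted list of (term, doc_id) pairs, and the name `sorted_entries` is made up for the example. Adding a document is then one binary-search insert rather than a full re-sort.

```python
import bisect

# Hypothetical in-memory sort index: (sort term, document id), kept sorted.
sorted_entries = [("19991231", 1), ("20060115", 2), ("20060801", 0)]

# A new document (id 3) is added to the index. Instead of re-sorting
# everything, find its position by binary search and insert it there.
bisect.insort(sorted_entries, ("20051224", 3))

# Document ids in sort order after the incremental update.
order = [doc for _, doc in sorted_entries]
print(order)
```

Whether Ferret's on-disk structures would allow an update like this is a separate question; the sketch only shows the general idea.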
> Another idea that I can implement now is to add a BYTES sort type
> which would basically sort by the order the terms appear in the index.
> Let’s say you index dates in the format “YYYYMMDD” and you sort by
> INTEGER. Every time you load the sort index you need to go through
> every single date and convert it from string to integer. But this is
> unnecessary since the dates are already in order in the index. A BYTES
> sort type would take advantage of this.
For my date fields this would work.
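A quick way to check that claim, as a plain Python sketch (nothing here is Ferret API; the sample dates are made up): for fixed-width, zero-padded "YYYYMMDD" strings, sorting by raw bytes gives the same order as converting each term to an integer first.

```python
dates = ["20060801", "19991231", "20060115", "20051224"]

# INTEGER-style sort: every term is converted from string to integer.
by_integer = sorted(dates, key=int)

# BYTES-style sort: compare the raw bytes, no conversion at all.
by_bytes = sorted(dates, key=lambda s: s.encode("ascii"))

same_order = by_integer == by_bytes

# The equivalence depends on the fixed width. Without zero padding,
# byte order and numeric order disagree:
unpadded = ["9", "10"]
unpadded_by_bytes = sorted(unpadded, key=lambda s: s.encode("ascii"))
unpadded_by_integer = sorted(unpadded, key=int)
```

So this only holds as long as every value in the field has the same number of digits.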
> You’d get an even bigger
> benefit for ASCII strings. strcoll is used to sort strings but this is
> unnecessary for ASCII strings as they are already correctly ordered in
> the index. Also, the sort index needs to keep each string in memory,
> which would also be unnecessary.
One of my text order fields should have nothing but ASCII. The other is
a title and can include arbitrary UTF-8, so I guess it wouldn’t work for
that one.
> Sorry if this isn’t very clear. I’m not sure how much it will help.
> We’ll have to wait and see.
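For the ASCII versus UTF-8 point above, a small Python sketch of what plain byte ordering does to titles (the sample titles are made up). It deliberately avoids locale calls, so it shows only the byte-order side, not what strcoll would return in any particular locale.

```python
titles = ["Zebra", "apple", "Éclair"]

# Byte-wise order of the UTF-8 encodings, which is what a BYTES sort
# would effectively use.
by_bytes = sorted(titles, key=lambda s: s.encode("utf-8"))

# "Z" is 0x5A and "a" is 0x61, so all uppercase ASCII sorts before
# lowercase. "É" encodes as 0xC3 0x89, so it sorts after every ASCII
# letter, although a reader would expect it near "E".
print(by_bytes)
```

Note that even for pure ASCII the byte order puts uppercase before lowercase, so it matches a locale-aware collation only if that is the ordering you want.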
The BYTES ordering would probably speed it up but for my specific case,
storing it on disk would be perfect. It would probably be a very good
thing in case someone uses ferret to code command line tools that access
a common index. Without storing the sorting on disk it will get
recreated every time a command is run.
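Roughly what I have in mind, as a Python sketch rather than anything Ferret does today. The function name `load_sort_order`, the JSON cache file, and the idea of an index "version" number that changes whenever documents are added are all assumptions for the example.

```python
import json
import os
import tempfile

def load_sort_order(cache_path, index_version, terms_by_doc):
    """Return (doc ids in sort order, whether the disk cache was used)."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        # Reuse the cached order only if the index has not changed since.
        if cached["version"] == index_version:
            return cached["order"], True
    # Cache missing or stale: rebuild the order and store it for next time.
    order = sorted(range(len(terms_by_doc)), key=lambda doc: terms_by_doc[doc])
    with open(cache_path, "w") as f:
        json.dump({"version": index_version, "order": order}, f)
    return order, False

# Two separate "commands" reading the same index (version 1).
terms = ["20060801", "19991231", "20060115"]
path = os.path.join(tempfile.mkdtemp(), "sort_cache.json")
first = load_sort_order(path, 1, terms)   # builds and writes the cache
second = load_sort_order(path, 1, terms)  # reads it back from disk
```

With weekly loads the rebuild would happen once after each load, and every command in between would just read the file.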