Forum: Ferret
Using ferret as a base64-encoded numerical db

Posted by Kelly Jones (Guest)
on 2009-07-06 16:16
(Received via mailing list)
I'm using ferret to store random base64 strings of length 72 (generated
with "dd if=/dev/random ... | mmencode"), with the long-term goal of
storing floating-point and integer numbers (converted to
base64). Problems:

 % Ferret regards the base64 characters "+" and "/" as word
 separators, so a search for "content:[xji xjj]" yields things like
 "FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6"
 where "XJi" appears right after a plus sign. How can I avoid this? I could
 change "+" to "_", but I'm not sure changing "/" to "." or ":" or "-"
 or "!" would work.

 % Ferret's default search is case-insensitive, so I get things like
 "xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4",
 which begin with "xJi", not "xji". How can I make the search
 case-sensitive?

 % When I do a range query, does ferret return *all* documents
 matching the query or only the highest scoring 10? For my purposes, I
 need *all* documents matching a query, not just the first few.
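
To make the first two problems concrete, here is a rough Ruby sketch of the behaviour described above. It is a toy model, not Ferret's actual analyzer: the helper names (toy_tokens, toy_range_match?, exact_range_match?) are made up for illustration, and the only assumption is that terms are split on "+" and "/" and lowercased before the range comparison. It then contrasts that with treating the whole string as one case-sensitive term.

```ruby
# Toy model of the behaviour described in the post; NOT Ferret's real analyzer.
# Assumption: terms are split on "+" and "/" and lowercased before comparison.
def toy_tokens(str)
  str.split(%r{[+/]}).reject(&:empty?).map(&:downcase)
end

# True if any token falls in the inclusive lexicographic range [lo, hi].
def toy_range_match?(str, lo, hi)
  toy_tokens(str).any? { |t| t >= lo && t <= hi }
end

# Alternative model: the whole string is a single case-sensitive term.
def exact_range_match?(str, lo, hi)
  str >= lo && str <= hi
end

a = "FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6"
b = "xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4"

puts toy_range_match?(a, "xji", "xjj")    # => true  ("xjil9v1..." follows the "+")
puts toy_range_match?(b, "xji", "xjj")    # => true  (lowercased "xjiqf0...")
puts exact_range_match?(a, "xji", "xjj")  # => false
puts exact_range_match?(b, "xji", "xjj")  # => false
```

If the second model is the behaviour wanted, the things to look for in the Ferret API docs are, as far as I recall, an untokenized index mode for fields and a :limit search option that defaults to 10. Both are worth confirming against the installed version rather than taking from this sketch.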

Is anyone else using ferret as a db? Since it's hash-based, it's much
faster at indexing large numbers of strings than sqlite3.

I realize I could just 0-pad my numbers (e.g., "000005" for 5), but I've
got a LOT of data (400M pairs of floating-point numbers), so I prefer
compactness.
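
One caveat for range queries over encoded numbers: the standard base64 alphabet (A-Z, a-z, 0-9, "+", "/") is not in ASCII order, so comparing base64 strings lexicographically does not in general match comparing the numbers. Below is a minimal Ruby sketch of one possible workaround, under assumptions that are mine and not part of any Ferret API: each double is mapped to a 64-bit integer whose unsigned order matches numeric order, then written as 11 characters from a 64-character alphabet listed in ascending ASCII order. ALPHABET, sortable_bits and encode_sortable are hypothetical names, and the alphabet includes "-" and "_", which only helps if the index keeps each string as a single term.

```ruby
# Hypothetical order-preserving alphabet: 64 characters in ascending ASCII order.
ALPHABET = ('-' + [*'0'..'9', *'A'..'Z'].join + '_' + [*'a'..'z'].join).freeze

# Map a Float to a 64-bit unsigned integer whose order matches numeric order.
def sortable_bits(x)
  bits = [x].pack('G').unpack1('Q>')  # IEEE 754 double, read as big-endian uint64
  if bits[63] == 1
    bits ^ 0xFFFF_FFFF_FFFF_FFFF      # negative: flip every bit
  else
    bits | (1 << 63)                  # non-negative: set the sign bit
  end
end

# Encode as 11 fixed-width characters, 6 bits each, most significant first.
def encode_sortable(x)
  n = sortable_bits(x)
  10.downto(0).map { |i| ALPHABET[(n >> (6 * i)) & 0x3F] }.join
end

values  = [-1e10, -2.5, 0.0, 1e-3, 5.0, 1e10]
encoded = values.map { |v| encode_sortable(v) }
puts encoded == encoded.sort          # => true
```

This costs 11 characters per double regardless of magnitude, so no zero-padding scheme is needed. Note that -0.0 and 0.0 get different encodings here, which may or may not matter for the data.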

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.