Using ferret as a base64-encoded numerical db

I’m using ferret to store random base64 strings of length 72 (generated
with “dd if=/dev/random … | mmencode”), with the long-term goal of
storing floating-point and integer numbers (converted to
base64). Problems:
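If you would rather generate the test strings from Ruby than from the shell, something like the following should produce comparable data. This is a sketch under my own assumptions: it uses Ruby's standard SecureRandom library, asks for 54 random bytes because 54 bytes encode to exactly 72 base64 characters with no “=” padding, and does not try to reproduce mmencode's line-oriented output. RAW_BYTES and random_b64_72 are names made up for this example.

```ruby
require "securerandom"

# 54 random bytes -> 54 / 3 * 4 = 72 base64 characters, with no "=" padding.
RAW_BYTES = 54

# Return one random 72-character string in the standard base64 alphabet
# (A-Z, a-z, 0-9, "+", "/").
def random_b64_72
  SecureRandom.base64(RAW_BYTES)
end

3.times { puts random_b64_72 }
```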

% Ferret regards the base64 characters “+” and “/” as word
separators, so a range search for “content:[xji xjj]” returns strings like
“FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6”,
where “xji” (as “XJi”) appears right after a plus sign. How can I avoid
this? I could change “+” to “_”, but I’m not sure whether changing “/”
to “.”, “:”, “-” or “!” would work.
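The splitting problem can be reproduced without Ferret. The sketch below uses two toy tokenizers (split_on_non_alnum and split_on_whitespace are hypothetical names, not Ferret's analyzers): the first breaks the sample string at “+” and “/”, leaving a token that starts with “XJi”, while the second keeps the string whole. It also suggests why swapping characters alone may not be enough: base64 already uses all 62 letters and digits, so the two remaining symbols have to be non-alphanumeric, and a tokenizer that splits on punctuation will still break the string. A whitespace-only analyzer or an untokenized field looks like the more robust route; Ferret appears to offer both (a WhiteSpaceAnalyzer class and an :untokenized index setting), but the exact names should be checked against the Ferret docs.

```ruby
# Toy tokenizer: split on anything that is not an ASCII letter or digit.
# Base64 strings get cut apart at "+" and "/".
def split_on_non_alnum(text)
  text.split(/[^A-Za-z0-9]+/).reject(&:empty?)
end

# Toy tokenizer: split on whitespace only, so each base64 string is one token.
def split_on_whitespace(text)
  text.split(/\s+/).reject(&:empty?)
end

doc = "FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6"

p split_on_non_alnum(doc).length  # => 3 (the last piece starts with "XJi")
p split_on_whitespace(doc).length # => 1

# Switching to the URL-safe characters ("+" -> "-", "/" -> "_") does not help
# a tokenizer that splits on every non-alphanumeric character.
p split_on_non_alnum(doc.tr("+/", "-_")).length # => 3
```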

% Ferret’s default search is case-insensitive, so I also get strings like
“xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4”,
which begin with “xJi” rather than “xji”. How can I fix this?
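The case problem most likely comes from the analyzer lowercasing every token at index time (and, through the query parser, at query time too). The toy comparison below, with made-up tokens and hypothetical helpers analyze and in_range, shows the same inclusive range [xji xjj] selecting three tokens once everything is lowercased but only one when case is preserved. In Ferret itself this presumably means using a non-lowercasing analyzer (WhiteSpaceAnalyzer appears to take a lowercase flag) for both indexing and query parsing; treat that as something to verify in the docs.

```ruby
# Toy analyzer for illustration: split on whitespace, optionally lowercase.
def analyze(text, lowercase:)
  tokens = text.split(/\s+/).reject(&:empty?)
  lowercase ? tokens.map(&:downcase) : tokens
end

# Tokens falling in the inclusive range [lo, hi], compared byte by byte.
def in_range(tokens, lo, hi)
  tokens.select { |t| t >= lo && t <= hi }
end

# Made-up sample tokens in the spirit of the strings above.
sample = "xJiQf0PE xjiA9zz0 xjj00000 XJiL9v1Z"

p in_range(analyze(sample, lowercase: true), "xji", "xjj")
# => ["xjiqf0pe", "xjia9zz0", "xjil9v1z"]
p in_range(analyze(sample, lowercase: false), "xji", "xjj")
# => ["xjiA9zz0"]
```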

% When I do a range query, does ferret return all documents
matching the query, or only the 10 highest-scoring ones? For my
purposes, I need every matching document, not just the first few.
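On the limit question: as far as I can tell, Ferret's search methods take a :limit option that defaults to 10, so a query without it may hand back only the ten best hits; the docs reportedly also accept :all, but please confirm that for your version. Conceptually, a range query just walks the sorted term dictionary between two bounds, so every matching document is reachable. The toy model below (TERMS and range_all are hypothetical names, and the structure is a simplification, not Ferret's internals) collects all of them with no top-N cut-off.

```ruby
# Toy term dictionary: terms sorted in byte order, each paired with the ids of
# the documents that contain it. Hypothetical data; not Ferret's internals.
TERMS = [
  ["xjh00000", [4]],
  ["xjiA9zz0", [1, 7]],
  ["xjiQf0PE", [3]],
  ["xjj00000", [5]]
].freeze

# Return the ids of every document whose term lies in the inclusive range
# [lo, hi]. There is no top-N cut-off: all matches are collected.
def range_all(terms, lo, hi)
  # Binary search for the first term >= lo (terms must be sorted).
  start = terms.bsearch_index { |(term, _ids)| term >= lo } || terms.length
  ids = []
  terms[start..-1].each do |term, doc_ids|
    break if term > hi # past the upper bound; later terms are larger still
    ids.concat(doc_ids)
  end
  ids.uniq.sort
end

p range_all(TERMS, "xji", "xjj") # => [1, 3, 7]
```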

Is anyone else using ferret as a db? Since it’s hash-based, it’s much
faster at indexing large numbers of strings than sqlite3.

I realize I could just zero-pad my numbers (e.g., “000005” for 5), but
I’ve got a LOT of data (400M pairs of floating-point numbers), so I
prefer a compact encoding.
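One thing worth checking before going further: range queries compare terms as strings, and the standard base64 alphabet (A-Z, then a-z, then 0-9, “+”, “/”) is not in ASCII order, so standard base64 text does not sort in numeric order even at a fixed width. Below is a sketch of one alternative, under my own assumptions: a fixed-width base-64 encoding whose 64 digits are the URL-safe characters re-sorted by ASCII code, plus the common trick of mapping an IEEE 754 double to an unsigned 64-bit integer with the same ordering. ALPHABET, WIDTH, encode_u64 and float_key are names I made up; nothing here is Ferret API.

```ruby
# 64 URL-safe characters listed in ascending ASCII order, so comparing encoded
# strings byte by byte matches comparing the numbers they encode.
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz".freeze
WIDTH = 11 # 11 digits * 6 bits = 66 bits, enough for any 64-bit value

# Encode an unsigned 64-bit integer as exactly WIDTH base-64 digits,
# most significant digit first.
def encode_u64(n)
  raise ArgumentError, "out of range" unless n.between?(0, 2**64 - 1)
  chars = Array.new(WIDTH)
  (WIDTH - 1).downto(0) do |i|
    chars[i] = ALPHABET[n & 63]
    n >>= 6
  end
  chars.join
end

# Map a double to an order-preserving unsigned 64-bit integer, then encode it:
# negatives get all bits flipped, non-negatives get the sign bit set.
def float_key(f)
  bits = [f].pack("G").unpack1("Q>") # big-endian IEEE 754 bits
  bits = bits[63] == 1 ? bits ^ 0xFFFF_FFFF_FFFF_FFFF : bits | 0x8000_0000_0000_0000
  encode_u64(bits)
end

p encode_u64(5) # => "----------4"
p [-2.5, 0.0, 1.0].map { |v| float_key(v) }
```

Each 64-bit value becomes 11 characters, against 20 digits for a zero-padded unsigned 64-bit decimal (2^64 - 1 = 18446744073709551615), so a pair of doubles fits in 22 characters. The keys are case-sensitive and contain “-” and “_”, so they would still need a field that is neither lowercased nor split on punctuation.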

