Announcement: Indexed Search Engine 0.1.2 Available

lanceball · January 3, 2006, 9:46pm

Hello all.

Apologies… I was a little too eager in my earlier announcement about
the Indexed Search Engine for Rails apps. The DB migration file
contained an error that had to be worked around. I’ve fixed that,
added more (and clearer) documentation, and included a sample application. You
can find most everything you want to know about Indexed Search Engine
here:

http://langwell-ball.com/indexed-search/

Indexed Search is a simple, pluggable engine for Rails applications
that enables full-text indexed searches within an application.
Searchable data is parsed, stemmed using the Porter stemmer, and added
to a fully indexed table. This means that indexed text such as
“he runs fast” will be returned by a search for “running”.
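
To make the stemming idea concrete, here is a minimal sketch in plain Ruby. It is not the engine's code: `toy_stem` is a deliberately crude suffix stripper standing in for the Porter stemmer, and `ToyIndex` is a hypothetical in-memory stand-in for the indexed table. It only shows why "runs" and "running" end up matching.

```ruby
require "set"

# Toy stemmer for illustration only. This is NOT the Porter algorithm;
# it handles just enough suffixes to show the idea.
def toy_stem(word)
  w = word.downcase
  if w.length > 5 && w.end_with?("ing")
    w = w[0...-3]
    # Undouble a trailing doubled consonant: "runn" -> "run".
    w = w[0...-1] if w.length > 2 && w[-1] == w[-2] && !"aeiou".include?(w[-1])
  elsif w.length > 3 && w.end_with?("s") && !w.end_with?("ss")
    w = w[0...-1]
  end
  w
end

# Hypothetical in-memory index: maps each stem to the set of document ids
# whose text contains a word with that stem.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, stem| hash[stem] = Set.new }
    @documents = {}
  end

  # Store the text and record every stem that occurs in it.
  def add(id, text)
    @documents[id] = text
    tokenize(text).each { |stem| @postings[stem] << id }
  end

  # Return the texts of documents that contain every stem in the query.
  def search(query)
    stems = tokenize(query)
    return [] if stems.empty?
    ids = stems.map { |stem| @postings.fetch(stem, Set.new) }.reduce(:&)
    ids.sort.map { |id| @documents[id] }
  end

  private

  def tokenize(text)
    text.scan(/[a-z]+/i).map { |word| toy_stem(word) }
  end
end

index = ToyIndex.new
index.add(1, "he runs fast")
index.add(2, "she walks slowly")
p index.search("running") # prints ["he runs fast"]
```

Both the stored text and the query go through the same stemmer, so the match happens on the shared stem "run" rather than on the literal words.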

This message has been cross-posted to the Engine Developers, Engine
Users, and Rails mailing lists.

Best
Lance B.

lanceball · January 3, 2006, 10:19pm

On 04/01/2006, at 7:45 AM, Lance B. wrote:

http://langwell-ball.com/indexed-search/

Indexed Search is a simple, pluggable engine for rails applications
which can be used to enable full text indexed searches within an
application. Searchable data is parsed, stemmed using the Porter
stemmer, and added to a fully indexed table. This allows you to index
things like “he runs fast” which will be returned from a search for
“running”.

I see in the API docs it says to make the index calls from the
controller. Would it not be better to do it from an ActiveRecord
Observer?

– tim

lanceball · January 3, 2006, 10:22pm

How does it compare to Ferret? Just from the README it seems to be much
easier to set up and use than Ferret, but not as fast. If anybody has
experience with both, it would be interesting to hear.

lanceball · January 3, 2006, 10:34pm

On 1/3/06, Tim L. wrote:

I see in the API docs it says to make the index calls from the
controller. Would it not be better to do it from an ActiveRecord
Observer?

Hi Tim

Good question. I thought about that - and in fact, in my first pass
at this it’s what I did. The problem I have with that approach is
that you then have to have knowledge of URIs in the ActiveRecord
classes, and that seemed to break the MVC paradigm.

It was also problematic when a single view had several different
active records in it. For example, a single view that contains course
and instructor records for a university administration system may get
that data from two different active record types. If you put the
calls to the indexer in the ActiveRecord classes, you then have to
make multiple calls to the indexer (one from each type). The
controller is typically aware of both anyway, so it seems to make more
sense there.

When content is indexed, the indexer wants the content, the title, and
a URI to access the content, supplied via IndexableRecord::IndexData
(http://langwell-ball.com/distributions/indexed-search/doc/classes/IndexableRecord/IndexData.html).
To index multiple objects as a part of a single view, you can
concatenate the content from each active record, but still only
provide a single URI and title.
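
A rough sketch of that concatenation step, in plain Ruby so it runs on its own. `IndexEntry`, `Course`, `Instructor`, and `build_index_entry` are all hypothetical names made up for illustration; the real IndexableRecord::IndexData class and the engine's indexing call may look different, so check the API docs linked above.

```ruby
# Hypothetical stand-in for the engine's IndexableRecord::IndexData.
# The real class may have a different constructor and attribute names.
IndexEntry = Struct.new(:title, :uri, :content)

# Hypothetical stand-ins for two ActiveRecord models shown in one view.
Course = Struct.new(:name, :description)
Instructor = Struct.new(:name, :bio)

# Combine the searchable text of both records into one entry that has
# a single title and a single URI, as a controller action might do.
def build_index_entry(course, instructor, uri)
  content = [course.name, course.description,
             instructor.name, instructor.bio].join(" ")
  IndexEntry.new(course.name, uri, content)
end

course = Course.new("Intro to Botany", "Plant structure and growth")
instructor = Instructor.new("Dr. Example", "Studies alpine flora")
entry = build_index_entry(course, instructor, "/courses/show/1")

puts entry.title
puts entry.uri
puts entry.content
```

The point is that the URI comes from whoever knows about the view (here, the caller), while the records only contribute their text.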

Of course, there’s nothing stopping you from doing it with an
observer. The API doesn’t care if it’s called from the controller or
an active record. Just add “include IndexedSearchEngine” to your
class. Having the calls in the controller is just the convention that
I settled on for the reasons noted above.

Lance

lanceball · January 3, 2006, 10:41pm

On 1/3/06, Roberto S. wrote:

How does it compare to Ferret? Just from the README it seems to be much
easier to set up and use than Ferret, but not as fast. If anybody has
experience with both, it would be interesting to hear.

Ferret is almost certainly faster. I believe IndexedSearchEngine is
easier to use, however.

I wrote it because I was doing a quick demo app for work and I didn’t
want to deal with setting up and learning Ferret. I too would be
interested in hearing about others’ experience with both.

Lance

lanceball · January 4, 2006, 6:12pm

I haven’t tried either of the Ruby-based index/search engines, but after
a bunch of testing, including big names like Lucene and Swish-e, I found
one that works well for me. I have a collection of something like 500
megabytes of PDF files (including the Pickaxe book and AWDWR).

I’ve settled on Namazu. It’s fully configured out of the box for
handling PDFs and Word documents. IIRC it’s mostly in Perl, so it could
probably be ported to Ruby fairly easily. And it can handle Japanese;
that was the author’s original motivation for writing it, I think – a
lack of usable Japanese-language search engines.

–
M. Edward (Ed) Borasky