Product Search Engine Design with Sphinx and Faceted Search

Hi, this is my first question on this forum! Please take some time to
read this message, since it is quite long (sorry).

I’m building a search engine (crawler) that indexes product data from
more than 500 Brazilian online stores. This part is easy => Crawling,
Extracting Information, etc.

I’m running into an app design problem. Different types of products have
different attributes. For instance: Books have :Publisher, :Edition,
:Author, etc… Digital Cameras have :Brand, :Megapixel, etc… and so
on. I REFUSE TO CREATE ONE MODEL FOR EVERY TYPE OF PRODUCT.

The crawler automatically discovers the types of products and the
attributes for each type of product. What I was thinking is to have only
one model => Product. Please see below the design I want to have (if you
have any suggestions, please tell me).

class Category < ActiveRecord::Base
end

class Product < ActiveRecord::Base
  belongs_to :category
end

class AttributeType < ActiveRecord::Base
  belongs_to :category
end

class Attribute < ActiveRecord::Base
  belongs_to :product
  belongs_to :attribute_type
end

The Category model represents the type of product (Books, Digital
Cameras) and each category has a set of Attribute Types. My application
is just like http://www.pricejunkie.com but specifically for Brazilian
customers. The customer will be presented with a search field (Sphinx
with Thinking Sphinx or UltraSphinx) and a list of categories.

What exactly is my problem? Thinking Sphinx FACETS (filters). In this
application, each facet is an Attribute Type. Please see
http://www.pricejunkie.com, click on the Book category and you will see
what I mean.

The problem is that facets in Thinking Sphinx are defined per model
field. If I had a Book model with the field :author, I could just do
this with Thinking Sphinx on the Book model:

define_index do
  indexes author, :facet => true
end

I need my system to have dynamic facets per product category, otherwise
I will have to create more than 300 different product types…

I hope I made myself clear. PLEASE SOMEONE HELP.
Thanks

I think there may be something wrong with your model. Firstly I think
you need:

Category
  has_many :products
  has_many :attribute_types

Product
  belongs_to :category
  has_many :attributes

AttributeType
  belongs_to :category
  has_many :attributes

Attribute
  belongs_to :product
  belongs_to :attribute_type

Is there a problem with the above? If one has a product then
product.category.attribute_types will give one a collection of the
attribute types relevant to that product's category. One could also use
product.attributes to get a collection of attributes, and since each
attribute has an associated attribute type, one can also get the
attribute types for a product by this route. I am no database expert, so
I may be wrong, but I suspect that it is not a good idea to have two
separate relationship routes between models like this.
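
To make the two routes concrete, here is a minimal plain-Ruby sketch. It uses Structs as stand-ins for the four models (no ActiveRecord, no database), and the sample book and its values are made up for illustration. It only shows that the two routes can return different answers when a product is missing a value for one of its category's attribute types.

```ruby
# Plain-Ruby stand-ins for the four models (assumption: no ActiveRecord here).
Category      = Struct.new(:name, :attribute_types)
AttributeType = Struct.new(:name)
Attribute     = Struct.new(:attribute_type, :value)
Product       = Struct.new(:name, :category, :attributes)

author    = AttributeType.new("author")
publisher = AttributeType.new("publisher")
books     = Category.new("Books", [author, publisher])

# Hypothetical product: only an author value exists for it, no publisher.
book = Product.new("Some Book", books, [Attribute.new(author, "Author 1")])

# Route 1: through the category (every type the category defines).
via_category = book.category.attribute_types.map(&:name)

# Route 2: through the product's own attributes (only types it has values for).
via_attributes = book.attributes.map { |a| a.attribute_type.name }.uniq

puts via_category.inspect
puts via_attributes.inspect
```

Here the first route lists both author and publisher, while the second lists only author, so the two are not interchangeable.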

My other problem is that I am not sure what you are trying to achieve.
Can you explain in a couple of sentences what data you wish to extract
when the user clicks on the Book category, for example?

Colin

2009/5/25 Daniel J. [email protected]

Hi, thank you for your answer.

The models you wrote are OK; I just simplified mine for illustration
purposes.

My problem is with search facets (filtering) using Thinking Sphinx.
Suppose that I had a Book model with the fields author, publisher and
year. When a customer selects the book category, he will be presented
with a list of books and, on the left side, he will have the “facets”
like this:

authors
-author 1 (203)
-author 2 (125)
-author 3 (99)
-author 4 (38)

publishers
-publisher 1 (199)
-publisher 2 (21)
-publisher 3 (408)
-publisher 4 (134)

years
-2009 (109)
-2008 (33)
-2007 (12)
-2006 (500)
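
For illustration, a sidebar like the one above boils down to counting values per attribute type. The sketch below does that in plain Ruby over a few made-up rows; it does not use Sphinx at all, and the row layout (product id, attribute type name, value) is an assumption about how the Attribute records could be flattened.

```ruby
# Hypothetical flattened rows: [product_id, attribute type name, value].
rows = [
  [1, "author", "author 1"], [1, "year", "2009"],
  [2, "author", "author 1"], [2, "year", "2008"],
  [3, "author", "author 2"], [3, "year", "2009"],
]

# facets[type][value] => number of products carrying that value.
facets = Hash.new { |hash, type| hash[type] = Hash.new(0) }
rows.each { |_product_id, type, value| facets[type][value] += 1 }

# Print each facet with its values, most frequent first.
facets.each do |type, counts|
  puts type
  counts.sort_by { |_value, count| -count }.each do |value, count|
    puts "-#{value} (#{count})"
  end
end
```

Nothing here depends on knowing the attribute names in advance; the facet names come from the data itself.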

This would be very easy if I had a book model. Thinking Sphinx requires
a facet to be defined like this on the “book” model:

define_index do
  indexes author, :facet => true
  indexes publisher, :facet => true
  indexes year, :facet => true
end

THE PROBLEM IS THAT I DO NOT HAVE THIS MODEL AND THESE ATTRIBUTES
DEFINED. I don’t have them defined because I would have to know all the
product types and all the possible attributes for each product type, and
create a different model for each product type.


WHAT I’M LOOKING FOR

I’M LOOKING FOR SOME KIND OF METAPROGRAMMING THAT ENABLES ME TO FIND
THE ATTRIBUTE TYPES FOR A CATEGORY OF PRODUCTS AND BUILD THE
FACETS AT RUNTIME. SOMETHING LIKE THIS IN MY PRODUCT MODEL:

define_index do
  # find whatever attribute types are relevant for this category
  # and make each of them :facet => true
end
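
The idea above could be sketched roughly as follows. Note that IndexDefinition is a hypothetical stand-in written for this example and is NOT Thinking Sphinx; ATTRIBUTE_TYPES and facets_for are also made-up names. Whether Thinking Sphinx's define_index accepts a loop like this is not confirmed here; the sketch only shows the Ruby mechanics of declaring facets from data instead of hard-coding them.

```ruby
# Hypothetical stand-in for an index-definition DSL. This is NOT Thinking
# Sphinx; it only records which fields were declared as facets.
class IndexDefinition
  attr_reader :facets

  def initialize
    @facets = []
  end

  def indexes(field, options = {})
    @facets << field if options[:facet]
  end
end

# Hypothetical data: attribute types the crawler discovered per category.
ATTRIBUTE_TYPES = {
  "Books"           => [:author, :publisher, :year],
  "Digital Cameras" => [:brand, :megapixel],
}

# Build the facet list for one category at runtime.
def facets_for(category_name)
  definition = IndexDefinition.new
  definition.instance_eval do
    # Loop over the discovered types instead of listing fields by hand.
    ATTRIBUTE_TYPES.fetch(category_name).each do |type|
      indexes type, :facet => true
    end
  end
  definition.facets
end

puts facets_for("Books").inspect
puts facets_for("Digital Cameras").inspect
```

The block passed to instance_eval plays the role of the define_index block: the field names come from a lookup instead of being written out one per line.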

THANKS!!

On Monday 25 May 2009, Daniel J. wrote:

I have never used Thinking Sphinx or even plain Sphinx, still, here are
my 2¢:

It seems to me that your problem is that you have a meta-model toolkit
on the one hand (your Category, Product, AttributeType, and Attribute
classes), while on the other hand you want to define concrete facets
such as author, brand, etc. on top of that. That may be possible, but
probably it is not.

I’ve had a quick look at the code for Thinking Sphinx and you’ll
probably be able to find what you need if you start studying it closely
starting from its ActiveRecord integration. I don’t think you will be
able to achieve your goal without understanding details of how Thinking
Sphinx works, as you’re trying to stretch it beyond its intended use.

Michael


Michael S.
http://www.schuerig.de/michael/

I understand. Sorry, I do not know enough about the inner workings of
Sphinx to be able to help.

Can anyone else help?

Colin
