How to structure my database

Hi!

I’m creating an app that will import products from several XML feeds. In
the XML there is a category specified, like T-Shirt for instance. The
problem is that different resellers specify the categories differently.
For
instance, what one reseller calls “T-Shirts” another may call “T-Shirt”,
a
third “short sleeved shirts” and so on.

I want to somehow map these categories to the categories I have myself.
So
I need some tips on how I should create my database.

The idea I have is to create a “raw_categories” table which contains the
name of the resellers category and a “category_id” which has a
belongs_to
relationship to my own “categories” table. Then when I import I simply
try
to find a raw_category which has a matching name and if there is one,
pick
it, otherwise add a new one. This new one I can then manually relate to
one
of my own categories.

Do you understand how I mean, and is it a good approach? Is there a
better/more efficient way?

  • If this is a good idea. How do I do it in Rails? Should I use
    something
    like this (I think I’ve seen something like this in the API doc):

products model

has_one :category, :through => :raw_categories

I estimate that there will be about 40k to 100k products in the
database.

Regards
Linus

On 18 November 2011 21:16, Linus P. [email protected]
wrote:

relationship to my own “categories” table. Then when I import I simply try
to find a raw_category which has a matching name and if there is one, pick
it, otherwise add a new one. This new one I can then manually relate to one
of my own categories.

Do you understand how I mean, and is it a good approach? Is there a
better/more efficient way?

  • If this is a good idea. How do I do it in Rails? Should I use something
    like this (I think I’ve seen something like this in the API doc):

products model

has_one :category, :through => :raw_categories

I think possibly you want product belongs_to raw_category and product
belongs_to category through raw_category. It has to be that way round
because you need raw_category has_many products.
Then you need category has_many raw_categories and raw_category
belongs_to product. You can also then say category has_many products
through raw_categories.

Are you sure you need this complexity however? You could just have
product belongs_to category (and the reverse) and work out which one
it is when you parse the xml, by looking it up in the raw_categories
table.

Colin

Colin

To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.


gplus.to/clanlaw

Thank you for the answer!

Yes, I was thinking about just having the reseller categories (raw
categories) as a lookup table. So for each product I import I just check
what category it should be in.
One issue with this is that it might get a bit messy if I would change
the
relations between a reseller category and a category. If I would change
this I have to go through all the products and update them accordingly.
If
I would use the suggestion in my original post it would work
automatically.

On the other hand, it might be very inefficient to do it like that… It
would save some database work with a direct relation between products
and
my categories.

I think I will try it out if no one else has any other suggestions?

Regards
Linus

On 19 November 2011 10:11, Linus P. [email protected]
wrote:

Thank you for the answer!
Yes, I was thinking about just having the reseller categories (raw
categories) as a lookup table. So for each product I import I just check
what category it should be in.
One issue with this is that it might get a bit messy if I would change the
relations between a reseller category and a category. If I would change this
I have to go through all the products and update them accordingly. If I
would use the suggestion in my original post it would work automatically.

Yes, if there is the slightest possibility of wanting to do this then
that is the way to go.

On the other hand, it might be very inefficient to do it like that… It
would save some database work with a direct relation between products and my
categories.

Don’t worry about efficiency. I can virtually guarantee that the
critical issues for performance will not be those you imagine at the
start of a project. You can worry about that much later in the
unlikely event that you need to.

Colin