Advanced ActiveRecord Question

thesunny · December 24, 2006, 12:54am

First of all, I just want to apologize for the length of the post and to
thank all of you who take the time to get through it.

I am an experienced developer and have written an eCommerce website
builder that hosts tens of thousands of customers and we offer a
reseller program to hundreds of businesses; however, I am inexperienced
in Ruby and in Rails.

Through the books I’ve read on Ruby on Rails (and reading through
ActiveRecord source code), there appears to be no stock answer to this
question. It will probably require extending ActiveRecord, but I’m not
sure how I should do this without breaking the rest of ActiveRecord.

The question will sound familiar, but it’s not the same.

Here’s what I want to do:

I want to be able to store multiple fields in a single field. I’d
like to do it by adding a method call, something like this, during
initialization:

class User < ActiveRecord::Base
  # encoded_field is the proposed (not yet existing) class method
  encoded_field :encoded_stuff, {
    :age => "23",
    :hair_color => "black",
    :eye_color => "brown"
  }
end

:encoded_stuff is a field in a table. The “hash” passed in as the second
argument to encoded_field is what I call the “prototype” because it is a
prototype of what the data inside :encoded_stuff should look like. The
keys in the hash will be converted into attributes in ActiveRecord.

When a new ActiveRecord object is created, it will have age,
hair_color and eye_color (the prototype keys) among its attributes,
and they should be accessible like regular attributes.

John = User.new
John.age         # => "23"
John.hair_color  # => "black"
John.hair_color = "blonde"
John.hair_color  # => "blonde"

Note that while these attributes can be accessed, there are technically
no such fields (e.g. age, hair_color and eye_color) in the database.

At save time, all those attributes get encoded into the field
“encoded_stuff” above. The encoding I have is proprietary but similar to
YAML, only very lightweight and fast to process. I’m not too worried
about the encoding code because it is simple to write or I might end up
using YAML anyways.

John.save

The above call encodes the attributes from the prototype (age,
hair_color and eye_color) into the db field “encoded_stuff” something
like this:

age`23|hair_color`blonde|eye_color`brown|
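
The encoding step might look roughly like the following. This is only a sketch: the real format is proprietary, so the SimpleCodec module and the choice of a backtick between key and value and a pipe after each pair are assumptions based on the sample string above.

```ruby
# Hypothetical encoder; SimpleCodec and its delimiters are assumptions,
# not the actual proprietary format.
module SimpleCodec
  KEY_VALUE_SEPARATOR = "`"
  PAIR_TERMINATOR = "|"

  # Builds one "key`value|" chunk per attribute and concatenates them.
  # Values containing a delimiter are not escaped in this sketch.
  def self.encode(attributes)
    attributes.map do |key, value|
      "#{key}#{KEY_VALUE_SEPARATOR}#{value}#{PAIR_TERMINATOR}"
    end.join
  end
end

encoded = SimpleCodec.encode({ "age" => "23", "hair_color" => "blonde", "eye_color" => "brown" })
puts encoded
```

A real encoder would also need to escape or reject values that contain either delimiter.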

This is half the functionality I need.

At read time, the field “encoded_stuff” gets decoded back into
attributes.

John = User.find(1)
John.age         # => "23"
John.hair_color  # => "blonde"

However, if the value for an attribute is missing in the encoded data,
it will use the value provided in the “prototype”.
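
A minimal sketch of that fallback, assuming the same "key, backtick, value, pipe" layout: decode whatever is stored, then merge it over the prototype so that stored values win and missing ones keep their defaults. The names PROTOTYPE, decode and attributes_from are hypothetical.

```ruby
# Hypothetical decoder for an assumed "key`value|" format.
PROTOTYPE = {
  "age" => "23",
  "hair_color" => "black",
  "eye_color" => "brown"
}.freeze

# Parses the encoded string into a hash of the values actually stored.
def decode(encoded)
  encoded.to_s.split("|").each_with_object({}) do |pair, stored|
    key, value = pair.strip.split("`", 2)
    stored[key] = value.to_s unless key.nil? || key.empty?
  end
end

# Stored values override the prototype; missing keys keep their default.
# Stored keys that are absent from the prototype are kept as well.
def attributes_from(encoded, prototype = PROTOTYPE)
  prototype.merge(decode(encoded))
end

attrs = attributes_from("age`31|hair_color`blonde|")
puts attrs["hair_color"]  # stored value: blonde
puts attrs["eye_color"]   # missing from the string, so the default: brown
```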

Right now, calling

John.has_glasses

would probably raise some sort of method-not-found error (a
NoMethodError) for has_glasses.

But, let’s say we redeclared the prototype to add a “has_glasses”
attribute like this:

class User < ActiveRecord::Base
  # encoded_field is the proposed (not yet existing) class method
  encoded_field :encoded_stuff, {
    :age => "23",
    :hair_color => "black",
    :eye_color => "brown",
    :has_glasses => "yes"
  }
end

Now, let’s try it again

John = User.find(1)
John.has_glasses  # => "yes"

There is no “has_glasses” value encoded in the “encoded_stuff” field;
however, since it was in the prototype, the default value “yes” has been
added to it.

THE HARD PART?

I think the hard part is knowing where to inject the encoding/decoding.

I’m afraid if they are not added in the correct places, things like the
validation code in ActiveRecord would fail because the attributes would
get encoded into a single field. If this happened before the validation,
there would be problems because the attributes that the validation
checks against are simply missing.

For example, if you were checking that eye_color was not “yellow”, but
the encoding had already happened, there would be no “eye_color”
attribute to check against.
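
One way to avoid that ordering problem is to encode only after validation has passed. ActiveRecord runs validations before its before_save callbacks, so a before_save hook is a plausible place for the encoding. The plain-Ruby stand-in below only mimics that ordering; SketchUser and its methods are hypothetical and do not use ActiveRecord.

```ruby
# Plain-Ruby stand-in that mimics "validate first, encode second".
# SketchUser is hypothetical and does not use ActiveRecord.
class SketchUser
  attr_accessor :eye_color
  attr_reader :encoded_stuff, :errors

  def initialize(eye_color)
    @eye_color = eye_color
    @errors = []
  end

  # Validation reads the plain attribute, which is still available.
  def valid?
    @errors = []
    @errors << "eye_color cannot be yellow" if eye_color == "yellow"
    @errors.empty?
  end

  # Encoding happens only after validation has passed.
  def save
    return false unless valid?
    @encoded_stuff = "eye_color`#{eye_color}|"
    true
  end
end

puts SketchUser.new("brown").save   # true
puts SketchUser.new("yellow").save  # false
```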

WHY DO THIS?

I’m sure a lot of people will chime in with other solutions, some of
which may be valid, so I wanted to explain why I need the ability to add
an encoded_field with a prototype.

When my application was small, adding fields to the database was easy.
You just go into the schema, and add the new field that you need. As the
tables got bigger, it became impossible to make changes to the database
without having long periods of downtime. As I anticipate the record
counts to go into the hundreds of millions, adjusting the table could
take a long time (minutes to hours).

By encoding the fields, I can change the application by simply changing
the prototype of the encoded_field. The application would not break. In
fact, the data wouldn’t even need to be refactored at all.

Basically it means we can keep modifying our application in an agile
manner without having to schedule large 3:00 a.m. downtimes which are
scary and can be dangerous.

I already have this set up under our current development environment and
it works great.

I know I can store serialized Objects using YAML but it doesn’t solve
problems like when we need to add more attributes. Furthermore, I want
the storage to be transparent in that one day, I may decide that “age”
is an important enough field that I want to actually have it be part of
the database schema. Under the proposed method above, I can do this
without having to change any of the code that interacts with
ActiveRecord because the attributes will look identical, only the
storage would be different.

Under my application, I anticipate having dozens of attributes encoded
into a single field and they could change quite regularly.

I know this may cause some unsolvable issues like table joins, but since
I probably wouldn’t do table joins on encoded data anyways, I think many
of these issues could be ignored.

Thanks in advance for any help.

Sunny H.
CEO, MeZine.com Inc.

thesunny · December 24, 2006, 2:26am

Hi Sunny,

On Sun, 2006-24-12 at 00:54 +0100, Sunny H. wrote:

eye_color => "brown"

}
end

Things like this are done in Rails all the time. The prime example is
storing passwords in a database. Typically a User model would have a
method called password= (a setter method) to set the password.

However, in the database itself, there is typically no password field.
There would be a salt and an MD5 hash of the password. The password=
method would update the salt and MD5 hash fields in the database. They
would be referenced via self.salt and self.hash (or whatever the fields
are called in the database).
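
A minimal sketch of that pattern outside Rails might look like this. SketchAccount, password_hash and password_matches? are hypothetical names; in a real model, salt and the hash would be database columns rather than instance variables. MD5 is used only because it is the digest mentioned above.

```ruby
require "digest/md5"
require "securerandom"

# Hypothetical model stand-in: only salt and password_hash would be columns.
class SketchAccount
  attr_reader :salt, :password_hash

  # Virtual attribute: never stored, only used to derive the two fields.
  def password=(plain_text)
    @salt = SecureRandom.hex(8)
    @password_hash = Digest::MD5.hexdigest(@salt + plain_text)
  end

  # Recomputes the digest with the stored salt and compares.
  def password_matches?(plain_text)
    Digest::MD5.hexdigest(salt + plain_text) == password_hash
  end
end

account = SketchAccount.new
account.password = "secret"
puts account.password_matches?("secret")  # true
puts account.password_matches?("guess")   # false
```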

The model methods can, but do not necessarily have to, match exactly
what’s in the database. The model is an abstraction on top of the
database model.

Let’s suppose the encoded_field in the physical db is called
“encoded_data”. There will be a method in the User model called
encoded_data.

However, in the User model itself, you can also create methods called,
for instance:

age and age= (a getter and setter that access the self.encoded_data
method to do their magic). Ditto for the hair_color / hair_color= and
eye_color / eye_color= methods.
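
Written by hand, such accessors could look something like the sketch below. HandWrittenUser and DEFAULTS are hypothetical, and encoded_data is kept as an in-memory hash here rather than a real database column, to keep the example self-contained.

```ruby
# Hypothetical stand-in for the model; encoded_data would be the real column.
# Here it is an in-memory hash so the sketch runs without a database.
class HandWrittenUser
  DEFAULTS = { "age" => "23", "hair_color" => "black" }.freeze

  def encoded_data
    @encoded_data ||= {}
  end

  # Getter: stored value if present, otherwise the default.
  def age
    encoded_data.fetch("age", DEFAULTS["age"])
  end

  # Setter: writes into the shared hash instead of a column of its own.
  def age=(value)
    encoded_data["age"] = value.to_s
  end

  def hair_color
    encoded_data.fetch("hair_color", DEFAULTS["hair_color"])
  end

  def hair_color=(value)
    encoded_data["hair_color"] = value.to_s
  end
end

user = HandWrittenUser.new
puts user.age   # 23 (default, nothing stored yet)
user.age = 24
puts user.age   # 24
```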

You can then just add or remove methods from the model as needed.

As you get more comfortable with the dynamic nature of Ruby, you’ll find
that you’ll be able to (possibly) create these getter and setter methods
on the fly as well (possibly based on another table in the
database). But that’s for another day.

–
Rick

thesunny · December 24, 2006, 3:52am

The new edition of Agile has a good explanation of this with a
password-based example.

thesunny · December 24, 2006, 6:26am

counts to go into the hundreds of millions, adjusting the table could
take a long time (minutes to hours).

This sounds like a bad idea to me. The reason for having a database is
to avoid doing things like this: you lose your ability to query these
fields. You can design a structure that fits this example without
requiring you to add new table columns. For example, have a table
“body_attributes” with entries like (1, ‘eye color’) or (2,
‘has hair’) and then a table called, say, ‘body_attribute_values’ with
entries like (unique_id, 1, “brown”) and (unique_id, 2, “no”), where the
second column is a reference to the id column in “body_attributes” (you
could be more complex with the design and allow for
body_attribute_values of types other than strings, like
boolean).
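
To make the two-table idea concrete, here is a rough in-memory sketch using Structs instead of real tables. The user_id column is an assumption added to tie each value to its owner, and the helper names are hypothetical.

```ruby
# In-memory stand-ins for the two tables described above. The user_id
# column is an assumption added here to tie a value to its owner.
BodyAttribute = Struct.new(:id, :name)
BodyAttributeValue = Struct.new(:id, :user_id, :body_attribute_id, :value)

BODY_ATTRIBUTES = [
  BodyAttribute.new(1, "eye color"),
  BodyAttribute.new(2, "has hair")
].freeze

BODY_ATTRIBUTE_VALUES = [
  BodyAttributeValue.new(10, 1, 1, "brown"),
  BodyAttributeValue.new(11, 1, 2, "no"),
  BodyAttributeValue.new(12, 2, 1, "blue")
].freeze

# Equivalent of joining the two tables for one user.
def body_attributes_for(user_id)
  rows = BODY_ATTRIBUTE_VALUES.select { |row| row.user_id == user_id }
  rows.each_with_object({}) do |row, result|
    attribute = BODY_ATTRIBUTES.find { |a| a.id == row.body_attribute_id }
    result[attribute.name] = row.value
  end
end

# The values stay queryable: find every user with a given attribute value.
def user_ids_where(name, value)
  attribute = BODY_ATTRIBUTES.find { |a| a.name == name }
  return [] if attribute.nil?

  matches = BODY_ATTRIBUTE_VALUES.select do |row|
    row.body_attribute_id == attribute.id && row.value == value
  end
  matches.map(&:user_id)
end

puts body_attributes_for(1)["eye color"]          # brown
puts user_ids_where("eye color", "blue").inspect  # [2]
```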

But if you still want to do it: add a class method to
ActiveRecord::Base which is this “encoded_field” method, taking, as
you described, a hash of default values. For each of those values use
the Ruby method define_method to add methods (like you described) to
the model.
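
A sketch of what that could look like, using a plain module so it runs without Rails. The EncodedField module and SketchProfile class are hypothetical, and the values are held in an instance-variable hash rather than a real column. In a Rails application the module could presumably be mixed in with ActiveRecord::Base.extend(EncodedField), with the accessors reading and writing the actual attribute instead.

```ruby
# Hypothetical class macro that generates accessors with define_method.
module EncodedField
  def encoded_field(column, prototype)
    ivar = "@#{column}"
    prototype.each do |name, default|
      key = name.to_s

      # Getter: stored value if present, otherwise the prototype default.
      define_method(name) do
        (instance_variable_get(ivar) || {}).fetch(key, default)
      end

      # Setter: lazily creates the backing hash and stores the value.
      define_method("#{name}=") do |value|
        store = instance_variable_get(ivar) || instance_variable_set(ivar, {})
        store[key] = value
      end
    end
  end
end

class SketchProfile
  extend EncodedField
  encoded_field :encoded_stuff, :age => "23", :hair_color => "black", :eye_color => "brown"
end

profile = SketchProfile.new
puts profile.hair_color  # black
profile.hair_color = "blonde"
puts profile.hair_color  # blonde
puts profile.age         # 23
```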

Ryan

thesunny · December 30, 2006, 9:25pm

Hi All,

Thanks for the replies and sorry for the very late response.

I think I might have a solution.

As Rick and Ryan suggested, I like the idea of rewriting the getters and
setters and having them update a hash that is kept in the encoded_field
attribute. That would solve half the problem.

For the other half, I could encode the fields into a string before
insert/update with observers, and decode the fields from a string back
into a hash during select. Originally I felt that adding new attributes
and then converting them to a string in the encoded_field would work
(this is how I do it in my current framework), but it seems simpler to
achieve this by writing new getters/setters for ActiveRecord that set a
hash and then having the hash set the string. It is a different approach
to the same end.

Hi Ryan, I understand your sentiment but in the situation for which I’m
using this, I believe it is the best solution. I used the example of a
“user” because it is easier to explain but it will most often be used
for program settings, layout options, etc.

I do lose “searchability” but on fields that I will never search.

In return, I get

- Better performance (1 db lookup instead of 2, and 1 record returned
  instead of several)
- Simplicity (no table joins, and no need to check whether an attribute
  already exists before deciding between an insert and an update)
- Easy extensibility by extending a prototype (I can add new fields by
  adding a key/value to a hash and don’t have to worry about the db. I
  can also remove fields in the same way.)

And if I ever need it, I can still easily refactor to a proper field in
the database.

Thanks also to Paul for the suggestion. Actually, I’ve already read that
book but it wasn’t quite enough to solve this issue.

Sunny H.
CEO, MeZine Inc.