Excluding/Lazy loading BLOB fields from normal Rails ActiveR

Dear Rails enthusiasts,

I have database that has several blob fields. Some of them aren’t
exactly
photos and I really want to keep them in the database, but some of them
are
quite “huge” (several hundred Ks).

I can’t use http://www.kanthak.net/opensource/file_column/ plugin
because it
uses files and I want to store them in the DB.

The main (or the only) problem is performance.

Whenever I do a find, or SomeOtherModel.find( … :include =>
‘modelcontainingblob’)… The query will be soooooo slow as expected.
Not
much because of a lot of rows but mainly because of the unnecessary blob
columns loaded.

I’ve written an article about this is
gauldong.net,
the main solutions being

  1. move the blob fields to a separate table
  2. use views

After some time, it seems both approaches have complicated downsides
that I
can’t bear for a problem “so simple”.

There is this find :select but it seems that I have to use it
everywhere,
and it can’t be used by find :include (by other models) and I don’t
really
know what other methods use find.

What I want is that the default find for the model will exclude the BLOB
fields (or any fields I specify). And only load/save them if I
explicitly
request them (probably upon first access). This will mean an increase in
the
number of queries if I want to access all the BLOB fields on a
table/model,
but will drastically increase the performance of loading the whole
record/records.

I did do some research on Mr. Google and the rails groups and haven’t
got
satisfactory explanation.

Please help. Thank you.


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

‘modelcontainingblob’)… The query will be soooooo slow as expected. Not
can’t bear for a problem “so simple”.
record/records.
Hrm… seems that if you don’t want them loaded by default then a
separate
table is the way to go. If you’re worried about “table clutter” maybe a
mix of STI and polymorphism would help…

class Blob < ActiveRecord::Base
belongs_to :blobable, :polymorphic => true
end

class Image < Blob
end

class FlashVideo < Blob
end

… etc …

class MyThing < ActiveRecord::Base
has_one :image, :as => :blobable
end

class MyOtherThing < ActiveRecord::Base
has_one :image, :as => :blobable
has_one :flash_video, :as => :blobable
end

-philip

On 2/21/07, Philip H. [email protected] wrote:

end
class MyOtherThing < ActiveRecord::Base
has_one :image, :as => :blobable
has_one :flash_video, :as => :blobable
end

-philip

Thank you Philip :slight_smile: That’s a nice workaround indeed.

The blob table(s) would have to be having a lot of records, each doesn’t
have any intrinsic meaning as a “record”, or a blob record might have
multiple blob columns… that might introduce yet another performance
problem as all columns are loaded at once by ActiveRecord. Not to
mention
that the database become incohesive as now there isn’t a 1-1 association
between a model and a table (there are “dummy” blob tables). :frowning:

I’m looking for a more elegant solution like Propel’s (PHP) lazyLoad.
Propel
is a high-level DBA layer that I used when I was developing with PHP.
It’s
solution is the cleanest I’ve ever seen:

http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505

1http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L1

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

2http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L2

3http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L3
4http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L4

5http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L5
6http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L6

7http://propel.phpdb.org/trac/browser/branches/1.3/generator/projects/bookstore-packaged/media.schema.xml?rev=505#L7

8 <column 9 name="id" 10 required="true" 11 primaryKey="true" 12 autoIncrement="true" 13 type="INTEGER" 14 description="Media Id"/> 15 <column 16 name="cover_image" 17 type="BLOB" 18 lazyLoad="true" 19 description="The image of the book cover."/> 20 <column 21 name="excerpt" 22 type="CLOB" 23 lazyLoad="true" 24 description="An excerpt from the book."/> 25 <column 26 name="book_id" 27 required="true" 28 type="INTEGER" 29 description="Book ID for this media collection."/> 30 31 <reference 32 local="book_id" 33 foreign="id"/> 34 35
36 37 Alright it uses XML which we all used to hate, but the lazyLoad semantic is very useful here and I used it pretty often.


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

On 20-Feb-07, at 10:03 PM, Hendy I. wrote:

end

developing with PHP. It’s solution is the cleanest I’ve ever seen:

http://propel.phpdb.org/trac/browser/branches/1.3/generator/
projects/bookstore-packaged/media.schema.xml?rev=505

This example uses postgres views to exclude the fields to lazy load -
then goes back to the full table when pulling out the specific fields.

gauldong.net
blobs-in-active-record/

basically the same as Philip’s solution - but requires one table and
one view, postgres (and is pretty messy!)

As well there was talk about an AR extension to ignore certain
attributes - but I don’t know what happened to it - I’ve looked
around trac but couldn’t find it.

Cheers,
Jodi
General Partner
The nNovation Group inc.
www.nnovation.ca/blog


Dear Jodi,

On 2/21/07, Jodi S. [email protected] wrote:

This example uses postgres views to exclude the fields to lazy load - then
goes back to the full table when pulling out the specific fields.

gauldong.net

basically the same as Philip’s solution - but requires one table and one
view, postgres (and is pretty messy!)

Thanks Jodi. I know about that that. I wrote it. :slight_smile: I thought that was
good, but, it turns out the views just complicate things further than
separate tables. (views ARE good. updatable views + Rails ActiveRecord =
very uglily bad, at least in my experience)

As well there was talk about an AR extension to ignore certain
attributes -

but I don’t know what happened to it - I’ve looked around trac but couldn’t
find it.

Really? It does seem that nobody thought this is important. I think
using
the wrong database design pattern here… :frowning:

What is everybody using in this case? It seems most people just use
separate
tables (which is easiest, but cluttering the DB) and files (which is
much
more complex, and helped by existence of Mr. Kanthak’s file_column
plugin).

I don’t feel like writing a plugin just for this… especially not to
hack
ActiveRecord. :frowning: Maybe I’ll have to surrender to separate tables…


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

On 20-Feb-07, at 10:27 PM, Hendy I. wrote:

basically the same as Philip’s solution - but requires one table

tables…

lol. the web is so small.

how about find_by_sql?

or check out this ticket : http://dev.rubyonrails.org/ticket/2546

summary, using find(:select), then catch the MissingAttribute
exception to lazily load the attribute when object.blob is accessed.

I’d likely take that route.

Jodi

has_one :image, :as => :blobable
have any intrinsic meaning as a “record”, or a blob record might have
multiple blob columns… that might introduce yet another performance
problem as all columns are loaded at once by ActiveRecord. Not to mention
that the database become incohesive as now there isn’t a 1-1 association
between a model and a table (there are “dummy” blob tables). :frowning:

There’s only one extra table… “blobs”. Everything else is typed to
that table. And there’s no load issue until you do “mything.image”…

And has_many works exactly the same as has_one.

We have 40,000 rows in our “blob” table, and while we don’t store the
blogs in the database itself, we do use it to store all our various
types
of media… images, flash, wmv, rm, flv, icons, logos, etc… what’s
nice
is that for icons/logos those inherit from images, but semantically we
can
treat them different.

-philip

On 2/21/07, Philip H. [email protected] wrote:

is that for icons/logos those inherit from images, but semantically we can
treat them different.

-philip

Thanks Philip… By your experience in 40,000 rows, it seems that this
is
the way to go.

As you said, the typing will come in handy in some situations. There are
some gripes I feel about this concept, which is decohesiveness (the
actual
model and its blobs reside in separate tables) and coupling (the blob
table
contains lots of data that doesn’t have any meaning, and is coupled with
all
other tables).

What I wonder is how I’d do this if I have several blob fields in a
model
having the same “type”? Do I really have to create several different
[dummy]
classes for this in the blob table?

For example, if Product has “blob fields” left_image, top_image,
front_image, that means I have to define LeftImage, TopImage, and
FrontImage
classes just for the purpose of identifying them in the blob table? (so
each
product will have 3 records in blob table)

Or I just use a single Image class then use has_many? (so each product
will
have arbitrary number of records in blob table)

…or both approaches? (each product will have arbitrary blobs each with
its
own types)

I’m curious about your experience.


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

On 2/21/07, Jodi S. [email protected] wrote:

I’d likely take that route.
Jodi

Yes, Jodi… find :select is exactly what I want… except that it
doesn’t
come ‘by default’. (I wanted to put this :select into the model itself,
so
everybody using find without specifying columns explicitly, will use the
default columns I set that omits the BLOBs)

With find :include, matters get worse… since :include doesn’t allow
you to
specify anything on the “inner find”.

I really would love to have this approach, but for now, Rails seems to
force
me into hacking it when I have to use this approach. (having to
find&replace
manually all .find using :select, then consciously avoiding :include

dunno what else )

plugins, plugins, plugins… Where are you when I need one? :wink:


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

-philip

Thanks Philip… By your experience in 40,000 rows, it seems that this is
the way to go.

Keep in mind, we don’t store the blob data itself in the database…
just
meta-data about it (file size, url, width, height, duration, etc.)

As you said, the typing will come in handy in some situations. There are
some gripes I feel about this concept, which is decohesiveness (the actual
model and its blobs reside in separate tables) and coupling (the blob table
contains lots of data that doesn’t have any meaning, and is coupled with all
other tables).

What I wonder is how I’d do this if I have several blob fields in a model
having the same “type”? Do I really have to create several different [dummy]
classes for this in the blob table?

Depends on if you want to uniquely identify them.

For example, if Product has “blob fields” left_image, top_image,
front_image, that means I have to define LeftImage, TopImage, and FrontImage
classes just for the purpose of identifying them in the blob table? (so each
product will have 3 records in blob table)

I would create the 3 separate classes myself. That makes it easy to
refer
to them in your product views since you’re going to want one of them
specifically.

Or I just use a single Image class then use has_many? (so each product will
have arbitrary number of records in blob table)

Do this when your products have a series of screenshots (for example)
and
you don’t care what order they come back in.

I’m curious about your experience.

Do both, depending on the need. I recently did a project that had a
bunch
of articles that belonged to a category. The category had a Logo and an
Icon. The article had a PreferredImage and had many Images.

All of those classes were inherited from the Image class so shared in
the
common methods we have to manipulate images (like thumbnail generation).

-philip

Is there any talk of adding lazy loaded attributes to model objects? I
think this is a fundamental piece of functionality missing from
ActiveRecord that ORM frameworks like Hibernate and EJB3 have.

On Wednesday 21 February 2007, Scott wrote:

Is there any talk of adding lazy loaded attributes to model objects?
I think this is a fundamental piece of functionality missing from
ActiveRecord that ORM frameworks like Hibernate and EJB3 have.

If you think so, write a plugin that implements this. As a starting
point, I suggest you prototype this functionality by

  • using #find(:all, :select => ‘…’)
  • explicitly implementing accessors for the lazy attributes which load
    them when needed

Then, if you’re satisfied it’s actually worth it, look at
ActiveRecord::Base#construct_finder_sql. Making it work with included
associations will be a bit trickier, however. Nonetheless, I’m sure you
will find help in this group.

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

On 2/21/07, Philip H. [email protected] wrote:

of media… images, flash, wmv, rm, flv, icons, logos, etc… what’s
the way to go.
all
For example, if Product has “blob fields” left_image, top_image,
Or I just use a single Image class then use has_many? (so each product
Icon. The article had a PreferredImage and had many Images.

All of those classes were inherited from the Image class so shared in the
common methods we have to manipulate images (like thumbnail generation).

-philip

Thank you Philip, your advices are so useful. I’m beginning to consider
the
merits of having separate tables.

Any plugins you’d like to suggest? (image- or blob- or polymorph-
related)


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

Any plugins you’d like to suggest? (image- or blob- or polymorph- related)

Nope. We’re not actually using any for this project.

In another project though I really like flex_image for it’s ability to
resize/dropshadow/watermark and then cache to disk…

-philip

On 2/22/07, Philip H. [email protected] wrote:

-philip

Thank you Philip. I’ll check that out. :slight_smile:


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

Hendy,

Regardless of the issue related to lazy loading of BLOBs have you hit or
seen this issue:

http://www.ruby-forum.com/topic/109387

I have not seen anyone with an answer to this. Thanks Mario

On 2/22/07, Philip H. [email protected] wrote:

-philip
Thank you Philip. I’ll check that out… :slight_smile:


Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour

Dear Mario,

On 5/26/07, Mario J. [email protected] wrote:

I’m less of a Rails ‘hacker’ lately.

Considering the rapid development of Rails and its plugins… I would
avoid
jumping the hoops just to force making the DBA happy.

100 MB BLOB column? Come on, just use the filesystem. Somebody has heard
of
S3.

It’s not a “solution” to the real problem. But it’s an alternative that
I
think should be considered with great care.

There are lots of ups about Rails’ ActiveRecord model, and that’s where
it
shines. But there are pitfalls that make us have to ‘abandon’ old
time-tested DB practices.

For example, rather than making a single tabel span tens of columns and
having multiple bloated ones… I now prefer narrow tables. And related
models just by using has_one relationship. Relationships are easy in
Rails.
Sometimes much easier than limiting the returned columns using ‘SELECT’!

With proper indexes and good DBMS I think this can result in more
maintainable app while not reducing performance.