Forum: Ruby on Rails Excluding/Lazy loading BLOB fields from normal Rails ActiveR

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Hendy I. (Guest)
on 2007-02-21 04:07
(Received via mailing list)
Dear Rails enthusiasts,

I have database that has several blob fields. Some of them aren't
exactly
photos and I really want to keep them in the database, but some of them
are
quite "huge" (several hundred Ks).

I can't use http://www.kanthak.net/opensource/file_column/ plugin
because it
uses files and I want to store them in the DB.

The main (or the only) problem is performance.

Whenever I do a find, or SomeOtherModel.find( ....... :include =>
'modelcontainingblob')... The query will be soooooo slow as expected.
Not
much because of a lot of rows but mainly because of the unnecessary blob
columns loaded.

I've written an article about this is
http://web2withrubyonrails.gauldong.net/2007/01/18...,
the main solutions being
1. move the blob fields to a separate table
2. use views

After some time, it seems both approaches have complicated downsides
that I
can't bear for a problem "so simple".

There is this find :select but it seems that I have to use it
everywhere,
and it can't be used by find :include (by other models) and I don't
really
know what other methods use find.

What I want is that the default find for the model will exclude the BLOB
fields (or any fields I specify). And only load/save them if I
explicitly
request them (probably upon first access). This will mean an increase in
the
number of queries if I want to access all the BLOB fields on a
table/model,
but will drastically increase the performance of loading the whole
record/records.

I did do some research on Mr. Google and the rails groups and haven't
got
satisfactory explanation.

Please help. Thank you.

--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Philip H. (Guest)
on 2007-02-21 04:25
(Received via mailing list)
> 'modelcontainingblob')... The query will be soooooo slow as expected. Not
> can't bear for a problem "so simple".
> record/records.
Hrm.. seems that if you don't want them loaded by default then a
separate
table is the way to go.  If you're worried about "table clutter" maybe a
mix of STI and polymorphism would help...

class Blob < ActiveRecord::Base
   belongs_to :blobable, :polymorphic => true
end

class Image < Blob
end

class FlashVideo < Blob
end

# ... etc ...

class MyThing < ActiveRecord::Base
   has_one :image, :as => :blobable
end

class MyOtherThing < ActiveRecord::Base
   has_one :image, :as => :blobable
   has_one :flash_video, :as => :blobable
end


-philip
Hendy I. (Guest)
on 2007-02-21 05:04
(Received via mailing list)
On 2/21/07, Philip H. <removed_email_address@domain.invalid> wrote:
> end
> class MyOtherThing < ActiveRecord::Base
>    has_one :image, :as => :blobable
>    has_one :flash_video, :as => :blobable
> end
>
>
> -philip


 Thank you Philip :-) That's a nice workaround indeed.

The blob table(s) would have to be having a lot of records, each doesn't
have any intrinsic meaning as a "record", or a blob record might have
multiple blob columns... that might introduce yet another performance
problem as all columns are loaded at once by ActiveRecord. Not to
mention
that the database become incohesive as now there isn't a 1-1 association
between a model and a table (there are "dummy" blob tables). :-(

I'm looking for a more elegant solution like Propel's (PHP) lazyLoad.
Propel
is a high-level DBA layer that I used when I was developing with PHP.
It's
solution is the cleanest I've ever seen:

http://propel.phpdb.org/trac/browser/branches/1.3/...

1<http://propel.phpdb.org/trac/browser/branches/1.3/...
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
2<http://propel.phpdb.org/trac/browser/branches/1.3/...
<!DOCTYPE database SYSTEM "../dtd/database.dtd">
3<http://propel.phpdb.org/trac/browser/branches/1.3/...
4<http://propel.phpdb.org/trac/browser/branches/1.3/...
<database package="core.book" name="bookstore"
defaultIdMethod="native">
5<http://propel.phpdb.org/trac/browser/branches/1.3/...
6<http://propel.phpdb.org/trac/browser/branches/1.3/...
<!-- This table exists to explore/test the handling of BLOB/CLOB
objects.
-->
7<http://propel.phpdb.org/trac/browser/branches/1.3/...
<table name="media">
8<http://propel.phpdb.org/trac/browser/branches/1.3/...
 <column
9<http://propel.phpdb.org/trac/browser/branches/1.3/...
   name="id"
10<http://propel.phpdb.org/trac/browser/branches/1.3/...
   required="true"
11<http://propel.phpdb.org/trac/browser/branches/1.3/...
   primaryKey="true"
12<http://propel.phpdb.org/trac/browser/branches/1.3/...
       autoIncrement="true"
13<http://propel.phpdb.org/trac/browser/branches/1.3/...
   type="INTEGER"
14<http://propel.phpdb.org/trac/browser/branches/1.3/...
   description="Media Id"/>
15<http://propel.phpdb.org/trac/browser/branches/1.3/...
  <column
16<http://propel.phpdb.org/trac/browser/branches/1.3/...
   name="cover_image"
17<http://propel.phpdb.org/trac/browser/branches/1.3/...
   type="BLOB"
18<http://propel.phpdb.org/trac/browser/branches/1.3/...
   lazyLoad="true"
19<http://propel.phpdb.org/trac/browser/branches/1.3/...
   description="The image of the book cover."/>
20<http://propel.phpdb.org/trac/browser/branches/1.3/...
      <column
21<http://propel.phpdb.org/trac/browser/branches/1.3/...
   name="excerpt"
22<http://propel.phpdb.org/trac/browser/branches/1.3/...
   type="CLOB"
23<http://propel.phpdb.org/trac/browser/branches/1.3/...
   lazyLoad="true"
24<http://propel.phpdb.org/trac/browser/branches/1.3/...
   description="An excerpt from the book."/>
25<http://propel.phpdb.org/trac/browser/branches/1.3/...
      <column
26<http://propel.phpdb.org/trac/browser/branches/1.3/...
       name="book_id"
27<http://propel.phpdb.org/trac/browser/branches/1.3/...
       required="true"
28<http://propel.phpdb.org/trac/browser/branches/1.3/...
       type="INTEGER"
29<http://propel.phpdb.org/trac/browser/branches/1.3/...
       description="Book ID for this media collection."/>
30<http://propel.phpdb.org/trac/browser/branches/1.3/...
      <foreign-key foreignTable="book" onDelete="CASCADE">
31<http://propel.phpdb.org/trac/browser/branches/1.3/...
    <reference
32<http://propel.phpdb.org/trac/browser/branches/1.3/...
     local="book_id"
33<http://propel.phpdb.org/trac/browser/branches/1.3/...
     foreign="id"/>
34<http://propel.phpdb.org/trac/browser/branches/1.3/...
  </foreign-key>
35<http://propel.phpdb.org/trac/browser/branches/1.3/...
</table>
36<http://propel.phpdb.org/trac/browser/branches/1.3/...
37<http://propel.phpdb.org/trac/browser/branches/1.3/...
</database>
Alright it uses XML which we all used to hate, but the lazyLoad semantic
is
very useful here and I used it pretty often.


--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Jodi S. (Guest)
on 2007-02-21 05:19
(Received via mailing list)
On 20-Feb-07, at 10:03 PM, Hendy I. wrote:

>
> end
>
> developing with PHP. It's solution is the cleanest I've ever seen:
>
> http://propel.phpdb.org/trac/browser/branches/1.3/generator/
> projects/bookstore-packaged/media.schema.xml?rev=505
>

This example uses postgres views to exclude the fields to lazy load -
then goes back to the full table when pulling out the specific fields.

   http://web2withrubyonrails.gauldong.net/2007/01/18...
blobs-in-active-record/

basically the same as Philip's solution - but requires one table and
one view, postgres (and is pretty messy!)

As well there was talk about an AR extension to ignore certain
attributes - but I don't know what happened to it - I've looked
around trac but couldn't find it.

Cheers,
Jodi
General Partner
The nNovation Group inc.
www.nnovation.ca/blog

Hendy I. (Guest)
on 2007-02-21 05:28
(Received via mailing list)
Dear Jodi,

On 2/21/07, Jodi S. <removed_email_address@domain.invalid> wrote:
>
> This example uses postgres views to exclude the fields to lazy load - then
> goes back to the full table when pulling out the specific fields.
>
> http://web2withrubyonrails.gauldong.net/2007/01/18...
>
> basically the same as Philip's solution - but requires one table and one
> view, postgres (and is pretty messy!)
>

Thanks Jodi. I know about that that. I wrote it. :-) I thought that was
good, but, it turns out the views just complicate things further than
separate tables. (views ARE good. updatable views + Rails ActiveRecord =
very uglily bad, at least in my experience)

As well there was talk about an AR extension to ignore certain
attributes -
> but I don't know what happened to it - I've looked around trac but couldn't
> find it.
>

Really? It does seem that nobody thought this is important. I think
using
the wrong database design pattern here... :-(

What is everybody using in this case? It seems most people just use
separate
tables (which is easiest, but cluttering the DB) and files (which is
much
more complex, and helped by existence of Mr. Kanthak's file_column
plugin).

I don't feel like writing a plugin just for this... especially not to
hack
ActiveRecord. :-( Maybe I'll have to surrender to separate tables...

--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Philip H. (Guest)
on 2007-02-21 05:50
(Received via mailing list)
>>
>>    has_one :image, :as => :blobable
> have any intrinsic meaning as a "record", or a blob record might have
> multiple blob columns... that might introduce yet another performance
> problem as all columns are loaded at once by ActiveRecord. Not to mention
> that the database become incohesive as now there isn't a 1-1 association
> between a model and a table (there are "dummy" blob tables). :-(

There's only *one* extra table... "blobs".  Everything else is typed to
that table.  And there's no load issue until you do "mything.image"...

And has_many works exactly the same as has_one.

We have 40,000 rows in our "blob" table, and while we don't store the
blogs in the database itself, we do use it to store all our various
types
of media... images, flash, wmv, rm, flv, icons, logos, etc... what's
nice
is that for icons/logos those inherit from images, but semantically we
can
treat them different.

-philip
Jodi S. (Guest)
on 2007-02-21 05:53
(Received via mailing list)
On 20-Feb-07, at 10:27 PM, Hendy I. wrote:

> basically the same as Philip's solution - but requires one table
>
> tables...
>

lol. the web is so small.

how about find_by_sql?

or check out this ticket : http://dev.rubyonrails.org/ticket/2546

  summary, using find(:select), then catch the MissingAttribute
exception to lazily load the attribute when object.blob is accessed.

I'd likely take that route.

Jodi
Hendy I. (Guest)
on 2007-02-21 06:19
(Received via mailing list)
On 2/21/07, Philip H. <removed_email_address@domain.invalid> wrote:
> is that for icons/logos those inherit from images, but semantically we can
> treat them different.
>
> -philip


Thanks Philip... By your experience in 40,000 rows, it seems that this
is
the way to go.

As you said, the typing will come in handy in some situations. There are
some gripes I feel about this concept, which is decohesiveness (the
actual
model and its blobs reside in separate tables) and coupling (the blob
table
contains lots of data that doesn't have any meaning, and is coupled with
all
other tables).

What I wonder is how I'd do this if I have several blob fields in a
model
having the same "type"? Do I really have to create several different
[dummy]
classes for this in the blob table?

For example, if Product has "blob fields" left_image, top_image,
front_image, that means I have to define LeftImage, TopImage, and
FrontImage
classes just for the purpose of identifying them in the blob table? (so
each
product will have 3 records in blob table)

Or I just use a single Image class then use has_many? (so each product
will
have arbitrary number of records in blob table)

...or both approaches? (each product will have arbitrary blobs each with
its
own types)

I'm curious about your experience.

--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Hendy I. (Guest)
on 2007-02-21 06:28
(Received via mailing list)
On 2/21/07, Jodi S. <removed_email_address@domain.invalid> wrote:
> I'd likely take that route.
> Jodi
>

Yes, Jodi... find :select is exactly what I want.... except that it
doesn't
come 'by default'. (I wanted to put this :select into the model itself,
so
everybody using find without specifying columns explicitly, will use the
default columns I set that omits the BLOBs)

With find :include, matters get worse... since :include doesn't allow
you to
specify anything on the "inner find".

I really would love to have this approach, but for now, Rails seems to
force
me into hacking it when I have to use this approach. (having to
find&replace
manually all .find using :select, then consciously avoiding :include
.....
dunno what else )

plugins, plugins, plugins.. Where are you when I need one? ;-)

--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Philip H. (Guest)
on 2007-02-21 06:40
(Received via mailing list)
>>
>> -philip
>
>
> Thanks Philip... By your experience in 40,000 rows, it seems that this is
> the way to go.

Keep in mind, we don't store the blob data itself in the database...
just
meta-data about it (file size, url, width, height, duration, etc.)

> As you said, the typing will come in handy in some situations. There are
> some gripes I feel about this concept, which is decohesiveness (the actual
> model and its blobs reside in separate tables) and coupling (the blob table
> contains lots of data that doesn't have any meaning, and is coupled with all
> other tables).
>
> What I wonder is how I'd do this if I have several blob fields in a model
> having the same "type"? Do I really have to create several different [dummy]
> classes for this in the blob table?

Depends on if you want to uniquely identify them.

> For example, if Product has "blob fields" left_image, top_image,
> front_image, that means I have to define LeftImage, TopImage, and FrontImage
> classes just for the purpose of identifying them in the blob table? (so each
> product will have 3 records in blob table)

I would create the 3 separate classes myself.  That makes it easy to
refer
to them in your product views since you're going to want one of them
specifically.

> Or I just use a single Image class then use has_many? (so each product will
> have arbitrary number of records in blob table)

Do this when your products have a series of screenshots (for example)
and
you don't care what order they come back in.

> I'm curious about your experience.

Do both, depending on the need.  I recently did a project that had a
bunch
of articles that belonged to a category.  The category had a Logo and an
Icon.  The article had a PreferredImage and had many Images.

All of those classes were inherited from the Image class so shared in
the
common methods we have to manipulate images (like thumbnail generation).

-philip
Hendy I. (Guest)
on 2007-02-21 07:43
(Received via mailing list)
On 2/21/07, Philip H. <removed_email_address@domain.invalid> wrote:
> >> of media... images, flash, wmv, rm, flv, icons, logos, etc... what's
> > the way to go.
> all
> > For example, if Product has "blob fields" left_image, top_image,
> > Or I just use a single Image class then use has_many? (so each product
> Icon.  The article had a PreferredImage and had many Images.
>
> All of those classes were inherited from the Image class so shared in the
> common methods we have to manipulate images (like thumbnail generation).
>
> -philip


Thank you Philip, your advices are so useful. I'm beginning to consider
the
merits of having separate tables.

Any plugins you'd like to suggest? (image- or blob- or polymorph-
related)

--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Scott (Guest)
on 2007-02-21 14:35
(Received via mailing list)
Is there any talk of adding lazy loaded attributes to model objects? I
think this is a fundamental piece of functionality missing from
ActiveRecord that ORM frameworks like Hibernate and EJB3 have.
Michael S. (Guest)
on 2007-02-21 15:17
(Received via mailing list)
On Wednesday 21 February 2007, Scott wrote:
> Is there any talk of adding lazy loaded attributes to model objects?
> I think this is a fundamental piece of functionality missing from
> ActiveRecord that ORM frameworks like Hibernate and EJB3 have.

If you think so, write a plugin that implements this. As a starting
point, I suggest you prototype this functionality by

- using #find(:all, :select => '...')
- explicitly implementing accessors for the lazy attributes which load
them when needed

Then, if you're satisfied it's actually worth it, look at
ActiveRecord::Base#construct_finder_sql. Making it work with included
associations will be a bit trickier, however. Nonetheless, I'm sure you
will find help in this group.

Michael

--
Michael S.
mailto:removed_email_address@domain.invalid
http://www.schuerig.de/michael/
Philip H. (Guest)
on 2007-02-21 20:42
(Received via mailing list)
> Any plugins you'd like to suggest? (image- or blob- or polymorph- related)

Nope.  We're not actually using any for this project.

In another project though I really like flex_image for it's ability to
resize/dropshadow/watermark and then cache to disk...

-philip
Hendy I. (Guest)
on 2007-02-22 04:30
(Received via mailing list)
On 2/22/07, Philip H. <removed_email_address@domain.invalid> wrote:
> -philip
Thank you Philip. I'll check that out... :-)


--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Hendy I. (Guest)
on 2007-02-22 04:34
(Received via mailing list)
On 2/22/07, Philip H. <removed_email_address@domain.invalid> wrote:
> -philip
>

Thank you Philip. I'll check that out. :-)

--
Hendy I.
Web: http://hendy.gauldong.net
Mobile: +62 856 24889899
Yahoo Messenger: ceefour666
LinkedIn: http://www.linkedin.com/in/ceefour
Mario J. (Guest)
on 2007-05-26 01:31
Hendy,

Regardless of the issue related to lazy loading of BLOBs have you hit or
seen this issue:

http://www.ruby-forum.com/topic/109387

I have not seen anyone with an answer to this. Thanks Mario
Hendy I. (Guest)
on 2007-06-25 11:13
(Received via mailing list)
Dear Mario,

On 5/26/07, Mario J. <removed_email_address@domain.invalid> wrote:
>
I'm less of a Rails 'hacker' lately.

Considering the rapid development of Rails and its plugins... I would
avoid
jumping the hoops just to force making the DBA happy.

100 MB BLOB column? Come on, just use the filesystem. Somebody has heard
of
S3.

It's not a "solution" to the real problem. But it's an alternative that
I
think should be considered with great care.

There are lots of ups about Rails' ActiveRecord model, and that's where
it
shines. But there are pitfalls that make us have to 'abandon' old
time-tested DB practices.

For example, rather than making a single tabel span tens of columns and
having multiple bloated ones.... I now prefer narrow tables. And related
models just by using has_one relationship. Relationships are easy in
Rails.
Sometimes much easier than limiting the returned columns using 'SELECT'!

With proper indexes and good DBMS I think this can result in more
maintainable app while not reducing performance.
This topic is locked and can not be replied to.