Database/program design question

Hi, I’m a bit new to the MVC design and all the fun stuff that goes
along with Ruby on Rails. I just got RoR working on my host (I’m stuck
on version 1.1.6, though), and I was looking for advice on how I
should design this in my MySQL database.

I am working on a system for image galleries. Essentially, I have
several Users, and each user could potentially have several (has_many)
picture galleries. Sort of like facebook or myspace picture galleries.
My problem is this: each gallery is going to have to have its own
table (I think) because there could be a really large amount of
pictures, and I’ll need columns for comments, description, filename,
etc. But using something from RoR that I’ve learned so far, I can do
has_many :galleries, but then I only get 1 gallery model - linked to
just one table. I guess I could set the foreign key (user_id) to each
row in that table, and then store the name of the specific gallery
table (gallery_kopf1988_1)… but then how do I get that gallery
working in the same Model-View-Controller fashion?

I hope this makes some sense…

I don’t think this makes much sense. You won’t get any performance
gain by having multiple tables, and it will make your code very
difficult to handle: you’ll end up writing things multiple times, or
writing code outside the model.

I would use a single gallery table, with pictures that
belongs_to :gallery. Comments (and maybe descriptions) get their own
tables (belongs_to :picture).

Or can a picture belong to several galleries, too? Then it would be a
has_and_belongs_to_many relation (since a gallery will have multiple
pictures). In that case you would need to have them in one table
anyway.

If the galleries are per user, things become even more complicated,
and the approach of having a separate table per gallery breaks down
entirely. Another problem is that you couldn’t create new galleries
at runtime without also creating a new table for each one.
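In Rails terms (1.x-era syntax, since you’re stuck on 1.1.6) that
would be something like this; the model and column names are just
examples, not anything your app has to use:

```ruby
# One table per MODEL, not per gallery: galleries, pictures and
# comments each live in a single table, linked by foreign key
# columns (user_id, gallery_id, picture_id).
class User < ActiveRecord::Base
  has_many :galleries
end

class Gallery < ActiveRecord::Base
  belongs_to :user
  has_many :pictures
end

class Picture < ActiveRecord::Base
  belongs_to :gallery
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :picture
end
```

Every gallery then goes through the same model, controller and
views, and a user’s galleries are just `some_user.galleries`.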

Okay, so I’ll have to just have every picture in the same table then,
belonging to a specific gallery. I was just thinking this might not be
scalable once I get to a whole lot of pictures.

So the gallery “kopf1988” will then “has_many :pictures”
And then the pictures will “has_many :comments”

Okay… should be simple enough. And I can even use the comments model
for the comments that I’ll have in other parts of the website, so that
could keep things DRYer.
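If that shared comments model pans out, polymorphic associations
(which are in Rails 1.1, so 1.1.6 should have them) look like the
way to do it. A sketch, assuming `commentable_id` and
`commentable_type` columns on the comments table; the `Article`
model is made up, just to show the reuse:

```ruby
# A single comments table serves pictures and anything else on the
# site; commentable_type records which model each comment hangs off.
class Comment < ActiveRecord::Base
  belongs_to :commentable, :polymorphic => true
end

class Picture < ActiveRecord::Base
  has_many :comments, :as => :commentable
end

# Hypothetical second commentable model elsewhere on the site.
class Article < ActiveRecord::Base
  has_many :comments, :as => :commentable
end
```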

Let’s hope I remember all this by the time I get home.

On Jan 8, 1:39 pm, Thorsten M. [email protected]

On Jan 8, 12:18 pm, kopf1988 [email protected] wrote:

Okay, so I’ll have to just have every picture in the same table then,
belonging to a specific gallery. I was just thinking this might not be
scalable once I get to a whole lot of pictures.

Don’t worry about scalability until you have to. Odds are, you won’t.
You should be so lucky!

///ark

On Jan 8, 3:59 pm, Mark W. [email protected] wrote:

Don’t worry about scalability until you have to. Odds are, you won’t.
You should be so lucky!

This topic is kind of fortuitous for me- I’ve actually been looking
for an answer to something like this, through the archives of this
group and with Google. I understand what you’re saying about
optimizing prematurely, but I have been that lucky before, and it was,
contrary to the old saw, not a nice problem to have, at least for
those who had to deal with it. I worked on a site where people hadn’t
really worried about scalability, and it bit us when the site’s
traffic increased a thousandfold over a very short period of time. So
I think
there’s a balance to be struck there. One thing I learned from that
experience is that some things are often pretty easy to deal with-
using a poor algorithm, if it can just be rewritten, or failing to
cache as much as you should, for instance. But once you have actual
data it can be hard to deal with problems with db architecture cleanly
and quickly.

I’m wondering about a situation in which you’re offering an
application to the public, letting anyone sign up to use it, with what
is conceptually their own instance of it. Take something like Blogger,
just as an example. It’s pretty easy to map out a simple data design
for an individual blog… if you need to host tens of thousands of
them it is trickier. Let’s say that blogs have posts, and posts have
comments. One way to handle this is to stick all of the comments into
one large table with a foreign key that references posts (and do the
same for posts, relative to blogs). If you tune the db well, and have
enough RAM in your machine(s) to hold the entire dataset, this is
likely to scale pretty well for quite a while, but I think at some
point you may hit a wall with this approach- after all, if the average
post has 100 comments, and the average blog has 1000 posts, and there
are 10,000 blogs you’re talking about 1,000,000,000 records (which, of
course, would be a nice problem to have ;-) ). Beyond that, it’s less
clear than the original “one blog” problem. Instead of just having
posts, all of which belong to the one blog in question, you now have
to worry about which blog a post belongs to, which is not as
parsimonious as one (or at least I) would ideally like.
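For what it’s worth, the back-of-the-envelope numbers above check
out:

```ruby
# Scale estimate for the one-big-table layout: comments carry a
# post_id foreign key, posts carry a blog_id foreign key.
comments_per_post = 100
posts_per_blog    = 1_000
blogs             = 10_000

total_posts    = posts_per_blog * blogs          # 10 million post rows
total_comments = comments_per_post * total_posts # 1 billion comment rows

puts total_comments
```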

If you’re writing this from scratch, without using a framework, there
are a lot of ways to deal with this (including the one big table
approach), and there are trade-offs in each case (at least in each
case worth considering). If you’re willing to give up portability
across databases you could, for instance, use a namespace mechanism,
like Postgres’s schemas, creating a new schema for each instance of a
blog. This has the virtue of letting you embed the instances in an
overarching schema that handles information that is common across all
blogs- for instance commenters might be registered with one name
across all of the blogs. You could create a new set of tables for each
registered blog, and duplicate tables by munging some sort of prefix
onto each table name- that would be really ugly though. You could go
really far and virtualise each instance of the app, though that would
create a headache dealing with data that crossed instances.
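For concreteness, the schema-per-blog idea could at least be driven
from ActiveRecord with raw SQL. A rough sketch; the class and method
names here are my own invention, not anything Rails provides, and
it’s Postgres-only:

```ruby
# Each blog gets its own Postgres schema; shared tables (users,
# registered commenters) stay in "public". Note that Rails
# migrations won't manage the per-schema tables for you.
class BlogSchema
  def self.create(name)
    ActiveRecord::Base.connection.execute("CREATE SCHEMA #{name}")
  end

  # Point subsequent queries at one blog's tables, falling back to
  # the shared public schema for cross-blog data.
  def self.activate(name)
    ActiveRecord::Base.connection.execute(
      "SET search_path TO #{name}, public")
  end
end
```

You’d presumably call something like `BlogSchema.activate` in a
before_filter keyed on the request’s subdomain; the usual caveats
about sanitising `name` before interpolating it into SQL apply.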

So I guess what I’m asking is whether or not Rails has any canonical
way of dealing with this beyond the one big table approach. Blogger is
just an example- I’m more interested in the general answer than in
specific details about how a blogging site should/would work in Rails,
as I have no intention of trying to write a new Blogger. I’m pretty
new to Rails, and I figured that if I kept digging I’d find the answer
myself, but since it came up on this list I figured I’d get the
question into this thread.

Thanks

On Jan 9, 4:24 am, Mike W. [email protected] wrote:

This sort of thing is not now, nor is likely to be in the future, a
problem, providing you’re using a reasonably capable relational
database management system.

This sort of thing is, presently, a huge problem if you have to deal
with real traffic. At my last job I was responsible for apps that
averaged more than 5 million visitors/day, with both daily and
seasonal peaks. “Indexes” were not a silver bullet in that case- we
had to do even simple things in complicated ways to make things scale.
Many webapps are actually pretty simple- making them scale to millions
of users/day is the hardest bit, in my experience. I appreciate that
you took the time to answer, but I’m not looking for advice about what
a db will or will not handle. I have some experience with that,
including the experience of having a db completely fall over because
my traffic went from 1,000 visitors/day to >5 million/day in less than
two hours.

I’m just looking for an answer to a simple question. Is there a
standard way, in Rails, to deal with many instances of one
application? Other than chucking everything into one big table? If no,
fine. If yes, a pointer to the right docs would be great. I know it’s
tempting to respond to this sort of request with an aphorism, but I’d
like to ask that you just assume that I know what I’m doing, db-wise,
and that I have a good reason for asking the question.

Thanks

On Jan 9, 2:21 am, Campo [email protected] wrote:

likely to scale pretty well for quite a while, but I think at some
point you may hit a wall with this approach- after all, if the average
post has 100 comments, and the average blog has 1000 posts, and there
are 10,000 blogs you’re talking about 1,000,000,000 records (which, of
course, would be a nice problem to have ;-) ). Beyond that, it’s less
clear than the original “one blog” problem. Instead of just having
posts, all of which belong to the one blog in question, you now have
to worry about which blog a post belongs to, which is not as
parsimonious as one (or at least I) would ideally like.

This sort of thing is not now, nor is likely to be in the future, a
problem, providing you’re using a reasonably capable relational
database management system. Let’s say anything from MySQL up - I
haven’t tried using SQLite for anything substantial so I can’t vouch
for its capabilities in the medium-large DB arena (I’d forgive it if
it couldn’t cope, though!)

Why not a problem? Indexes. And beyond that, partitioning can also be
pretty useful, in DBMSs that provide it. You don’t need to pin your
entire table in RAM, perish the thought. You would hope to be able to
store a fair chunk of the index, though, with enough left over for
data page caching, so that the areas being referenced frequently are
served up without recourse to disk. And you get most of that for
free. At least you do with Oracle, SQL Server and Sybase, to name the
three with which I have most familiarity.
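Concretely, those indexes are a one-liner each in a Rails migration.
For the blog example above, something like this (table and column
names assumed from the earlier description):

```ruby
# Indexes on the foreign keys keep lookups like "all comments for
# this post" off full table scans, even at many millions of rows.
class AddLookupIndexes < ActiveRecord::Migration
  def self.up
    add_index :posts, :blog_id
    add_index :comments, :post_id
  end

  def self.down
    remove_index :posts, :blog_id
    remove_index :comments, :post_id
  end
end
```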

The thing is, if you’re facing the problem of an imminent billion-row
table then you should be in the position of being able to hire in some
database consultancy, if you don’t already have a full-time DBA. Or
team of DBAs.

So consider this as another vote for the don’t-worry-be-happy party.

Regards,

Mike