Rails without a primary key

I’ve been trying to use Rails on a database where the biggest
datawarehouse-style table doesn’t have a primary key. A primary key
really doesn’t make sense there and just adds unnecessary space. I do,
however, have a column (IP) that I normally search on, and those
searches return many rows. The table also references several other
tables through Rails-friendly foreign keys.

Simple AR works just fine but I’ve been running into trouble with more
advanced stuff and I’m not sure how to get around it. My biggest
problem is that the :include option to find (which enables eager
loading of associations) fails because it assumes that there is an id
column present and tries to select it in the join. Similarly, the
paginate helper attempts to SELECT COUNT(DISTINCT mytable.id) when it
tries to count the results of the query, and bombs.
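
To illustrate, here’s roughly the kind of call that blows up. I’m
using a hypothetical Flow model over the flows table (names invented
for this example):

class Flow < ActiveRecord::Base
  belongs_to :host  # hypothetical association via a host_id foreign key
end

# Eager loading assumes an id column and selects it in the join:
flows = Flow.find(:all,
                  :conditions => ["src_ip = ?", '10.0.0.1'],
                  :include    => :host)

# In a controller, the paginate helper likewise issues
# SELECT COUNT(DISTINCT flows.id) to count the results:
@flow_pages, @flows = paginate :flows, :per_page => 50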

I have read all the legacy docs that I can find, but none seem to
address the issue of a non-existent primary key. Is there anything
that I missed?

Is there any way to fix the above problems?

thanks
-Worky

Ask if you can add a primary key to the table. You’ll feel better when
your table has one. Go on.

Dr Nic wrote:

Ask if you can add a primary key to the table. You’ll feel better when
your table has one. Go on.

I’m assuming that you are joking …

I’ve had a similar problem with trying to put a rails frontend onto a
very large table. Perhaps we are storing the same data … flow
records? For large tables like these (mine is 200GB) it’s neither
logical nor practical to store a primary key, especially not a
meaningless integer one. Accesses never return a single row, and
aggregates are often used, which is sometimes frustrating with Rails
since that’s not quite the environment it was designed for.

I’ve hacked around the problem by using find_by_sql a lot more than I
would like (i.e. almost everywhere), but that is not really the Rails
way. If someone has better solutions I’d love to hear them as well.
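
For example, something like this (model and column names are
placeholders), which works but bypasses associations entirely:

# Aggregate query that AR's finders can't express, done by hand:
flows = Flow.find_by_sql(
  ["SELECT src_ip, dest_ip, SUM(bytes) AS bytes
      FROM flows
     WHERE src_ip = ?
     GROUP BY src_ip, dest_ip", '10.0.0.1'])
flows.first.bytes  # aliased columns come back as (string) attributes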

Mike wrote:

Dr Nic wrote:

Ask if you can add a primary key to the table. You’ll feel better when
your table has one. Go on.

I’m assuming that you are joking …

I doubt he is.

I’ve hacked around the problem by using find_by_sql a lot more than I
would like (i.e. almost everywhere), but that is not really the Rails
way. If someone has better solutions I’d love to hear them as well.

The Rails way is to have some way to uniquely identify each row in the
database. This is so that each object (or instance, actually) can be
mapped to a unique row. Without any uniqueness, you’re not using
ActiveRecord as the object-relational mapper that it is. You’ll likely
run into the same situation no matter what ORM tool you use. Mapping
objects to rows in a database works best by far when you can guarantee
the uniqueness of each row.

Given your copious use of find_by_sql, you may be better off just using
the database connector for your database platform. I’m not saying what
you need can’t be done in AR, just that you may save yourself some
serious grief if you skip AR.
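
For instance, a minimal sketch of skipping the ORM layer while still
borrowing ActiveRecord’s connection (the flows table here is assumed
from earlier in the thread); the raw database driver would work
equally well:

conn = ActiveRecord::Base.connection
rows = conn.select_all(
  "SELECT src_ip, SUM(bytes) AS total FROM flows GROUP BY src_ip")
# plain hashes come back, no per-row AR objects:
rows.each { |row| puts "#{row['src_ip']}: #{row['total']}" }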

On Jul 27, 2006, at 12:58, Kevin W. wrote:

The Rails way is to have some way to uniquely identify each row in the
database. This is so that each object (or instance, actually) can be
mapped to a unique row.

Just to be clear, having a unique integer column (such as that
provided by MySQL’s AUTO_INCREMENT or Postgres’ SERIAL columns) is
only one way to guarantee row uniqueness. The caveat is that this is
the way required by stock Rails to uniquely identify rows. Another
way is to enforce uniqueness on a single column of some other type,
or use a composite key to guarantee the uniqueness of some
combination of columns.
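
For instance, either flavor can be declared in a migration with a
unique index (table and column names below are invented for
illustration):

class EnforceRowUniqueness < ActiveRecord::Migration
  def self.up
    # uniqueness on a single non-integer column:
    add_index :hosts, :ip, :unique => true
    # ... or across a combination of columns:
    add_index :flows, [:src_ip, :src_port, :dest_ip, :dest_port],
              :unique => true, :name => 'flows_natural_key'
  end

  def self.down
    remove_index :hosts, :ip
    remove_index :flows, :name => 'flows_natural_key'
  end
end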

Michael G.
grzm seespotcode net

On Jul 27, 2006, at 18:15, Michael A. wrote:

Yeah, I guess I should have been a bit clearer. I don’t have an
integer, single-column primary key, but the rows are unique over
multiple columns. Know of any way to “solve” this without a DB view?

I’ve heard someone has recently developed a composite key plugin or
patch. I don’t think it’s in Rails edge or even in the patch queue,
but some googling should help you find it. Haven’t used it myself.
Good luck!

Michael G.
grzm seespotcode net

On Jul 27, 2006, at 20:03, Pratik wrote:

http://wiki.rubyonrails.org/rails/pages/Uses+Guid+Plugin

While the above looks interesting, it doesn’t add support for
composite keys. A quick google turned up:

http://compositekeys.rubyforge.org/

Again, I haven’t used it, but you may find it helpful.

Michael G.
grzm seespotcode net

http://compositekeys.rubyforge.org/

Yep, and I’m adding composite foreign key support soon too.

Then you’ll be able to do fun stuff like:

class Foo < ActiveRecord::Base
  set_primary_key :id # just for completeness
end

class Bar < ActiveRecord::Base
  set_primary_keys :foo_id, :seq
  has_many :bars, :foreign_key => [:foo_id, :seq]
end

class Tar < ActiveRecord::Base
  belongs_to :bar, :foreign_key => [:foo_id, :seq]
end

But I’ll try to make it unnecessary to explicitly specify :foreign_key
if the foreign key column names match the primary key column names.
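
Finds would then take one value per primary key column, along these
lines (the exact syntax may still change):

foo = Foo.find 1         # single key, as usual
bar = Bar.find [1, 42]   # one value per key: :foo_id => 1, :seq => 42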

Cheers
Nic

http://wiki.rubyonrails.org/rails/pages/Uses+Guid+Plugin

On 7/27/06, Michael G. [email protected] wrote:

Good luck!


rm -rf / 2>/dev/null - http://null.in

“Things do not happen. Things are made to happen.” - JFK

class Bar < ActiveRecord::Base
  set_primary_keys :foo_id, :seq
  has_many :bars, :foreign_key => [:foo_id, :seq]
end

That should be:
has_many :tars, :foreign_key => [:foo_id, :seq]

Worky wrote:

Yes, I am storing network flow data. This means that I have a large
table that looks like:

src_ip IP
src_port INT
dest_ip IP
dest_port INT
start TIMESTAMP
end TIMESTAMP
bytes INT

So there isn’t a defined uniqueness constraint, but there is an
implicit one across all the columns (i.e. it would be very unlikely to
have a duplicate row).
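
For what it’s worth, the table is declared without the surrogate id by
passing :id => false to create_table (column types approximated, since
Rails has no native IP type):

class CreateFlows < ActiveRecord::Migration
  def self.up
    create_table :flows, :id => false do |t|  # no surrogate integer key
      t.column :src_ip,    :string    # Postgres inet; AR sees a string
      t.column :src_port,  :integer
      t.column :dest_ip,   :string
      t.column :dest_port, :integer
      t.column :start,     :timestamp
      t.column :end,       :timestamp
      t.column :bytes,     :integer
    end
  end

  def self.down
    drop_table :flows
  end
end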

I also have a similar problem with network event data, i.e. like IDS
events or p0f data. Basically, I have a table that looks like:

src_ip IP
event TEXT
…other columns…
first_seen TIMESTAMP
last_seen TIMESTAMP
number_seen INT

so a composite primary key would span all the columns except for
{first,last,number}_seen. I have not defined one, however, as it
doesn’t really fit with the access pattern, and I use PostgreSQL
triggers to merge new data in to “ensure” uniqueness.

http://compositekeys.rubyforge.org/

Looks very interesting … I’ll see if it fixes the “include” problems
that I mentioned initially. As Mike mentioned previously, I never use
the find(pkey) method, as I’m always interested in many rows
(find_by_src_ip).
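
The dynamic finders never reference id, so they keep working fine;
strictly it’s the find_all_by_* variants that return many rows:

flows = Flow.find_all_by_src_ip('10.0.0.1')             # all rows for an IP
flows = Flow.find_all_by_src_ip_and_dest_port('10.0.0.1', 80)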

Although the size of the table presents an interesting challenge, this
is exactly the case for an otherwise meaningless surrogate key. Indexing
on, and selecting by, that one :id column is much more performant than a
6-10 column natural key.

Assuming you still don’t want a surrogate key, how about the table
partitioning feature of MySQL 5.1? Taking the table size out of the
equation is the only other thing that comes to mind.

Although the size of the table presents an interesting challenge, this
is exactly the case for an otherwise meaningless surrogate key. Indexing
on, and selecting by, that one :id column is much more performant than a
6-10 column natural key.

I think the idea is that no access to the table will result in a
single row being returned so there is no reason, practical or logical,
to have a key on this table. In my case, most queries use some form
of IP and port and generate a series of records.

And (in my case, not sure about OP) we use Postgres’ table
partitioning, which works great. The issue isn’t necessarily table
size for query performance, but just table size in general. I have
similar (legacy) projects generating 200-300GB per day of traffic, so
every bit that we can save per record results in a fairly large
savings overall.

Rails has been a quick way to put a face on those databases and it
works great for simple models or those that conform to Rails
conventions. That composite key plugin looks mighty useful, however.

Michael A. wrote:

Although the size of the table presents an interesting challenge, this
is exactly the case for an otherwise meaningless surrogate key. Indexing
on, and selecting by, that one :id column is much more performant than a
6-10 column natural key.

I think the idea is that no access to the table will result in a
single row being returned so there is no reason, practical or logical,
to have a key on this table.

I guess it depends on your usage of this data. If it’s reporting, fine.
If you have to update row #1907135067, having a unique ID is what works
best with an ORM solution.

… I have
similar (legacy) projects generating 200-300GB per day of traffic, so
every bit that we can save per record results in a fairly large
savings overall.

What is your memory footprint when loading all these records? In all the
ORM projects I’ve been on, memory usage became an issue when the tables
were large. This is because a unique object is created for each row, and
if you load 2 million rows you have 2 million objects in memory. This is
another case where ORM is not the best way to go.
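
If you do stay with AR on tables like these, one way to keep the
footprint bounded is to page through the rows with :limit and :offset
instead of loading everything at once (a sketch; the Flow model and
ordering column are assumed):

offset = 0
loop do
  batch = Flow.find(:all, :order => 'start',
                    :limit => 1000, :offset => offset)
  break if batch.empty?
  batch.each { |flow| process(flow) }  # process is a stand-in
  offset += 1000
end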

Rails has been a quick way to put a face on those databases and it
works great for simple models or those that conform to Rails
conventions. That composite key plugin looks mighty useful, however.

I’d love to find out your experience with that plugin, as
legacy/enterprise integration is key to being able to use Rails at work
for many Java/DotNet developers.

I think the idea is that no access to the table will result in a
single row being returned so there is no reason, practical or logical,
to have a key on this table.

I guess it depends on your usage of this data. If it’s reporting, fine.
If you have to update row #1907135067, having a unique ID is what works
best with an ORM solution.

Yep, reporting. My DB is a read-only data warehouse, which I guess
would be considered an “integration” database from Rails’ perspective.

… I have
similar (legacy) projects generating 200-300GB per day of traffic, so
every bit that we can save per record results in a fairly large
savings overall.

What is your memory footprint when loading all these records? In all the
ORM projects I’ve been on, memory usage became an issue when the tables
were large. This is because a unique object is created for each row, and
if you load 2 million rows you have 2 million objects in memory. This is
another case where ORM is not the best way to go.

I don’t load all the records into the interface … that would be
crazy :) I just need to be able to put a quick query interface onto
the database so that an analyst can issue queries, and anything that
would be too unwieldy for AR is probably too unwieldy for a normal
analyst :)

Rails has been a quick way to put a face on those databases and it
works great for simple models or those that conform to Rails
conventions. That composite key plugin looks mighty useful, however.

I’d love to find out your experience with that plugin, as
legacy/enterprise integration is key to being able to use Rails at work
for many Java/DotNet developers.

I’ll keep you posted, once I have a chance to get to it.