How to store order history / invoices

benjohnson · October 15, 2006, 3:21am

I’ve always wondered how everyone else does this, so I figured I would
ask. From my understanding an order history database table really
shouldn’t have any foreign keys, it should just be a carbon copy of
everything. That way if any of the relationships get deleted you still
have all of the information.

The weird thing is, every tutorial I read uses foreign keys to store
their order history.

Let’s say an order has a user account relationship and a product
relationship. What if the user edits his information or the product
information changes? Wouldn’t the order history then be flawed?

How do you handle this?

Thanks for your help.

benjohnson · October 15, 2006, 7:11am

Audit trails require a change of mindset. There are two approaches
that are commonly used, and both are successful within the bounds of
the issues that they are intended to deal with, in reasonably large
(terabyte plus) systems.

The approach that I tend to prefer is the polyphasic timeline or
“insert-only” approach, in which rows in the tables are only ever
inserted, never deleted or updated (except for the inactivated_at
column). This is accomplished by keeping “created_at” and
“inactivated_at” columns on every table. The appearance and business
effect of an update is accomplished by inactivating the existing row
and creating a new row. The appearance and business effect of deleting
a row is accomplished by inactivating the row. Rows that are currently
active are readily identified by “inactivated_at is null” in the sql.

create table Orders
(
id integer not null,
created_at timestamp default curent_timestamp not null,
inactivated_at timestamp,
other columns …
)

create table OrderLineItems
(
id integer not null,
created_at timestamp default curent_timestamp not null,
inactivated_at timestamp,
parent_id not null references Orders(id),
sequence integer not null,
other columns …
)

This mechanism is always implemented in the application code, and it
requires a bit of a twist to the mindset during design and coding.
Personally, I find that it simplifies matters but not all programmers
agree with me on that. The only enforcement possible in the RDBMS is
that no permissions are granted for delete on any table.

The other approach is to keep parallel transaction tables that snapshot
the information being audited, plus the transaction that was being
performed. So an Order table would have a parallel OrderTransaction
table that contained all of the columns of the Order table that were
important to audit, plus columns to say who, when, and what transaction
was performed.

Note that the foreign key relationship between transaction and baseline
tables are conceptual, but never made concrete in the database schema.
This means that the base business tables may be acted upon in any way
without interference by the RDBMS’ referential integrity. You will
probably choose to create indexes

create table Orders
(
id integer not null,
other columns …
)

create table OrderLineItems
(
id integer not null,
parent_id not null references Orders(id),
sequence integer not null,
other columns …
)

create table OrderTransactions
(
id integer not null, – transaction id
created_at timestamp default curent_timestamp not null,
trans_type varchar(20) not null constraint
(‘INSERT’,‘UPDATE’,‘DELETE’),
trans_by_user VARCHAR(20) not null,
order_id not null,
other columns cloned from Order …
)

create table OrderLineItemTransactions
(
id integer not null, – transaction id
created_at timestamp default curent_timestamp not null,
trans_type varchar(20) not null constraint
(‘INSERT’,‘UPDATE’,‘DELETE’),
trans_by_user VARCHAR(20) not null,
orderLineItem_id not null,
other columns cloned from OrderLineItem …
)

The parallel transaction table schema may be easily implemented in the
trigger/stored procedure language of your RDBMS, and become transparent
to your application code.

Does this help?

benjohnson · October 15, 2006, 7:15am

I forgot to define the primary keys …

create table Orders
(
id integer not null,
created_at timestamp default curent_timestamp not null,
inactivated_at timestamp,
other columns …,
primary key (id, created_at)
);
create index order_inactive_at on Orders (inactivated_at);

create table OrderLineItems
(
id integer not null,
created_at timestamp default curent_timestamp not null,
inactivated_at timestamp,
parent_id not null references Orders(id),
sequence integer not null,
other columns …
primary key (id, created_at)
);
create index order_inactive_at on Orders (inactivated_at);

benjohnson · October 15, 2006, 7:30am

unknown wrote:

I forgot to define the primary keys …

create table Orders
(
id integer not null,
created_at timestamp default curent_timestamp not null,
inactivated_at timestamp,
other columns …,
primary key (id, created_at)
);
create index order_inactive_at on Orders (inactivated_at);

create table OrderLineItems
(
id integer not null,
created_at timestamp default curent_timestamp not null,
inactivated_at timestamp,
parent_id not null references Orders(id),
sequence integer not null,
other columns …
primary key (id, created_at)
);
create index order_inactive_at on Orders (inactivated_at);

Wow, thanks for the detailed response. I like your approach so I kind of
used them both. Rails comes with a sweeper class that is perfect for
auditing. I can basically log any change to any record in the database,
which user did it, when they did it, and what attributes the model had,
etc. It’s a recipe right out of the rails recipes book. The only problem
with this is that its a general auditing table and not something I’d
like to use in the actual application. I say this because this table
audits basically every model in my application. I also have a big text
field called “attributes” that contains a serialization of the model’s
attributes. So this really isn’t suitable for using in the actual
application.

So what I did was make an orders table that has all of the foreign keys
present. They are only there for running reports, seeing which orders a
user has placed, etc. Then I also have fields in there that carbon copy
all of the present information that is important. But like I said, for
the sake of reporting I need those foriegn keys and I need all of the
relationships to stay intact. To do this, I did something very similar
to what you outlined. I won’t let those relationships get deleted if
they have orders. I just over rode the destroy method for those models.

Does my approach sound solid to you?

Thanks again for your help.

benjohnson · October 15, 2006, 3:58pm

If you can reproduce the state of the order at any point in its history,
no
matter how many or what changes the user made, the model is sound.

In the exampl DDL, I have made foreign keys explicit in the business
tables. “column type references table(col)” in the create table DDL is
a
shortcut for “create foreign key on table1(col) references table2(col)”
in a
separate DDL command.

However, in transaction tables you need to record events that would
cause some
RI mechanisms to delete the transaction rows if explicit foreign keys
were
used. The transaction tables should be indexed, but probably not
explicitly
tied to the foreign key. For reporting purposes, there is no difference
between well-defined indexes and explicitly defined foreign key
relationships.

I shy away from explicit foreign keys for transaction tables because,
with
explicit foreign keys, you have referential integrity and its side
effects,
which vary between RDBMS’. Creating the necessary indexes for efficient
reporting is good and predictable, and in many cases more efficient than
foreign keys.

I hope this helps.
David J.