Help with AR design on a large (~100'sM) partition table

robredford · March 24, 2009, 3:33pm

First, I’m a physicist, not a computer scientist. I need to store 65M
rows now, with a growth rate of ~21M/year whose entries arrive daily.

The purpose of this DB is to store metadata for environmental
observations for the ionosphere. Basically it’s a glorified file
catalogue. I need the usual: fast enough inserts, reasonable
recovery, REST like access, etc. I have access to Ruby AR/AS 2.1.1,
Ruby 1.8.5 and eventually MySQL 5.1.x.

I’m trying to trade some storage space and lack of normalization for
simplicity of design and ease of maintenance by me. Basically, I want
to partition one very large table. My PK is essentially across 3
columns: filename, md5 checksum, destination. I’d like to partition
by year which is discernible by the filename and by obs_time if I can
but obs_time is not currently part of the PK. If that happens, then I
would have up to 40M rows in a single table though most tables would
have a few million; perhaps that is too many; perhaps a table for each
half year would work.

Currently, I have no foreign keys but I may want them one day.

Any help and suggestions would be really appreciated!

My schema follows. I have created a simple model off this table and
I’m using these decorators: validation, and before_save and after_save
to build an inventory summary (in another table not shown). I’ve
tried this on a non-partitioned table so far awaiting the installation
of MySQL 5.1.x by our IT department.

How do I go about integrating AR with a partitioned table? Can you
advise a simple pattern I might follow for such a large but simple
relationship?

create table ionofiles (

– TODO optimize variable sizes
id int(15) NOT NULL auto_increment PRIMARY KEY,
file varchar(100) NOT NULL, – Any
valid filename without the “.~N~” backup extensions.
md5 varchar(100) NOT NULL, – MD5 Sum
destination varchar(20) NOT NULL, – E.g.
MIDS, ADIC, AFWA, SWPC, …

`version`        varchar(20)  default NULL,            -- Optional

version.

`ursi`           char(5)      default NULL,       -- E.g. BC840,

WI937. default NULL because we allow any filename.

`time_scope`     varchar(20)  default NULL,     -- E.g.

(individual, daily, monthly, yearly, etc). default NULL

– because we allow non time-scoped files.

`content_type`   varchar(20)  default NULL,            -- E.g.

(root, scaled, image, ionogram)

`obs_time1`      datetime     default NULL,            -- Start of

measurement. Default NULL because we allow

– non data files.
obs_time2 datetime default NULL, – End of
measurement, in the case file represents

– multiple measurements. Default NULL because we allow

– non data files.

`extension`      varchar(5)   default NULL,            -- file

extension (any case). E.g. SAO, MMM, 16C, TXT, png.

`size`           int(11)      NOT NULL,                -- file

size in bytes

`backup`         int(2)       default NULL,            -- N of the

“~N~” tilda backup file.

`is_ready`       boolean       NOT NULL default 0,      -- Is this

file’s links ready? 0 = No, 1 = Yes.
is_public boolean NOT NULL default 1, – Is this
file publicly available? 0 = Public, 1 = Private.
is_removed boolean NOT NULL default 0, – Has
this file been marked deleted?,
– s.t. we
don’t ever want to see it again?
– This is
useful in the event that we want to reject future submissions of file.

`file_on_disk`   varchar(100) NOT NULL,   -- Actual file name on

disk: E.g. BC840_2009001000000.SAO.~10~
path varchar(200) NOT NULL, – A directory or a
full path to a ZIP or TAR or other aggregate file.
– E.g.
data/BC840/individual/2009/001/scaled/
– E.g.
data/BC840/individual/2009/BC840_2009001_2009-01-01.tar.gz

`last_verified`  datetime default NULL,   -- last time we checked

that this file actually exists at stated destination

– TODO: ActiveRecord fields. What should be the data type?
Timestamp or Datetime?
– The auto set and update features are currently handled by
AR.
– Should we put some ON UPDATE, etc actions here for non AR
interaction?
created_on datetime NOT NULL, –
ActiveRecord auto set.
updated_on datetime NOT NULL, –
ActiveRecord auto updated.

`note`           TEXT         default NULL,            -- User

note

UNIQUE KEY `key_file_md5_destination_public` ( `file`, `md5`,

destination, is_public ),

KEY `key_file`          ( `file` ),
KEY `key_md5`           ( `md5` ),
KEY `key_destination`   ( `destination` ),
KEY `key_version`       ( `version` ),
KEY `key_ursi`          ( `ursi` ),
KEY `key_time_scope`    ( `time_scope` ),
KEY `key_content_type`  ( `content_type` ),
KEY `key_obs_time1`     ( `obs_time1` ),
KEY `key_obs_time2`     ( `obs_time2` ),
KEY `key_extension`     ( `extension` ),
KEY `key_size`          ( `size` ),
KEY `key_backup`        ( `backup` ),
KEY `key_is_ready`      ( `is_ready` ),
KEY `key_is_public`     ( `is_public` ),
KEY `key_is_removed`    ( `is_removed` ),
KEY `key_file_on_disk`  ( `file_on_disk` ),
KEY `key_last_verified` ( `last_verified` ),
KEY `key_created_on`    ( `created_on` ),
KEY `key_updated_on`    ( `updated_on` )

) ENGINE=InnoDB;

R

robredford · March 24, 2009, 5:29pm

Interesting issue: your record count is several orders of magnitude
larger than I have to deal with.

A quick google brought up this page:

http://blog.new-bamboo.co.uk/2007/11/20/database-optimization-for-rails-apps

which might have some points for you to consider.

robredford · March 24, 2009, 8:52pm

Thank you. That’s a handy suggestion.

On Mar 24, 10:29 am, Ar Chron [email protected]

robredford · March 24, 2009, 10:44pm

On Tue, Mar 24, 2009 at 7:32 AM, RobR [email protected] wrote:

First, I’m a physicist, not a computer scientist. I need to store 65M
rows now, with a growth rate of ~21M/year whose entries arrive daily.

MySQL can deal with this out-of-the-box without breaking a sweat. I
wouldn’t worry about partitioning until you actually hit a wall. I
think Moore’s law is on your team for this project.

(Note: you probably don’t want indexes on every single column
individually. Add compound indexes tuned to your actual queries.)

Best,
jeremy

robredford · March 27, 2009, 5:58pm

Jeremy,

I did remove many indices as they aren’t needed for the most likely
use cases. Thank you for the sage suggestion. I saw an incredible
improvement in insert speed (1.5x) and reduction in disk space usage
(0.6x).

I have to say I think I’m already hitting the wall with this
simplistic approach for these reasons:

I see performance degradation on indexed searches at 90 million
records in a much more slimmed down version of the table, i.e. only 3
columns, with 1 PK and 3 individual indexes.
It takes ~24 hours to rebuild a MyIsam version of the 90 million
record table + indices via a mysqldump load; especially it’s PK. The
entire table is inaccessible during this PK rebuild since it ensures
consistency. This happens ~1/year.
With a 5 year forecast in mind, I expect > 200million relationships
or rows needed and hence a significant design goal.
With the originally posted schema and most indices removed, I’m still
looking at ~ 250 GB of physical disk space needed. Our DBs run using
an attached storage environment with many small heavily redundant
disks. I don’t know much more about that except that 70 GB is the
elemental partition size for DB’s. So my design will need to be
partition flexible in the near term and hence one reason why I looked
toward Partition Tables.

Onward, I did realize finally that 95% of my queries inherently know a
pretty good key a priori, i.e. the timestamp of the measurement is
embedded in the file name. Most users will be able to specify the
time range of interest, etc.

So, I’ve taken a stab at partitioning on month. MySQL has a
limitation of 1024 partitions per table and I needed to represent data
from 1940. I’ve designed this set of partitions on month boundaries
from 1940 to 2020. I assume adding more partitions later is easy but
since I didn’t know how I went ahead and partitioned up to the year
2020. A better solution would probably use a histogram of my data and
add more partitions during time periods of higher data density. The
problem is that I do not yet know this histogram… though it would
be easy to ascertain.

My present schema follows. Any comments are highly welcome! I’m
currently bulk testing and see these numbers:

~120 inserts + 120 selects per second which is pretty good.
I have 2 primary problems to resolve.
MySQL doesn’t have “global” indices so queries on Key that is not a
PK, require a scan across the indices of all ~800 tables which takes
~1 second.
I need to wittle the forecasted ~280 GB down to ~200 GB.

create table ionofiles (

– TODO optimize variable sizes
id int(15) NOT NULL auto_increment,
– Any valid filename without the “.~N~” backup extensions.
file varchar(100) NOT NULL,
– MD5 Sum
md5 varchar(100) NOT NULL,
– E.g. MIDS, ADIC, AFWA, SWPC, …
destination varchar(20) NOT NULL,

– Optional version.
version varchar(20) default NULL,

– E.g. BC840, WI937. default NULL because we allow any filename.
ursi char(5) default NULL,

– E.g. (individual, daily, monthly, yearly, etc). default NULL
because we allow non time-scoped files.
time_scope varchar(20) default NULL,

– E.g. (root, scaled, image, ionogram)
content_type varchar(20) default NULL,

– Start of measurement. Not NULL because we partition on this.
obs_time datetime NOT NULL,

– file extension (any case). E.g. SAO, MMM, 16C, TXT, png.
extension varchar(5) default NULL,

– file size in bytes
size int(11) NOT NULL,

– N of the “~N~” tilda backup file.
backup int(2) default 0,

– Is this file’s links ready? 0 = No, 1 = Yes.
is_ready boolean NOT NULL default 0,

– Is this file publicly available? 0 = Public, 1 = Private.
is_public boolean NOT NULL default 1,

– Has this file been marked deleted?,
is_removed boolean NOT NULL default 0,

– Actual file name on disk: E.g. BC840_2009001000000.SAO.~10~
file_on_disk varchar(100) NOT NULL,
– A directory or a full path to a ZIP or TAR or other aggregate file.
path varchar(200) default NULL,
– E.g.
data/BC840/individual/2009/001/scaled/
– E.g.
data/BC840/individual/2009/BC840_2009001_2009-01-01.tar.gz

– last time we checked that this file actually exists at stated
destination
last_verified datetime default NULL,

– TODO: ActiveRecord fields. What should be the data type?
Timestamp or Datetime?
– The auto set and update features are currently handled by
AR. Should we put some ON UPDATE, etc actions here for non AR
interaction?
created_on datetime NOT NULL, –
ActiveRecord auto set.
updated_on datetime NOT NULL, –
ActiveRecord auto updated.

`note`           TEXT         default NULL,            -- User

note

– TODO: reorder the key constituents to reflect best sub selection
capability
– Rows are unique by this
PRIMARY KEY key_destination_is_public_obs_time_file_md5
( destination, is_public, obs_time, file, md5 ),

– TODO: Is there any value in reordering these keys? Does order
matter?
KEY key_obs_time ( obs_time ),
KEY key_id ( id ),
KEY key_ursi ( ursi ),
KEY key_extension ( extension ),
KEY key_size ( size ),
KEY key_updated_on ( updated_on )

) ENGINE=InnoDB
PARTITION BY RANGE ( TO_DAYS( obs_time ) ) (
PARTITION p195001 VALUES LESS THAN (TO_DAYS
( “1950-01-01” )),PARTITION p195002 VALUES LESS THAN (TO_DAYS
( “1950-02-01” )),PARTITION p195003 VALUES LESS THAN (TO_DAYS
( “1950-03-01” )),PARTITION p195004 VALUES LESS THAN (TO_DAYS
( “1950-04-01” )),PARTITION p195005 VALUES LESS THAN (TO_DAYS
( “1950-05-01” )),PARTITION p195006 VALUES LESS THAN (TO_DAYS
( “1950-06-01” )),PARTITION p195007 VALUES LESS THAN (TO_DAYS
( “1950-07-01” )),PARTITION p195008 VALUES LESS THAN (TO_DAYS
( “1950-08-01” )),PARTITION p195009 VALUES LESS THAN (TO_DAYS
( “1950-09-01” )),PARTITION p195010 VALUES LESS THAN (TO_DAYS
( “1950-10-01” )),PARTITION p195011 VALUES LESS THAN (TO_DAYS
( “1950-11-01” )),PARTITION p195012 VALUES LESS THAN (TO_DAYS
( “1950-12-01” )),
…
– up to year 2020
PARTITION p999999 VALUES LESS THAN MAXVALUE
);