Dealing with an EAV database

I’ve inherited an EAV database and there’s really no option to remodel
it. Data is stored as key_name, key_value pairs.
Trying to return a meaningful, unified recordset is far too complex to
be efficient.
I’ve decided to make use of the MySQL GROUP_CONCAT, and CONCAT_WS
functions to return a fast query result, with the consolidated fields
as one string.
The final data is to be displayed and downloaded (CSV), but not all
fields will be available for each “row”, which makes serialization
awkward. I can’t decide where the “heavy lifting” should go to parse
and process this data for screen and download.

Does anyone have a good strategy for this?
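For what it's worth, a minimal Ruby sketch of the approach described above, under assumed names (an eav_data table with entity_id, key_name and key_value columns) and assumed '=' / '|' delimiters, none of which come from the actual schema:

```ruby
# Collapse each entity's key/value pairs into one delimited string in SQL,
# then split that string back apart in Ruby. All table/column names and
# delimiters here are assumptions for illustration.
QUERY = <<~SQL
  SELECT entity_id,
         GROUP_CONCAT(CONCAT_WS('=', key_name, key_value) SEPARATOR '|') AS fields
  FROM eav_data
  GROUP BY entity_id
SQL

# Turn one consolidated "fields" string back into a hash of attributes.
def parse_fields(consolidated)
  consolidated.split('|').each_with_object({}) do |pair, attrs|
    name, value = pair.split('=', 2)
    attrs[name] = value
  end
end

parse_fields('name=Alice|age=34')  # => { "name" => "Alice", "age" => "34" }
```

The obvious caveats: the delimiters must never occur inside a value, and MySQL's group_concat_max_len (default 1024 bytes) silently truncates long concatenated rows.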

jrq wrote:

I’ve inherited an EAV database and there’s really no option to remodel
it.

Why not? Are others depending on the existing schema?

Data is stored as key_name, key_value pairs
Trying to return a meaningful, unified recordset is far too complex to
be efficient.
I’ve decided to make use of the MySQL GROUP_CONCAT, and CONCAT_WS
functions to return a fast query result, with the consolidated fields
as one string.
The final data is to be displayed and downloaded (CSV), but not all
fields will be available for each “row”, which makes serialization
awkward. I can’t decide where the “heavy lifting” should go to parse
and process this data for screen and download.

Does anyone have a good strategy for this?

If you can’t remodel the DB, decline the project. :slight_smile:

If you can’t decline the project, export the data into a reasonable
schema (a non-SQL database like MongoDB may be a good option here). A
Google search also turned up the acts_as_eav_model plugin, but I know
nothing about it.

Another alternative: build a RESTful service (perhaps without Rails) on
top of the EAV database, then connect Rails to it through
ActiveResource.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Marnen Laibow-Koser wrote:

jrq wrote:

I’ve inherited an EAV database and there’s really no option to remodel
it.
[…]
If you can’t remodel the DB, decline the project. :slight_smile:

Hmmm. This question and others like “Is it possible to query on
virtual attributes?” (Rails - Ruby-Forum) are making me start to think
that I should put the effort into that ORM library I’ve been meaning
to try to write…

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Please quote when replying; otherwise, the discussion becomes difficult
to follow.

jrq wrote:

Good and reasonable suggestions. I’d rather avoid supplementing the
database end of things.

I’m not sure what you mean by this.

Basically, if you’re going to use Rails, you have to make sure that it
and the DB understand each other.

The complete data set is fairly small (<20,000 rows) and queries are
likely to return only 1,000 rows at a time. I don’t think the string
parsing is that complex; a couple of seconds would be OK, 10 seconds,
not so much.

But you’d be munging all the fields together in the query, then
separating them in the app. That’s a lot of unnecessary string
processing. It would be better to avoid it.

It’s for read-only
reports, so I don’t really care about preserving the record integrity.

Huh?

I’m wondering whether it makes sense to do half the processing in the
controller/model,

Controllers shouldn’t do processing.

and half in some helper (for the screen aspect).

Unlikely. Dealing with the DB is really the job of the model. Helpers
are only for presentation formatting.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

By “avoid supplementing the database end of things”, I mean I want to
avoid adding more tables or databases to the database engine,
inserting an additional database layer.

Yes, I’m currently intending to “munge” the unorganized fields into a
string, and deal with the parsing/processing at another stage. Yes,
it’s inefficient, but it’s a relatively small amount of processing,
and hopefully it will get me to where I need to be within a short
amount of time, which is a factor in this situation.

By “not preserving record integrity”, I mean that I am not updating
any of these records, it’s read-only. I want to pull the data and
generate reports, both for screen and CSV stream.

Yes, most of the processing would be in the Model, not the Controller,
but since the model is not a good representation of the existing
database schema, there’s going to be some fudging going on. The
question is whether it’s worth bothering to model the final data
output at all.

I know that helpers are for presentation formatting, which is why I
specified that the helper process would be for the “screen”.

Thanks.


Again: please quote when replying!

jrq wrote:

By “avoid supplementing the database end of things”, I mean I want to
avoid adding more tables or databases to the database engine,
inserting an additional database layer.

But that’s the right way to do it if you can’t just get rid of the EAV
schema. There is no benefit to avoiding DB objects just for the sake of
avoiding DB objects.

Yes, I’m currently intending to “munge” the unorganized fields into a
string, and deal with the parsing/processing at another stage. yes,
it’s inefficient, but it’s a relatively small amount of processing,

It’s not just inefficient; it will be inconvenient. Don’t do it.

and hopefully it will get me to where I need to be within a short
amount of time, which is a factor in this situation.

No. It will just complicate your life immensely. It may look simpler
now, but once you try to do anything non-trivial, I think you’ll find
that it’s a lot harder.

By “not preserving record integrity”, I mean that I am not updating
any of these records, it’s read-only. I want to pull the data and
generate reports, both for screen and CSV stream.

Then do it right and transform the data into a non-EAV representation
once you pull it.
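As a sketch of that transformation: pull the raw EAV rows with a plain SELECT, then pivot them into one hash per entity in Ruby. The row keys (:entity_id, :key_name, :key_value) are assumed names, not the real schema:

```ruby
# Reshape raw EAV rows into one attributes hash per entity.
def pivot_eav(rows)
  rows.group_by { |row| row[:entity_id] }.map do |id, pairs|
    attrs = { id: id }
    pairs.each { |row| attrs[row[:key_name].to_sym] = row[:key_value] }
    attrs
  end
end

rows = [
  { entity_id: 1, key_name: 'name', key_value: 'Alice' },
  { entity_id: 1, key_name: 'age',  key_value: '34' },
  { entity_id: 2, key_name: 'name', key_value: 'Bob' } # no 'age' for this "row"
]
pivot_eav(rows)
# => [{ id: 1, name: "Alice", age: "34" }, { id: 2, name: "Bob" }]
```

Once the data is in that shape, both the screen view and the CSV export can work from the same hashes instead of re-parsing strings.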

Yes, most of the processing would be in the Model, not the Controller,
but since the model is not a good representation of the existing
database schema, there’s going to be some fudging going on.

Then write a model that is a good representation of it (perhaps
supplementing or completely ditching ActiveRecord).

The
question is whether it’s worth bothering to model the final data
output at all.

Of course it is. That way you can manipulate it more easily than with
string parsing.

I know that helpers are for presentation formatting, which is why I
specified that the helper process would be for the “screen”.

Thanks.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Wednesday 06 October 2010, jrq wrote:

Yes, I’m currently intending to “munge” the unorganized fields into a
string, and deal with the parsing/processing at another stage. yes,
it’s inefficient, but it’s a relatively small amount of processing,
and hopefully it will get me to where I need to be within a short
amount of time, which is a factor in this situation.

Have you considered using database views? I have never been in this
situation, so take my suggestions with a grain of salt. Here’s what I’d
try:

Use database views to map the EAV (non-)schema to sane relational
schema.
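A hedged sketch of what such a view might look like in MySQL, pivoting assumed columns (key_name, key_value) into real ones with MAX(CASE …); the table, view and attribute names are all invented for illustration:

```sql
-- Hypothetical pivot view over an EAV table; all names are assumptions.
CREATE VIEW people_view AS
SELECT entity_id,
       MAX(CASE WHEN key_name = 'name' THEN key_value END) AS name,
       MAX(CASE WHEN key_name = 'age'  THEN key_value END) AS age
FROM eav_data
GROUP BY entity_id;
```

Each attribute you want as a column needs its own MAX(CASE …) line, so this suits a known, fixed set of keys.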

You didn’t say if you need to insert into that database. If you do, see
if MySQL supports updatable views. If it does, give them a try,
otherwise handle inserts yourself.

If none of this works, consider not using ActiveRecord at all, but
rather something like sequel is a better fit
http://sequel.rubyforge.org/

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

On Oct 6, 2:29 pm, Michael S. [email protected] wrote:

try:
I’ve been looking at that. I’m not sure how that gets around the
complex queries, unless MySQL supports Materialized Views, which I
don’t think it does (and they’re not a great solution). Or are you
assuming that the caching will get around this problem, once it’s
built the first time?

Use database views to map the EAV (non-)schema to sane relational
schema.

You didn’t say if you need to insert into that database. If you do, see
if MySQL supports updatable views. If it does, give them a try,
otherwise handle inserts yourself.

I don’t need to do any inserts, so that should not be an issue.

jrq wrote:

On Oct 6, 2:29 pm, Michael S. [email protected] wrote:

try:
I’ve been looking at that. I’m not sure how that gets around the
complex queries, unless MySQL supports Materialized Views, which I
don’t think it does (and they’re not a great solution). Or are
you assuming that the caching will get around this problem, once it’s
built the first time?

No, views probably won’t help performance, just query construction,
though I don’t know enough about how they’re implemented to be sure.

PostgreSQL handles views much better than MySQL, IIRC, and is generally
a better database. Switch if you can.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Michael S. wrote:

On Wednesday 06 October 2010, jrq wrote:

Yes, I’m currently intending to “munge” the unorganized fields into a
string, and deal with the parsing/processing at another stage. yes,
it’s inefficient, but it’s a relatively small amount of processing,
and hopefully it will get me to where I need to be within a short
amount of time, which is a factor in this situation.

Have you considered using database views? I have never been in this
situation, so take my suggestions with a grain of salt. Here’s what I’d
try:

Use database views to map the EAV (non-)schema to sane relational
schema.

Oh, goodness, I forgot to suggest that. Thanks for the reminder.
rails_sql_views helps with managing those.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Wednesday 06 October 2010, jrq wrote:

situation, so take my suggestions with a grain of salt. Here’s what I’d
try:
I’ve been looking at that. I’m not sure how that gets around the
complex queries, unless MySQL supports Materialized Views, which I
don’t think it does (and they’re not a great solution). Or are
you assuming that the caching will get around this problem, once it’s
built the first time?

I don’t know. Give it a try. Define just the views and do some queries.
You wrote earlier that your DB has < 20,000 rows – in other words, it
is almost trivially small and surely fits into memory. That is, unless
the attribute values are extremely large.

I don’t need to do any inserts, so that should not be an issue.

Does that mean that your app does not change the database or that it
does not change at all? Either way, I’d try views first and if that’s
not fast enough already, there’s the option of materializing them
yourself simply by copying the views to real tables.
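Materializing by hand could be as simple as snapshotting the view into a real table and rebuilding it on a schedule; the names here are, again, assumptions:

```sql
-- Hypothetical manual materialization; re-run whenever the data changes.
DROP TABLE IF EXISTS people_snapshot;
CREATE TABLE people_snapshot AS SELECT * FROM people_view;
```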

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

I’ve read the rest of the thread with interest, but I just wanted to
go back to the start for a second.

On 6 October 2010 14:52, jrq [email protected] wrote:

I’ve inherited an EAV database and there’s really no option to remodel
it.

Where have you inherited this data from? Is there an application
that’s still populating it? Or is it an export of data (from something
like an online survey/questionnaire system)?

Trying to return a meaningful, unified recordset is far too complex to
be efficient.

Using what approach? The data was set up as EAV for a reason, and
would have probably had tools to query and manipulate it - do you not
have any access to the original tools?

I’ve decided to make use of the MySQL GROUP_CONCAT, and CONCAT_WS
functions to return a fast query result, with the consolidated fields
as one string.
The final data is to be displayed and downloaded (CSV), but not all
fields will be available for each “row”, which makes serialization
awkward.

So what led you to decide to use Rails for this? Given that Rails has
no practical EAV support (just a bit of munging with plugins), it
strikes me as a strange decision to try to use it to analyse your
data. Is there no “EAV data analysis” tool you can find to do the job,
rather than having to create huge amounts of DB views and/or model
manipulation?

I’m not saying to not continue what you’re doing - just curious that
it seems you’ve picked a route that needs more work than should be
reasonable, and don’t seem to be looking at any other options. It’s
always worth considering admitting that you might be making lots of
work for yourself and a new start may be more economical in the long
run.

FWIW Many (6-ish) years ago I got an EAV data export from SurveyMonkey
dumped on my desk, and after importing it into a SQL DB, I produced
Excel cross-tabulated reports with a combination of a big plain old
SQL query (with lots of joins! :slight_smile:) to get the data into a “relation”
equivalent, and used Excel’s pivot-table functionality to do the
“heavy lifting” of getting information from those rows.

On Oct 7, 4:50 am, Michael P. [email protected] wrote:

like an online survey/questionnaire system)?
It’s not a static export, and yes there is an application still
populating it.

Trying to return a meaningful, unified recordset is far too complex to
be efficient.

Using what approach? The data was set up as EAV for a reason, and
would have probably had tools to query and manipulate it - do you not
have any access to the original tools?

Using SQL joins to build a recordset that represents the dataset. As
far as I can gauge, the data was set up this way to allow an “easy”
way to construct a custom set of fields for a relatively non-technical
user/admin. It certainly wasn’t designed to handle the reporting
function I’m being asked to create, or, I suspect, the volume of data
that is being stored in it.

Yes, the right thing to do would be to replace large chunks of this
and design a better normalized data solution that satisfied all the
systems involved. Unfortunately, in this instance I don’t have the
luxury of time or money to go that route. I wouldn’t describe myself
as an experienced developer, but I have been in this business for a
long time, and I know the kind of hole I’m digging for myself by
choosing this method.

there no “EAV data analysis” tool you can find to do the job, rather
than having to create huge amounts of DB views and/or model
manipulation.

It’s more complex (isn’t it always). A large part of the database is
well-formed, normalized and friendly. This little area that only
recently came to my attention is not (EAV). Rather than re-tool my
entire solution design, I’m sticking to the original plan and
shoehorning this section in (and it is a relatively small section in
the scheme of things).

I freely admit that Rails is not the best solution for this. I’d even
go as far as to say that it falls into the category of “right now I
have a hammer, so everything looks like a nail”, by which I mean: I
don’t do a lot of development, especially web-app development, and
spent some time at the beginning of the year learning RoR. I liked
it, even if I wasn’t always using it in the manner for which it’s most
suitable. It’s been suggested that something like “Sinatra” would be
a much better fit for this, and a quick peek seems to show that it
probably would. However, in the short time available to complete this
project, I can’t deal with struggling to get through all the “gotchas”
of another framework.

Lame excuses, I know. Completely at odds with the dogma of RoR
development.

Excel cross-tabulated reports with a combination of a big plain old
SQL query (with lots of joins! :slight_smile:) to get the data into a “relation”
equivalent, and used Excel’s pivot-table functionality to do the
“heavy lifting” of getting information from those rows.

With data migration, I generally opt to get down to the raw data and
manipulate it all in code. I know it can be a little less efficient,
but when it’s my code, at least I know I can trust it, or at least fix
it when it breaks. Having said that, the pivot table stuff in Excel
can be pretty useful.

On 7 October 2010 13:15, jrq [email protected] wrote:

On Oct 7, 4:50 am, Michael P. [email protected] wrote:

I’ve read the rest of the thread with interest, but I just wanted to
go back to the start for a second.

Okay, well, given you want to try to make your life easier in a very
special and specific use-case, I can see a bit more about the angle
you’re coming at this problem from. If you’re going to
programmatically solve this in a potentially quick-and-dirty way,
because the powers-that-be must be demanding the “reports”, try seeing
if OpenStructs [1] will help you build up collections of
pseudo-relational records made up from the EAV records. They’re a
nice solution for creating “something” that behaves like an AR object
the majority of the time - you could even extend one of the “tableless
model” [2] implementations to keep more standards in your interfaces.

[1]
http://ruby-doc.org/stdlib/libdoc/ostruct/rdoc/classes/OpenStruct.html

[2]

or Google for others…
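A minimal sketch of the OpenStruct idea: wrap each reconstructed record (a hash of whatever attributes that entity happened to have) so views can call record.name, record.age, and so on. The attribute names are illustrative only:

```ruby
require 'ostruct'

# Each "row" becomes an object with reader methods for its attributes;
# fields absent from a given row simply read back as nil.
def to_records(attr_hashes)
  attr_hashes.map { |attrs| OpenStruct.new(attrs) }
end

records = to_records([{ name: 'Alice', age: '34' }, { name: 'Bob' }])
records.first.age  # => "34"
records.last.age   # => nil -- missing fields just read as nil
```

That nil behaviour is handy here, since not every field is available for each “row”.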

On Oct 7, 4:19 am, Michael S. [email protected] wrote:

does not change at all? Either way, I’d try views first and if that’s
not fast enough already, there’s the option of materializing them
yourself simply by copying the views to real tables.

My app will not change the tables involved. However, there are other
apps that do change this data. This is not a high-volume transaction
system, so faking a materialized view may prove to be an option.

Interesting. OpenStruct might very well come in handy. Thanks.