EAV Modeling – Square Peg in a Round Hole?

So I got the June 2008 issue of php|architect (or volume 7 issue 6 for those of you who track it that way) in recently. Right off, I found the cover article on EAV modeling to be of interest seeing as my current employment is in the medical IT industry and I’d never heard of this technique for storing data. I actually more or less knew what it was, but had never put a name to the face so to speak.

The mental image that came to me when reading about this approach to data modeling was taking the traditional relational table and turning it on its head. Despite what the Wikipedia article on the topic might tout early on, there are disadvantages to using the EAV approach. EAV actually has to circumvent, work around, or reimplement features that most mainstream database servers today provide “for free” to the traditional relational counterparts of EAV in order to get equal functionality. These include native data type validation and data type-specific operations without explicit typecasting (if you’re not separating EAV values by data type), row-level referential integrity, and schema metadata. EAV also adds a dimension of complexity to query construction in an era where storage is becoming cheaper and database technologies are evolving. It may work, but I don’t foresee it scaling very well for larger systems. In short, it seems an attempt to force a square peg into the round hole that are traditional relational database systems.

In a MySQL world, there are alternative approaches for deploying DDL modifications. One is to implement master-slave replication to propagate DDL modifications and load balancing to maintain uptime as changes are propagated from server to server. Another is to use MySQL Proxy to direct queries to servers hosting unmodified schemas and queue DML operations in the binary log while DDL modifications are made. Once DDL is complete, the server goes into “read only” mode while queued DML operations are applied and incoming DML operations received during that time are blocked until the queue is empty. (This may be a potential point of improvement.)

Outside of MySQL, there are document-focused database systems such as Apache Lucene and its current .NET and PHP ports as well as Apache CouchDB. While some of these are still a little early in development, I see them as being more ideal for applications demanding more fluid data storage and hope that the development of similar solutions continues.

4 Comments

  1. Davey Shafik says:

    Using the EAV model is one of the reasons Magento is so slow; another major reason is (if I recall correctly) that it loads some 330 classes just for the main page — in part because of it’s reliance on this model.

    – Davey

  2. …title=”View all posts in Development” rel=”category tag”>Development Matthew Turland has posted about a topic that was the subject of a recent cover…

  3. Thomas Koch says:

    Hi Matthew,

    thanks for raising this interesting topic. I’m currently working on a content storage component which should be as flexible as EAV, but avoid its disadvantages.

    I’d be glad if you’d like to have a look at my blogpost about it:
    http://www.koch.ro/blog/index.php?/archives/81-My-content-storage-component-project.html

    (This is my second try to post a comment. The first raised an error message “action not allowed”. Maybe because I included link tags)

  4. Bill Karwin says:

    Thanks for this blog post, Matthew. I was also dismayed to see EAV featured on the cover of the PHP Architect magazine. How many PHP developers will be attracted to this antipattern as a result?

    You might be interested in reading another story about a business that was practically destroyed because the DB architect insisted on using an EAV-style model:

    http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/