The Tempest in the Database

A bold prediction for 2010 would be that the database sector is the one most susceptible to disruption by open source software in the next five years. You may be thinking that statement makes me either clairvoyant or foolhardy. Or, if you’ve been following the “Future of Open Source” surveys from North Bridge over the past few years, you could be thinking that I cribbed that prediction from last year. After all, 52 percent of the respondents agreed. Of course, I write this post without any knowledge of the 2010 survey results, so I may be going out on a limb here, but I stand firm in my prediction.

While I’d make a lousy psychic, I’m not too shabby when it comes to determining where a particular part of the industry is headed. I base my prediction about the database sector on a careful analysis of the past. After all, as Shakespeare so eloquently put it in The Tempest, ‘what’s past is prologue.’

Here’s why. Let’s flash back to June 1970, when Edgar Codd first published “A Relational Model of Data for Large Shared Data Banks.” This paper forever changed the IT landscape, and while there have been updates and amendments to the model throughout the years, this generation has yet to see a disruption of that proportion. Because Codd was right and his model worked.

What couldn’t have been predicted at the time, however, was the deluge of information that we’re currently struggling to process. An IDC white paper sponsored by EMC predicts that the amount of digital information produced in 2011 will reach nearly 1,800 exabytes. That is roughly 10 times the amount produced in 2006. To put this in perspective, 161 exabytes of data were created in 2006. That’s three million times the amount of information contained in all the books ever written.

Processing this information is overloading our systems (see the Special Report by The Economist). It’s costing us a fortune in maintenance. The existing vendor community is locking us into nonsensical one-size-fits-all options. We’ve taken what was once a simple act of accessing information and made it a daily feat of epic proportions.

It seems to me that it should be more about the data and less about the database when you’re trying to make business decisions.

Yet here we are, 40 years later, still relying on that same relational model to a large extent. Of the $20B annual revenue in the database industry, more than 95% is driven by a small group of commercial database vendors (Oracle, IBM, Microsoft, Teradata, etc.). Are these companies motivated to deliver a new model that deals with the challenges I outlined above? Or do they see it as a way to lock their customers into ever-increasing subscription fees?

The search for new solutions will be very much developer driven, and so lends itself very well to an open source approach. We already see this playing out in the various NoSQL projects. But much more work is needed to ensure that the new database approaches mesh with programming models in the Java world and support existing web applications, transactional SQL systems, and more.

MySQL is now in Oracle’s fold. Will it still be allowed to provide innovation leadership? If not, other open source leaders will need to emerge. It won’t be easy, but do we really have an option?

(this post has also been published on the blog of Akiban Technologies)
