Posts Tagged ‘oltp’

The “NoSQL” dispute: A performance argument

December 10, 2009 1 comment

NoSQL is a database movement that began in early to mid 2009 and which promotes non-relational data stores that do not need a fixed schema, and that usually avoid join operations. From [1]:

Next Generation Databases mostly address some of the points: being non-relational, distributed, open-source and horizontal scalable. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, replication support, easy API, eventually consistency, and more. So the misleading term “nosql” (some call it “not only sql”) should be seen as an alias to something like the definition above …

I just read a very insightful article from Michael Stonebraker [2], one of the pioneers of modern database theory and Relational Database Management Systems (RDBMS) and “father” of systems like Ingres and Postgres [3, 4]: ‘The “NoSQL” Discussion has Nothing to Do With SQL‘ [5]. In this article, published in the blog of the Communications of the ACM, Dr. Stonebraker responds to some of the arguments from the supporters of the NoSQL movement. There are two possible reasons to move away from structured Relational Database Management Systems and adopt an alternate DBMS technology: performance and flexibility:

The performance argument goes something like the following. I started with MySQL for my data storage needs and over time found performance to be inadequate. My options were: … 2. Abandon MySQL and pay big licensing fees for an enterprise SQL DBMS or move to something other than a SQL DBMS.

The flexibility argument goes something like the following. My data does not conform to a rigid relational schema. Hence, I can’t be bound by the structure of a RDBMS and need something more flexible. ….

Focusing on the performance argument, he explains what is more or less a common knowledge in the database community: The major performance burden of modern RDBMS  comes from all those extra features like transaction possessing (especially insuring the ACID properties), logging, etc and not from the core engine for executing an SQL query:

… However, the net-net is that the single-node performance of a NoSQL, disk-based, non-ACID, multithreaded system is limited to be a modest factor faster than a well-designed stored-procedure SQL OLTP engine. In essence, ACID transactions are jettisoned for a modest performance boost, and this performance boost has nothing to do with SQL.

In summary, blinding performance depends on removing overhead. Such overhead has nothing to do with SQL, but instead revolves around traditional implementations of ACID transactions, multi-threading, and disk management. To go wildly faster, one must remove all four sources of overhead, discussed above. This is possible in either a SQL context or some other context. …

I believe that anyone interested in the inner workings of modern RDBMS should read this short post from Dr. Stonebraker and continue with a very interesting paper from Harizopoulos, “OLTP through the looking glass, and what we found there” [6]. I am getting tired of discussing with people outside the database community who after writing a couple of SQL queries think that they know all about modern RDBMS and insist that the MySQL’s MYISAM engine is the best engine out there just because it is fast(er) or question my motives when I consult them that a “Lite SQL” (no reference to any real product)  solution is just not enough..

PS. Dr. Stonebraker is not just an exceptional database researcher, but a visionary whose ideas have shaped the modern Database landscape. A short abstract from an article in SIGMOD Record when he received the IEEE John von Neumann Medal [7]:

… The relational data model and its associated benefits of “data independence” and “non-procedural access” were first invented by Tedd Codd. However, more than any other individual, Mike is responsible for making Codd’s vision of independence a reality through the architectures and algorithms embodied in the series of open-source prototypes and commercial systems that he has initiated and led. While many others have certainly made important contributions to the field, no one else comes close to his continuous sequence of landmark innovations over a period of almost 30 years.

… Mike has been the primary driving force behind major shifts in the research agenda of the database community, including two occasions where he launched entire new directions for the field to follow (the implementation of relational systems and object-relational systems).

Categories: Databases Tags: ,