There has been a lot of talk about NoSQL vs SQL. People talk as if NoSQL is just one database. It is not, it is a whole slew of databases that do not use SQL as its querying language. But more importantly there are SQL based datastores like VoltDB and NimbusDB that do not have the same operational characteristics of a SQL datastore as imagined by the proponents of these systems.
There are several aspects to consider before selecting a database solution for your app. In fact, it has come to point that if your application is sufficiently complex, then you would have to use more than one database for your app depending on the kind of problem you would like to solve. To be clear, databases like MySQL and Postgres have served us well for the larger part of this century and will continue to do so. The class of problems that require a different datastore are statistically outliers. But many of the problems that were eariler solved with standard databases with intelligent hacks can now be solved with a wide variety of NoSQL databases.
Before understanding the why of each database, you need to understand the CAP theorem.
The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (node failures do not prevent survivors from continuing to operate)
- Partition Tolerance (the system continues to operate despite arbitrary message loss)
According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three. (from wikipedia)
Most of the NoSQL solutions focus on the AP slice instead of consistency. However, databases like VoltDB focus on CA. HBase chooses CP and so on and so forth. I will not bore you with the details but you can read all about the charecteristics of your favorite database at http://nosql.org.
But you need to understand your requirement before you can pick your database. Most of the SQL databases wait for a 2-phase commit. It is a blocking call. If you have a website or a solution with hundreds/thousands of users with only a few number of concurrent connections, you might still be fine. However, you have to understand that each write is basically a blocking operation. So, if you have an application that is write intensive, then you pretty much have to wait on the database to write to the disk and update the indices for every request. This is the case for most apps. Since, we are designing apps which we can control, now organizations can harvest a lot of data like usage patterns and such. Writing such data to a SQL database during the request is wasteful and considered extremely harmful for your user experience. Now it is easy to understand why so many NoSQL systems like CouchDB, MongoDB and others sacrifice on consistency and focus on availability and performance.
One of the other things that you would want to choose a NoSQL system is for its flexibility with schema design. Assume that you are running a web app where you would want users to store custom fields for the same data model. Usually you will have to provision a database for each customer as their schema would change. Or, if you are using a slightly more evolved database like Postgres, you can use schemas and schema search paths. But then you cannot collate results across different schemas or different customer databases. Depending on your requirement, you may or you may not have to. But using a NoSQL datastore like CouchDB or MongoDB, you can have a flexible schema.
The third thing about NoSQL systems is the ability to perform DB level operations in a distributed manner. Yes I am talking about the elephant in the room - Map/Reduce. Some of the NoSQL databases include a computational layer that can scale and run in a distributed manner across many machines. Databases like CouchDB push it to the extreme while a few like MongoDB use it in moderation. But this is an important piece nevertheless. You can always replicate this by rolling out your hadoop cluster. But I feel going the DB way might be more efficient. After all, the data is already there.
All in all, I believe that NoSQL and SQL both have its uses. It is no more NoSQL vs SQL it is NoSQL and SQL. In fact, I am looking forward to new application frameworks that support polyglot persistence.