Clearly shown on the above diagram, Big Data, especially from the view of those NoSQL database, is the problem that has very large data volume and need relatively simple algorithm for analysis.
In other words, according to the above definition, it may not be called as Big Data problem if the problem can not be solved with simple algorithms (NoSQL and map/reduce ?). To help myself to have clear clue,I make a diagram as below,
I classify database techniques into four groups, which map to the "problem types" diagram.
- Structural database I think those traditional RDBMSs should still be used in operational database, which asks for good support for transaction process. If the data volume is small, it will be ok to use RDBMS in data warehouse application as well. After all, structure database has been developed for decades to deal with even table that have to be partitioned.
- Quant If advanced mathematics knowledge is required, quant analysis is demanded.
- NoSQL NoSQL becomes popular as developer find that for some problems, they can dramatically speed up data I/O operation if they model the data in an nonstructural way, which against the rule of normalization in a RDBMS. Although traditional RDBMS also has techniques like materialized table, materialized view, and proxy table, they all have limitation and are all still behind SQL optimization engine.So, why dont we just directly got to hit data storage engine if we do not need SQL at all. Then, NoSQL comes into the stage.
- Hybrid In fact, besides those popular NoSQL database like MongoDB, coutchDB, HBase etc, those traditional RDBMS vendors are also developing their NoSQL products too. For examples, MySQL cluster has key/value memcached and sockethandler that avoid overhead SQL stuff. In a real project, which needs to handle big data volume and transaction as well, I believe both structural database and NoSQL database are needed.
I always try to be careful not to be fooled by marketing language. NoSQL and RDBMS should have their own position in a large system, where hybrid solution is required. We need to choose right tools for different tasks.