NoSQL Movement

To deal with the increase in concurrent users (Big Users) and the amount of data (Big Data), applications and their underlying databases need to scale in one of two ways: scale up or scale out. Scaling up implies a centralized approach that relies on ever-larger servers. Scaling out implies a distributed approach that leverages many standard, commodity physical or virtual servers.

Before NoSQL databases, the default approach was to scale up. This was dictated by the fundamentally centralized, shared-everything architecture of relational database technology. To support more concurrent users and/or store more data, you need an ever-larger server with more CPUs, more memory, and more disk storage to hold all the tables.

NoSQL databases were developed from the ground up to be distributed, scale-out databases. They use a cluster of standard physical or virtual servers to store data and support database operations. To scale, additional servers join the cluster, and the data and database operations are spread across the larger cluster.

NoSQL databases provide a much easier, linear approach to scaling, using low-cost commodity servers. If 10,000 new users start using your application, simply add another database server to your cluster; add 10,000 more users, and add another server. There is no need to modify the application as you scale, since the application always sees a single (distributed) database. Because commodity servers are expected to fail from time to time, NoSQL databases are built to tolerate and recover from such failures, which makes them highly reliable.

NoSQL databases are often defined as next-generation databases that address some of the following points: being non-relational, distributed, open source, and horizontally scalable.

There are around 150 NoSQL systems on the market at present, each with its own merits and drawbacks. This post briefly covers Cassandra and its core features.

Cassandra

Cassandra stands out as one of the best available NoSQL databases because of the core features described below.

Scalability

Cassandra is an example of an ideal horizontally scalable (scale-out) system, because nodes can be added seamlessly. As you need more capacity, you add nodes to the cluster, and the cluster uses the new resources automatically. This flexibility lets you deploy efficiently on commodity servers or cloud-based infrastructure.
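
One common way to get this behavior is consistent hashing, the idea behind Cassandra's token ring. The Python sketch below is a simplified model, not Cassandra's code: it gives each node a single token, uses MD5 from the standard library instead of Cassandra's default Murmur3 partitioner, and the names `token`, `Ring`, and `node_for` are hypothetical. It shows that when a node joins, only the keys falling in the new node's range change owner; everything else stays put.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash: Cassandra's default partitioner uses Murmur3, but MD5
    # from the standard library keeps this sketch dependency-free.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32


class Ring:
    """Consistent-hash ring with one token per node (no virtual nodes)."""

    def __init__(self, nodes):
        self.tokens = []  # sorted node tokens
        self.owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self.tokens, t)
        self.owner[t] = node

    def node_for(self, key: str) -> str:
        # A key belongs to the first node token at or after the key's token,
        # wrapping around to the start of the ring.
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owner[self.tokens[i]]


keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Real clusters typically assign many tokens (virtual nodes) to each node, so that a new node takes a little load from every existing node rather than from a single neighbor.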

Highly Available

Cassandra is designed to handle very large amounts of data spread across many commodity servers with no single point of failure. Every node is identical, and no node is assigned special responsibilities. Data is distributed across the cluster, so each node holds a different portion of the data, but there is no master: every node can service any request.
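
To illustrate why a masterless design works, here is a minimal Python sketch under simplifying assumptions: every node holds the same view of the ring (in Cassandra, peers exchange this state via gossip), replication is ignored, MD5 stands in for the real partitioner, and `Node`, `owner_of`, `put`, and `get` are hypothetical names. Whichever node receives a request acts as coordinator and forwards it to the owning node.

```python
import hashlib


def token(key: str) -> int:
    # MD5 stand-in for Cassandra's Murmur3 partitioner (sketch assumption).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32


class Node:
    """A peer node: there is no master and no node has a special role."""

    def __init__(self, name: str, cluster: dict):
        self.name = name
        self.cluster = cluster  # shared view of all peers: name -> Node
        self.data = {}          # this node's local storage

    def owner_of(self, key: str) -> "Node":
        # Every node derives ownership from the same ring view, so any node
        # reaches the same answer without asking a central coordinator.
        ring = sorted((token(name), name) for name in self.cluster)
        t = token(key)
        for node_token, name in ring:
            if node_token >= t:
                return self.cluster[name]
        return self.cluster[ring[0][1]]  # wrap around the ring

    def put(self, key: str, value: str) -> None:
        # The node that receives the request acts as coordinator.
        self.owner_of(key).data[key] = value

    def get(self, key: str):
        return self.owner_of(key).data.get(key)


cluster = {}
for name in ["node-a", "node-b", "node-c"]:
    cluster[name] = Node(name, cluster)

cluster["node-a"].put("user:42", "alice")
# Any node can answer the read, not only the node that received the write.
reads = [node.get("user:42") for node in cluster.values()]
print(reads)  # ['alice', 'alice', 'alice']
```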

Replication

Cassandra is designed as a distributed system for deploying large numbers of nodes across multiple data centers. Its architecture is specifically tailored for multi-data-center deployments, providing redundancy, failover, and disaster recovery. Replication is configurable per keyspace: both the replication strategy and the number of replicas can be chosen.
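
Two of Cassandra's built-in replication strategies are SimpleStrategy, which places replicas on consecutive nodes around the ring, and NetworkTopologyStrategy, which sets a separate replica count for each data center. The placement idea can be sketched in Python as below. This is a simplified model, not Cassandra's implementation: `replicas` walks the ring clockwise from the key's position, `replicas_per_dc` applies the same walk inside each data center and ignores racks, and the function names, node names, and MD5 hash are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # MD5 stand-in for Cassandra's Murmur3 partitioner (sketch assumption).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32


def replicas(nodes, key, rf):
    """SimpleStrategy-style placement: the owning node plus the next
    rf - 1 nodes clockwise around the ring."""
    if rf > len(nodes):
        # This sketch simply refuses; it does not model Cassandra's behavior.
        raise ValueError("replication factor exceeds number of nodes")
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


def replicas_per_dc(dcs, key, rf_per_dc):
    """Simplified NetworkTopologyStrategy-style placement: each data center
    gets its own replica count. Rack awareness is ignored."""
    return {dc: replicas(nodes, key, rf_per_dc.get(dc, 0))
            for dc, nodes in dcs.items()}


dcs = {
    "dc1": ["dc1-node1", "dc1-node2", "dc1-node3"],
    "dc2": ["dc2-node1", "dc2-node2", "dc2-node3"],
}
# Three copies in dc1 and two in dc2, so either data center can serve the key.
placement = replicas_per_dc(dcs, "user:42", {"dc1": 3, "dc2": 2})
print(placement)
```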

Fault-tolerant

Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
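
How many failures a cluster tolerates depends on the replication factor and on how many replicas must acknowledge each request. As a rough sketch, assuming the usual definitions of the ONE, QUORUM (a majority, `rf // 2 + 1`), and ALL consistency levels, the number of replicas of a key that can be down while requests still succeed is `rf` minus the required acknowledgements. The helper names below are hypothetical.

```python
def required_acks(level: str, rf: int) -> int:
    """Replica acknowledgements needed by three common consistency levels."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas of a key that can be down while requests still succeed."""
    return rf - required_acks(level, rf)


for level in ("ONE", "QUORUM", "ALL"):
    print(level, tolerated_failures(level, rf=3))
# ONE 2
# QUORUM 1
# ALL 0
```

With three replicas, QUORUM requests keep working with one replica down and ONE requests with two, while ALL needs every replica to respond.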

Tunable consistency

In Cassandra, consistency refers to how up-to-date and synchronized a row of data is across all of its replicas. Cassandra extends the concept of eventual consistency by offering tunable consistency: for any given read or write operation, the client application chooses a consistency level (for example ONE, QUORUM, or ALL), which sets how many replicas must respond before the operation succeeds. In addition, Cassandra has several built-in repair mechanisms, such as hinted handoff, read repair, and anti-entropy repair, to keep data consistent across replicas.
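
A common rule of thumb for choosing read and write consistency levels is that if the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF), every read overlaps the latest acknowledged write on at least one replica. The Python sketch below checks that rule by brute force and shows last-write-wins reconciliation by timestamp, which is how Cassandra resolves differing replica responses. It is a simplified model that ignores failures during the operation and clock skew; `Cell`, `overlap_guaranteed`, `every_read_sees_write`, and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp attached by the writer


def overlap_guaranteed(r: int, w: int, rf: int) -> bool:
    # If R + W > RF, any R replicas read and any W replicas written
    # must share at least one replica (pigeonhole principle).
    return r + w > rf


def every_read_sees_write(r: int, w: int, rf: int) -> bool:
    # Brute-force check of the same property over all replica subsets.
    replicas = range(rf)
    return all(set(read) & set(write)
               for write in combinations(replicas, w)
               for read in combinations(replicas, r))


def reconcile(responses):
    # Last-write-wins: the response with the newest timestamp is returned.
    return max(responses, key=lambda cell: cell.timestamp)


rf = 3
quorum = rf // 2 + 1
print(overlap_guaranteed(quorum, quorum, rf))  # QUORUM reads + QUORUM writes: True
print(overlap_guaranteed(1, 1, rf))            # ONE reads + ONE writes: False
print(reconcile([Cell("old", 1), Cell("new", 2)]).value)  # new
```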

Read/Write Performance

Cassandra reads and writes data by primary key, avoiding the complex queries (such as joins) that a relational database may require. It is optimized for very fast, highly available writes.
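
Cassandra's fast writes come from a log-structured write path: each write is appended to a commit log and recorded in an in-memory memtable, which is later flushed to disk as an immutable SSTable, so writes avoid random disk I/O. The Python sketch below models that path with hypothetical names and several simplifications: the commit log is an in-memory list that is simply cleared after a flush, SSTables are sorted dictionaries, compaction is not modeled, and reads return the newest copy found rather than merging cells by timestamp as Cassandra does.

```python
class WritePathSketch:
    """Simplified log-structured store: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []  # append-only durability log (a list here)
        self.memtable = {}    # recent writes in memory, keyed by primary key
        self.sstables = []    # immutable sorted tables, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append only
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as an immutable table sorted by key.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


store = WritePathSketch(memtable_limit=2)
store.write("user:1", "a")
store.write("user:2", "b")  # memtable is full, so it is flushed
store.write("user:1", "c")  # the newer value lives in the memtable
print(store.read("user:1"), store.read("user:2"), len(store.sstables))  # c b 1
```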

Flexible schema

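Cassandra is well suited to managing large amounts of structured, semi-structured, and unstructured data across multiple data centers and the cloud. Rows in a table do not need a value for every column, and new columns can be added to an existing table with ALTER TABLE.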

Query Language

Cassandra introduced CQL (Cassandra Query Language), a SQL-like alternative to the traditional Thrift RPC interface. Drivers are available for Python, PHP, Ruby, Node.js, and JDBC-based client programs.
