In my previous post, An Introduction to Cassandra, I briefly wrote about the core features of Cassandra. In this post, I will talk about consistent hashing and its role in Cassandra.
Cassandra is designed as a peer-to-peer system. To make the system highly available and to eliminate or reduce hot spots in the network, data has to be spread across multiple nodes. Cassandra uses partitioning to distribute data across the nodes of the cluster in a way that is meaningful and can later serve any processing needs.
By choosing the right partitioning strategy, we would like to achieve the following:
- Data is well distributed throughout the set of nodes.
- When a node is added to or removed from the set of nodes, the expected fraction of objects that must be moved is the minimum needed to maintain a balanced load across the nodes.
- The system knows which node is responsible for a particular piece of data.
If we have a collection of n nodes, then a common way of load balancing across them is to put object O on node number hash(O) mod n. This works well until we add or remove nodes: when the range of the hash function (in the example, n) changes, almost every object is hashed to a new location. Suddenly, all the data is useless, because clients are looking for it in a different location. This problem is solved by consistent hashing, which consistently maps objects to the same node, as far as is possible.
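A quick sketch makes the mod-n problem concrete. The snippet below (the class and key names are my own, chosen for illustration) places 1,000 keys on 4 nodes and then on 5 nodes, and counts how many keys would have to move:

```java
import java.util.stream.IntStream;

public class ModHashing {

    // Node index for a key under simple mod-n placement.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    public static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        String[] keys = IntStream.range(0, 1000)
                .mapToObj(i -> "object-" + i)
                .toArray(String[]::new);

        // Count how many keys map to a different node when we grow from 4 to 5 nodes.
        long moved = 0;
        for (String key : keys) {
            if (nodeFor(key, 4) != nodeFor(key, 5)) {
                moved++;
            }
        }
        // Most keys relocate: a key stays only when hash mod 4 == hash mod 5.
        System.out.println("Keys relocated: " + moved + " of " + keys.length);
    }
}
```

Running this shows that the large majority of keys change nodes after adding just one node, which is exactly the behavior consistent hashing is designed to avoid.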
The basic idea behind the consistent hashing algorithm is to hash both objects and nodes using the same hash function. The reason to do this is to map the node to an interval, which will contain a number of object hashes. If the node is removed then its interval is taken over by a node with an adjacent interval. All the other nodes remain unchanged.
The hashCode method on a Java Object returns an int, which lies in the range -2^31 to 2^31 - 1. Visualize this range as a circle, so that the values wrap around. The circle shown below has five objects (1, 2, 3, 4, 5) that are mapped to four nodes (A, B, C, D). Each position on the circle represents a hashCode value.
To find which node an object goes to, we move clockwise around the circle until we find a node point. So in the diagram above, objects 1 and 4 belong to node A, object 2 belongs to node B, object 5 belongs to node C, and object 3 belongs to node D. Consider what happens if node C is removed: object 5 now belongs to node D, and all the other object mappings are unchanged. If another node E is then added in the position marked, it will take over object 4, leaving only object 1 belonging to A.
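The "move clockwise until we find a node" lookup maps naturally onto a sorted map. Here is a minimal sketch of such a ring (the class and method names are mine, and it uses plain hashCode for simplicity rather than a production-quality hash function):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class HashRing {

    // Ring positions -> node names, kept sorted by hash value.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(node.hashCode(), node);
    }

    public void removeNode(String node) {
        ring.remove(node.hashCode());
    }

    // Walk clockwise from the object's hash to the first node position at or
    // after it; wrap around to the smallest position if we run off the end.
    public String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
        Integer position = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(position);
    }
}
```

Notice that removeNode only affects keys that previously landed on the removed node; every other key still resolves to the same position, which is the property described above.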
There is a chance that the distribution of nodes over the ring is not uniform, which would overload some of the nodes in the system. How can we balance load across all nodes? This can be ameliorated by adding each server node to the ring a number of times in different places. This is achieved with a num_tokens setting, which applies to all servers in the ring: when adding a server, we loop from 0 to num_tokens - 1 and hash a string made from both the server and the loop variable to produce each position. This has the effect of distributing the servers more evenly over the ring. This scheme is called virtual nodes.
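The num_tokens loop can be sketched by extending the ring above. This is only an illustration of the technique, not Cassandra's actual implementation (which, among other things, uses a dedicated partitioner hash rather than hashCode); the class name and the "server:i" token-string format are my own:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class VirtualNodeRing {

    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int numTokens;

    public VirtualNodeRing(int numTokens) {
        this.numTokens = numTokens;
    }

    // Place each server at numTokens positions by hashing a string made
    // from the server name and the loop variable, e.g. "serverA:0".
    public void addServer(String server) {
        for (int i = 0; i < numTokens; i++) {
            ring.put((server + ":" + i).hashCode(), server);
        }
    }

    // Same clockwise lookup as before; the extra positions per server
    // spread ownership more evenly around the ring.
    public String serverFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
        Integer position = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(position);
    }
}
```

With many tokens per server, each server owns many small, scattered arcs of the ring instead of one large one, so an uneven placement of any single token matters much less.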
<Image credits to authors of Apache Cassandra documentation:data_distribution>
The top portion of the graphic shows a cluster without virtual nodes. In this paradigm, each node is assigned a single token that represents a location in the ring. Each node stores data determined by mapping the row key to a token value within a range from the previous node to its assigned value. Each node also contains copies of each row from other nodes in the cluster. For example, range E replicates to nodes 5, 6, and 1. Notice that a node owns exactly one contiguous partition range in the ring space.
The bottom portion of the graphic shows a ring with virtual nodes. Within a cluster, virtual nodes are randomly selected and non-contiguous. The placement of a row is determined by the hash of the row key within many smaller partition ranges belonging to each node.
Here’s another graphic showing the basic idea of consistent hashing with virtual nodes, courtesy of Basho.
Cassandra adopts consistent hashing with virtual nodes as one of its data partitioning strategies.
- Karger et al., Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web