Cassandra Architecture, Features and Operations


Cassandra is a NoSQL database designed to handle large amounts of data spread across many nodes in a cluster. It is a distributed, masterless database. It is commonly grouped with key-value NoSQL stores (more precisely, it is a wide-column store) and offers a flexible, schema-less style of model in which data is stored as keys that map to values.

All data in the database consists of an indexed key and a value. The key-value data model often compares well with other NoSQL storage types on performance, scalability, and flexibility, and although functionality varies between stores, the model itself is simple. Cassandra can handle large amounts of data with no single point of failure.

In a typical masterless Cassandra architecture, data is distributed among all the nodes in the cluster. Cassandra is a good fit when huge volumes of data must be stored on commodity servers.
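To make the idea of distributing data across nodes concrete, here is a minimal Python sketch of a hash ring. It is an illustration under stated assumptions, not Cassandra's implementation: the node names, the `Ring` class and the use of SHA-256 are all hypothetical choices for this example (Cassandra's default partitioner is Murmur3-based and it uses virtual nodes).

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256, purely for illustration)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: a node owns the hash range up to and including its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node token >= key token, wrapping around to the start of the ring.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


# Hypothetical four-node cluster and a few row keys.
ring = Ring(["node-a", "node-b", "node-c", "node-d"])
placement = {k: ring.owner(k) for k in ("AZC_043", "AZS_011", "CAS_021", "NYN_042")}
print(placement)
```

Because the owner is computed from the key alone, any node can work out where a row lives without asking a master.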

Cassandra Architecture

Figure 1: Cassandra Architecture

The figure above shows a typical Cassandra cluster: the nodes and the connections between them. From it we can make the following observations.

  • All the nodes in the cluster play the same role.
  • Each node is independent and is connected to every other node.
  • Each node can accept read and write requests from clients.
  • If a node in the cluster goes down, read and write requests can be served by the other nodes.
  • Nodes communicate with each other using the Gossip protocol.

The actual data is stored on the nodes, and the data stored on each node is replicated to other nodes. A collection of related nodes is known as a data center. A typical cluster contains one or more data centers; the figure above shows only one, but a deployment can have more than one. Crash recovery can be a tough task in traditional databases, whereas Cassandra provides its own crash recovery mechanism.
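The last observation can be illustrated with a small simulation. This is a heavily simplified sketch of gossip-style state exchange, not Cassandra's actual protocol: the node names, the heartbeat values and both function names are assumptions made for this example.

```python
import random


def gossip_round(views, rng):
    """One round: every node exchanges state with one randomly chosen peer."""
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        # Push-pull merge: for each known node keep the highest heartbeat seen.
        merged = dict(views[name])
        for node, beat in views[peer].items():
            merged[node] = max(merged.get(node, 0), beat)
        views[name] = dict(merged)
        views[peer] = dict(merged)


def rounds_until_converged(node_names, seed=0, max_rounds=1000):
    """Return the number of rounds until every node knows about every other node."""
    rng = random.Random(seed)
    # Initially each node only knows its own heartbeat.
    views = {n: {n: 1} for n in node_names}
    for r in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(v) == len(node_names) for v in views.values()):
            return r
    return None


rounds = rounds_until_converged(["n1", "n2", "n3", "n4", "n5"])
print("all nodes know about each other after", rounds, "round(s)")
```

No node coordinates the exchange; knowledge of the cluster spreads because each node keeps talking to random peers.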

Write operations

Cassandra provides a commit log, and every write operation is written to it first. The memtable is another Cassandra component: a memory-resident data structure. After the commit log, the data is written to the memtable. The memtable has a threshold, and when the data reaches it, the memtable is flushed to an SSTable, which is a disk file. Bloom filters are accessed on every query.
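The write path described above can be sketched in a few lines of Python. This is a toy model under assumptions: the `ToyNode` class, its `memtable_limit` parameter (a threshold counted in entries) and the in-memory lists standing in for disk files are all hypothetical and only mirror the order of the steps.

```python
class ToyNode:
    """Toy write path: commit log -> memtable -> flush to an SSTable."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # stand-ins for on-disk files, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. record the write for recovery
        self.memtable[key] = value             # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. threshold reached

    def flush(self):
        # Persist the memtable as a sorted, read-only table and start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying


node = ToyNode(memtable_limit=3)
for i in range(4):
    node.write(f"key{i}", f"value{i}")
print(len(node.sstables), node.memtable)  # 1 {'key3': 'value3'}
```

After a crash, anything still in the commit log could be replayed to rebuild the memtable, which is why the log is written before the in-memory structure.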

Read operations

On a read, Cassandra gets values from the memtable and checks the Bloom filters to find the SSTables that hold the required data. Clients interact with the database through CQL, the Cassandra Query Language. A client can connect to any node in the cluster for read and write operations.
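A Bloom filter answers "might this SSTable contain the key?" without touching the file. The sketch below shows one way the read path could use it; the `BloomFilter`, `SSTable` and `read` names, the filter size and the hash scheme are assumptions for illustration, and a real read also merges results by timestamp, which is left out here.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))


class SSTable:
    """Stand-in for a disk file: a read-only dict plus its Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Check the memtable first, then only SSTables whose filter may hold the key."""
    if key in memtable:
        return memtable[key]
    for table in reversed(sstables):          # newest SSTable first
        if table.bloom.might_contain(key):    # skip tables that definitely lack the key
            if key in table.rows:             # the filter can be wrong, so confirm
                return table.rows[key]
    return None


memtable = {"k3": "v3"}
sstables = [SSTable({"k1": "old"}), SSTable({"k1": "new", "k2": "v2"})]
print(read("k1", memtable, sstables), read("k3", memtable, sstables), read("zz", memtable, sstables))
```

The filter only saves work: a "no" lets the read skip a file entirely, while a "maybe" still has to be confirmed against the file itself.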

Cassandra Features

The figure above shows some of Cassandra's important features, which we discuss here.

Traditional relational databases, with their master-slave architectures, handle only the basic scenarios when it comes to availability. Cassandra's masterless architecture, by contrast, enables continuous availability for client applications. Adding nodes to the cluster increases throughput, so throughput grows roughly in proportion to the number of nodes in the cluster.

Cassandra Operations

The figure above shows the operations performed by Cassandra. The main one is transparent fault detection and recovery, which lets failed nodes be restored or replaced quickly. In a distributed cluster environment, the tunable data consistency feature lets the consistency level be adjusted to what the application needs.
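Tunable consistency is easiest to see as arithmetic. The sketch below assumes a replication factor of 3 and compares quorum reads and writes with single-replica ones; the function names are made up for this example, and the replication factor and consistency levels are not taken from the article. The rule it encodes is that a read is guaranteed to overlap the latest write when the replicas read plus the replicas written exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3  # assumed replication factor
print(quorum(rf))                                           # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))   # True
print(reads_see_latest_write(1, 1, rf))                     # False
```

Lower levels answer faster and tolerate more failed replicas; higher levels give stronger guarantees. That trade-off is what "tunable" refers to.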

Data modeling in Cassandra architecture

Figure 4: Cassandra Data Modeling

Cassandra implements a Dynamo-style replication model with no single point of failure. It handles unstructured data and has a flexible schema. Keyspaces and column families are the building blocks of data modeling in Cassandra: a column family is an ordered collection of rows, and a keyspace is a collection of column families.

Each column family contains rows, and each row contains columns and their values. This is the typical storage hierarchy in Cassandra. A cluster can have multiple keyspaces, and each keyspace can have multiple column families.

Figure 5: Cassandra Column Family

As figure 4 shows, each column family contains multiple rows. Figure 5 shows a column family in more depth: each row holds multiple key-value pairs, where the key is the column name and the value is that column's value.

The following examples make this clearer (reference: Cassandra: The Definitive Guide).

Example 1

It may help to think of it in terms of JavaScript Object Notation (JSON) instead of a picture:

Musician:                                  ColumnFamily 1
    Bootsy:                                RowKey1
        Email: [email address not shown]   Column Name: Value
        Instrument: bass                   Column Name: Value
    George:                                RowKey2
        Email: [email address not shown]   Column Name: Value

Band:                                      ColumnFamily 2
    George:                                RowKey1
        Pfunk: 1968-2010                   Column Name: Value

The example above describes musicians and their related band. It has two column families. Column family 1 has two row keys, each with column names and their values. Data is stored in Cassandra in the same way, as nested structures inside each keyspace. A single keyspace can have one or more column families.

Example 2

Structure: [Keyspace][ColumnFamily][RowKey][Column]
Example: [Hotelier][Hotel][RowKey][ColumnValues]

We can use a JSON-like notation to represent a Hotel column family, as shown here:

Hotel {
key: AZC_043 { name: Cambria Suites Hayden, phone: 480-444-4444,
address: 400 N. Hayden Rd., city: Scottsdale, state: AZ, zip: 85255}
key: AZS_011 { name: Clarion Scottsdale Peak, phone: 480-333-3333,
address: 3000 N. Scottsdale Rd, city: Scottsdale, state: AZ, zip: 85255}
key: CAS_021 { name: W Hotel, phone: 415-222-2222,
address: 181 3rd Street, city: San Francisco, state: CA, zip: 94103}
key: NYN_042 { name: Waldorf Hotel, phone: 212-555-5555,
address: 301 Park Ave, city: New York, state: NY, zip: 10019}
}

We can query the data above like this to fetch the required results:

cassandra> get Hotelier.Hotel['NYN_042']

This fetches all the column values for the row key NYN_042.
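The same hierarchy can be mimicked with nested Python dictionaries, which may help if the notation above is unfamiliar. This is only an analogy for the keyspace, column family, row key and column nesting, using two of the hotel rows from the example; the `get` helper is a hypothetical function written for this sketch, not a Cassandra API.

```python
# Keyspace -> column family -> row key -> {column name: value}
hotelier = {
    "Hotel": {
        "CAS_021": {
            "name": "W Hotel", "phone": "415-222-2222",
            "address": "181 3rd Street", "city": "San Francisco",
            "state": "CA", "zip": "94103",
        },
        "NYN_042": {
            "name": "Waldorf Hotel", "phone": "212-555-5555",
            "address": "301 Park Ave", "city": "New York",
            "state": "NY", "zip": "10019",
        },
    }
}


def get(keyspace, column_family, row_key):
    """Return every column of one row, like the lookup shown above."""
    return keyspace[column_family][row_key]


row = get(hotelier, "Hotel", "NYN_042")
print(row["name"], row["city"])  # Waldorf Hotel New York
```

The row key selects one row, and the result is the full set of column names and values stored under it.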

Conclusion

Many e-commerce companies use Cassandra as their database. Its fast querying over large amounts of data makes it a good fit for their business needs.