Apache Cassandra

Cassandra

Basic data

Developer: Apache Software Foundation
Year of publication: 2008, April 10, 2012
Current version: 3.11.5 (January 14, 2020)
                 3.11.1 (October 10, 2017)
                 3.0.15 (LTS) (October 10, 2017)
                 2.2.11 (LTS) (October 5, 2017)
                 Security fixes only: 2.1.19 (LTS) (October 5, 2017)
Operating system: cross-platform
Programming language: Java
Category: Database management system
License: Apache
Available in German: No
Website: cassandra.apache.org

Cassandra is a simple, distributed database management system for very large structured databases (a so-called "NoSQL" database system). It is designed for high scalability and reliability in large, distributed systems. Data is stored as key-value relations. Cassandra is openly documented and implemented in Java. The implementation is distributed as free software under the terms of version 2 of the Apache License.

History

Cassandra was originally developed at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik to solve the inbox search problem there, and was released in July 2008. Other large companies such as IBM, Rackspace, and Twitter subsequently contributed to the code. In March 2009 the project was accepted as a sub-project of the Apache Incubator at the Apache Software Foundation. On February 17, 2010, the Apache Software Foundation declared Cassandra a top-level project, so it is no longer a sub-project of the Apache Incubator. Version 0.8, released on June 2, 2011, introduced the Cassandra Query Language (CQL), a query language with SQL-like syntax.

Concept

Cassandra is a column-oriented (wide-column) NoSQL database. Partitioning, on the other hand, is row-based. Cassandra can be seen as a hybrid of Amazon Dynamo and Bigtable: it uses a slightly refined form of Dynamo's replication mechanisms, while presenting Bigtable's data structure to the outside world.
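
The row-based partitioning and Dynamo-style replication described above can be illustrated with a small token ring. The following is only a minimal sketch under stated assumptions, not Cassandra's implementation: the class name `TokenRing`, the node names, and the use of a truncated MD5 digest as the token function are all hypothetical, and each node owns a single token.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node owns one token; a row goes to the
    first node at or after the row key's token, plus the next nodes
    clockwise until the replication factor is reached."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be searched with bisect.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value):
        # Assumption: a truncated MD5 digest stands in for the partitioner.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, partition_key):
        """Return the nodes that store the row with this partition key."""
        size = len(self._ring)
        # Wrap around to the first node if the token is past the last one.
        start = bisect.bisect_left(self._tokens, self._token(partition_key)) % size
        return [self._ring[(start + i) % size][1]
                for i in range(self.replication_factor)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct node names
```

Because placement depends only on the hash of the row key, any node can compute where a row lives, which is why no master is needed to route requests.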

Use

Cassandra is used at Apple, Twitter, Digg, Spotify and Reddit. Until mid-2011 it also served hundreds of millions of members at Facebook, where it has been replaced by a combination of HBase, HDFS and Haystack since July 2011. According to the DB-Engines ranking of wide-column stores, Cassandra is the most popular database of that type.

Main features

Distributed
Each node in the cluster has the same role. There is no single point of failure. The data is distributed across the cluster (so each node contains different data). There is no master; every node can service every request.
Replication and multi-data-center replication
The replication strategies are configurable. The main features of Cassandra's distributed architecture are specifically tailored for deployment across multiple data centers, as well as for redundancy, failover and disaster recovery.
Scalability
Designed so that both read and write throughput increase linearly as new machines are added. The goal is that the applications do not experience any downtime or interruptions.
Fault tolerance
Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced without downtime.
Adjustable consistency
Cassandra is typically classified as an AP system. This means that availability and partition tolerance are generally considered more important than consistency in Cassandra. Reads and writes offer an adjustable level of consistency, ranging from "writes never fail" to "block until all replicas are readable", with the quorum level in between.
MapReduce support
Cassandra has Hadoop integration with MapReduce support. Cassandra also supports Apache Pig and Apache Hive.
Query language
Cassandra introduced the Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra, offered as an alternative to the traditional Structured Query Language (SQL).
Eventual consistency
Cassandra manages the eventual consistency of reads, upserts and deletions through tombstones, that is, markers that record a deletion.
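
The "Adjustable consistency" and "Eventual consistency" entries above can be made concrete with a toy model. This is a hedged sketch rather than Cassandra's actual read path: the names `Cell`, `quorum`, `overlaps` and `reconcile` are hypothetical, reconciliation is plain last-write-wins on a timestamp, and timestamps are assumed to be distinct.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int
    value: Optional[str]  # None marks a tombstone (a recorded deletion)

    @property
    def is_tombstone(self):
        return self.value is None


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def overlaps(n, w, r):
    """True if every read of r replicas must include at least one of
    the w replicas that acknowledged the latest write."""
    return r + w > n


def reconcile(cells):
    """Last-write-wins merge of the cells returned by the replicas.
    A replica with no data for the key contributes None."""
    newest = max((c for c in cells if c is not None),
                 key=lambda c: c.timestamp, default=None)
    if newest is None or newest.is_tombstone:
        return None  # no data, or the newest fact is a deletion
    return newest.value


# Three replicas, quorum writes and quorum reads: the sets overlap.
print(quorum(3), overlaps(3, quorum(3), quorum(3)))

# One replica missed the delete and still holds the old value.
stale = Cell(timestamp=1, value="alice")
deleted = Cell(timestamp=2, value=None)
print(reconcile([stale, deleted]))  # tombstone wins, the row stays deleted
print(reconcile([stale, None]))     # without a tombstone the old value returns
```

The last two calls show why a deletion is stored as a marker rather than as an absence: a replica that simply has no data cannot outvote a replica that still holds the old value, whereas a newer tombstone can.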
