Apache Cassandra
Basic data | |
---|---|
Developer | Apache Software Foundation |
Publishing year | 2008, April 10, 2012 |
Current version | 3.11.5 January 14, 2020 3.11.1 3.0.15 (LTS) 2.2.11 (LTS) Security |
Operating system | cross-platform |
Programming language | Java |
Category | Database management system |
License | Apache |
Available in German | No |
Website | cassandra.apache.org |
Cassandra is a simple, distributed database management system for very large structured data sets (a so-called "NoSQL" database system). It is designed for high scalability and reliability in large, distributed systems. Data is stored as key-value pairs. It is openly documented and implemented in Java. The implementation is distributed as free software under the terms of version 2 of the Apache License.
History
Cassandra was originally developed at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik to solve the inbox search problem there, and it was released in July 2008. After that, other large companies such as IBM, Rackspace and Twitter also contributed to the code. The project was accepted as a sub-project in the Apache Incubator at the Apache Software Foundation in March 2009. On February 17, 2010, the Apache Software Foundation declared Cassandra a "top-level" project, so it is no longer a sub-project of the Apache Incubator. Version 0.8, released on June 2, 2011, introduced the Cassandra Query Language (CQL), a query language with SQL-like syntax.
Concept
Cassandra is a column-oriented (wide-column) NoSQL database. Partitioning, on the other hand, is row-based. It can be seen as a mixture of Amazon Dynamo and Bigtable: it uses the replication mechanisms of Dynamo in a slightly more developed form, while exposing the data model of Bigtable to the outside world.
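The Dynamo-style placement of rows on nodes can be illustrated with a small consistent-hashing sketch. It is not Cassandra's implementation: the `Ring` class, the `token` helper and the node names are hypothetical, MD5 is used only as a stand-in hash function, and each node is assumed to own exactly one token on the ring. A row is assigned to the first node whose token is greater than or equal to the token of its partition key, and further replicas are placed on the next nodes clockwise.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption, used as a stand-in)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical token ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Sort nodes by their token so the list represents the ring in clockwise order.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes that would store the row with this partition key."""
        # First node whose token is >= the key's token; wrap around past the end.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes, stable across calls
```

Because every node can compute the same mapping from the key alone, any node can route a request to the responsible replicas without consulting a master.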
Use
Cassandra is used at Apple, Twitter, Digg, Spotify and Reddit. Until mid-2011 it also served hundreds of millions of members at Facebook, where it has been replaced since July 2011 by a combination of HBase, HDFS and Haystack. In the DB-Engines ranking of wide column stores, Cassandra is the most popular database.
Main features
- Distributed
- Each node in the cluster has the same role, so there is no single point of failure. The data is distributed across the cluster (each node holds different data). There is no master; every node can service every request.
- Supports replication and multi data center replication
- The replication strategies are configurable. Cassandra's distributed architecture is specifically tailored for deployment across multiple data centers, as well as for redundancy, failover and disaster recovery.
- Scalability
- Designed so that both read and write throughput increase linearly as new machines are added, without applications experiencing downtime or interruptions.
- Fault tolerance
- Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced without downtime.
- Adjustable consistency
- Cassandra is typically classified as an AP system in terms of the CAP theorem: availability and partition tolerance are generally given priority over consistency. Reads and writes offer an adjustable level of consistency, ranging from “writes never fail” to “block until all replicas are readable”, with the quorum level in between.
- MapReduce support
- Cassandra has Hadoop integration with MapReduce support. Cassandra also supports Apache Pig and Apache Hive.
- Query language
- Cassandra introduced the Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra, offered as an alternative to the traditional Structured Query Language (SQL).
- Eventual consistency
- Cassandra manages the eventual consistency of reads, upserts and deletions through tombstones, which are markers that record a deletion.
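The adjustable consistency and the role of tombstones described in the list above can be sketched as follows. This is a simplified model, not Cassandra's code: the `Cell` type and the helper functions are hypothetical, only the levels ONE, QUORUM and ALL are covered, and timestamp ties are not modeled. The sketch shows how many replicas each level requires, why a read and a write at quorum level always overlap in at least one replica, and how a newer tombstone hides an older value when replica responses are reconciled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cell:
    """Hypothetical replica response: a value or a tombstone, plus its write time."""
    value: Optional[str]  # None marks a tombstone, i.e. a recorded deletion
    timestamp: int


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def read_sees_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True if every read set must overlap every write set in at least one replica."""
    needed_reads = replicas_required(read_level, replication_factor)
    needed_writes = replicas_required(write_level, replication_factor)
    return needed_reads + needed_writes > replication_factor


def reconcile(cells: Sequence[Cell]) -> Optional[str]:
    """Pick the newest cell; a newest tombstone means the row reads as deleted (None)."""
    newest = max(cells, key=lambda cell: cell.timestamp)
    return newest.value


print(replicas_required("QUORUM", 3))                  # 2
print(read_sees_latest_write("QUORUM", "QUORUM", 3))   # True
print(read_sees_latest_write("ONE", "ONE", 3))         # False
print(reconcile([Cell("alice", 1), Cell(None, 2)]))    # None
```

With a replication factor of 3, reading and writing at ONE favors availability and may return stale data, while QUORUM on both sides trades some availability for reads that always include the latest acknowledged write.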
Web links
- Official website (English)
- Original paper presenting Cassandra (PDF; 133 kB)
- Article about Cassandra by Jochen Schnelle in "freiesMagazin", issue 09/2011
- HBase vs Cassandra: why we moved - Dominic Williams (FightMyMonster.com): blog entry showing some features of Cassandra (especially compared to HBase)