Apache Cassandra is a free and open-source NoSQL database built for high performance, scalability, and availability. It is designed to handle large amounts of data across hundreds of machines in a single cluster. Cassandra is a wide-column store that replicates data across nodes, with a consistency level that can be tuned per operation. It is written in Java and scales on commodity hardware without requiring a complex setup or administration.
Cassandra was originally designed and built by Facebook engineers, with help from Anonymous, who got involved in the project as part of his work on the Hadoop project. To make it easier for other organizations to use Cassandra for their projects, Facebook donated the Cassandra codebase to the Apache Software Foundation, where a community of committers and developers from around the world took over development. In November 2010, Facebook announced it had contributed “100% of Cassandra’s code” to an open-source foundation. Like other Apache projects, Cassandra is now governed by a Project Management Committee, which creates official releases and determines how new features are added.
Both Apache Cassandra and MySQL are open-source databases, but they serve different purposes. MySQL was designed to store structured, relational data, which makes it very helpful for finding the relations between data points. As the amount of data produced daily keeps growing, a single-server relational database such as MySQL can become a bottleneck, especially for workloads that depend less on relational structure.
In contrast, Apache Cassandra was created as a decentralized, distributed database that can handle big data. It spreads data across multiple nodes in the network instead of relying on one central node whose failure would take the whole database down. For example, Facebook can use Apache Cassandra for its messaging system because the data for many different profiles can be spread across the cluster, so no single node has to hold all of it.
You are a developer and have been asked to install and configure Apache Cassandra on your local machine for development and testing purposes. You are not an expert in installing Apache Cassandra. There is no one nearby who can help.
Don’t worry! This is a problem that many people face, and it happened to us before, so we wrote this article to help. It covers how to install Cassandra on a Debian 11 machine. We hope that after reading it, you will be able to install Cassandra on Debian 11 without any problems.
Prerequisites
To install Apache Cassandra on Debian 11, you will need to have:
- A server running Debian 11 with a minimum of 2GB of RAM.
- A non-root user with sudo privileges.
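The Debian release and the amount of RAM can both be checked from a shell before going further. The following is a minimal sketch, assuming the standard /etc/os-release and /proc/meminfo files are present. The helper names os_version_id and mem_total_mb are hypothetical, chosen for this example.

```shell
#!/bin/sh
# Hypothetical helpers for checking the prerequisites listed above.

os_version_id() {
    # Print VERSION_ID from an os-release file (Debian 11 reports 11).
    # The file is sourced in a subshell so its variables do not leak.
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_ID" )
}

mem_total_mb() {
    # /proc/meminfo reports MemTotal in kB; convert it to whole MB.
    awk '$1 == "MemTotal:" { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Usage:
#   os_version_id    # prints the release number
#   mem_total_mb     # prints total RAM in MB
```

Note that a machine sold as 2GB usually reports a little less than 2048 MB here, because the kernel reserves part of the memory for itself.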
Updating Debian
The first step is to update the system with the following command. It can take a while, because it downloads updated packages from the Debian mirrors and installs them on your machine.
sudo apt update && sudo apt upgrade -y
Installing Java
Java is a programming language and computing platform first released by Sun Microsystems in 1995. It is widely used for developing web services, games, business software, and mobile applications. Java owes this popularity to its object-oriented design, portability, security, scalability, and stability, which let programmers avoid some of the tedium found in other languages.
Apache Cassandra requires Java to work correctly. Cassandra itself runs on the Java Virtual Machine, so a Java runtime must be installed before the database can start.
We will install OpenJDK version 11 in this article. OpenJDK is the official reference implementation of the Java Platform, Standard Edition (Java SE). It is a modular runtime that can be used to run applications and can be embedded in devices. OpenJDK is available for Linux, Microsoft Windows, and macOS.
Run the following command to install OpenJDK on your system. On Debian 11, the default-jdk package installs OpenJDK 11.
sudo apt install default-jdk -y
Once the installation finishes, you can check the version of Java by typing in the following command.
java -version
The output should report an OpenJDK 11 runtime.
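To check the Java version from a script rather than by eye, the major version can be extracted from the first line of the java -version output. This is a minimal sketch, assuming the version string is the quoted field on that line. The helper name java_major is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Java major version.

java_major() {
    # Reads `java -version` output on stdin.
    # The version is the quoted field on the first line, in the form
    # 11.0.x for Java 11 or the older 1.8.0_x form for Java 8.
    awk -F'"' 'NR == 1 {
        split($2, v, ".")
        if (v[1] == "1") print v[2]; else print v[1]
    }'
}

# Usage (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | java_major
```

For the Cassandra 4.0 series used in this article, a result of 8 or 11 is what we would expect to work.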
Installing Apache Cassandra
Now that you have Java installed correctly, you can install Apache Cassandra.
The default Debian 11 repositories do not include a Cassandra package, so we need to add the official Apache Cassandra repository to our APT sources and install Cassandra from there.
Use the following command to add the GPG key to your APT package manager.
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
Run the following command to add the official Apache repository to your system. The command has three parts:
- echo "deb https://downloads.apache.org/cassandra/debian 40x main" prints the repository entry for the Cassandra 4.0 series (40x).
- The | (pipe) sends the output of echo to the next command as its input.
- sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list appends (-a) that entry to the /etc/apt/sources.list.d/cassandra.sources.list file, which makes the repository available for installation via APT.
echo "deb https://downloads.apache.org/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
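Because tee -a always appends, running the command above twice leaves two identical entries in the file. The following is a minimal sketch of an idempotent variant, using the same repository line and file path as above. The helper name add_line_once is hypothetical and is not part of APT.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is missing.

REPO_LINE='deb https://downloads.apache.org/cassandra/debian 40x main'
LIST_FILE='/etc/apt/sources.list.d/cassandra.sources.list'

add_line_once() {
    # $1 = file, $2 = exact line that should be present
    file=$1
    line=$2
    # -q quiet, -x match the whole line, -F fixed string (no regex).
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" | tee -a "$file" >/dev/null
}

# Usage (run as root, because the file lives under /etc):
#   add_line_once "$LIST_FILE" "$REPO_LINE"
```

Calling the helper a second time leaves the file unchanged, so it is safe to rerun in a setup script.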
Rerun the apt update command to update your APT package manager, so that the package manager knows your newly-added repository source.
sudo apt update -y
Run the sudo apt-cache policy command to verify that your package manager knows where to find the packages for Apache Cassandra. Without arguments, this command lists every configured package source together with its priority, so you can confirm that the new repository has been picked up.
sudo apt-cache policy
The output should include an entry for https://downloads.apache.org/cassandra/debian.
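Passing a package name to apt-cache policy narrows the output to that package, including the version APT would install (the Candidate field). This is a minimal sketch that extracts that field. The helper name candidate_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the version APT would install.

candidate_version() {
    # Reads `apt-cache policy <package>` output on stdin and prints
    # the value of the "Candidate:" field, if there is one.
    awk '$1 == "Candidate:" { print $2 }'
}

# Usage:
#   apt-cache policy cassandra | candidate_version
```

An empty result, or a value of (none), would suggest that APT has not picked up the Cassandra repository yet.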
Finally, run the following command to install Apache Cassandra on your system.
sudo apt install -y cassandra
Once you have Apache Cassandra installed, all that is left to do is to check that the Cassandra service is running. You can do this by typing in the following command:
sudo systemctl status cassandra
The command above shows whether the cassandra service is loaded and active, followed by its most recent log lines. This may look scary at first, but a status of active means that starting up your Cassandra service was successful.
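Cassandra can take some time to finish starting, so a script that continues straight after the install may run its checks too early. The following is a minimal sketch of a retry loop. The helper name wait_until and the WAIT_INTERVAL variable are hypothetical, and the attempt counts in the usage lines are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds.

wait_until() {
    # $1 = maximum number of attempts; the rest is the command to retry.
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "${WAIT_INTERVAL:-2}"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage:
#   wait_until 30 systemctl is-active --quiet cassandra
#   wait_until 30 nodetool status
```

The function returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can be combined with && or used in an if statement.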
Testing Your Apache Cassandra Cluster
Now that Apache Cassandra is installed and running, we can start checking to see if everything is working correctly.
Run the following command to check your Cassandra cluster.
sudo nodetool status
The sudo nodetool status command gives an overall picture of the Cassandra cluster by asking the local node for its view of every node. For each node, it lists the status and state, the address, the amount of data stored (load), the number of tokens, the ownership percentage, the host ID, and the rack.
The output of sudo nodetool status on a machine with Cassandra installed contains the following fields:
- Datacenter: datacenter1 is the name of the datacenter that this node belongs to.
- Address: 127.0.0.1 is the IP address of this node.
- Load: the amount of data that this node stores on disk.
- Host ID: the unique identifier (a UUID) that Cassandra assigns to this node.
- Rack: rack1 is the name of the rack that this node belongs to.
- UN: the status and state of the node, one letter each. The first letter is the status: U (Up) or D (Down). The second letter is the state: N (Normal), L (Leaving), J (Joining), or M (Moving). UN therefore means that the node is up and serving traffic normally. In a multi-node cluster, Cassandra can keep serving requests from the other nodes while one node is down.
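The two-letter code described above can also be decoded in a script, for example as a health check. This is a minimal sketch, assuming each node line in the nodetool status output starts with the two-letter code followed by a space. The helper names decode_state and all_up_normal are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for reading the status/state column.

decode_state() {
    # $1 = two-letter code from nodetool status, such as UN or DL.
    code=$1
    case $code in
        U?) status=Up ;;
        D?) status=Down ;;
        *)  status=Unknown ;;
    esac
    case $code in
        ?N) state=Normal ;;
        ?L) state=Leaving ;;
        ?J) state=Joining ;;
        ?M) state=Moving ;;
        *)  state=Unknown ;;
    esac
    printf '%s/%s\n' "$status" "$state"
}

all_up_normal() {
    # Reads nodetool status output on stdin. Succeeds only if at least
    # one node line was found and every node line starts with UN.
    awk '/^[UD][NLJM] / { n++; if ($1 != "UN") bad++ }
         END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Usage:
#   decode_state UN
#   nodetool status | all_up_normal && echo "cluster healthy"
```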
Run the following command to connect to the Cassandra cluster. The cqlsh command starts the CQL shell, an interactive command-line interface for running Cassandra Query Language (CQL) statements. Among other things, it offers colored output and tab completion.
cqlsh
With no arguments, cqlsh connects to the local node (127.0.0.1, port 9042) and opens a cqlsh> prompt.
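cqlsh can also run a single statement and exit, using its -e option, which is convenient for a quick scripted check. The following is a minimal sketch. The helper name run_cql and the CQLSH override variable are hypothetical, and the query simply reads the release version from the system.local table.

```shell
#!/bin/sh
# Hypothetical helper: run one CQL statement without opening a prompt.

run_cql() {
    # $1 = CQL statement to execute.
    # CQLSH can be set to point at a different cqlsh binary.
    "${CQLSH:-cqlsh}" -e "$1"
}

# Usage:
#   run_cql 'SELECT release_version FROM system.local;'
```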
Conclusion
This guide showed how to install and test Apache Cassandra on a Debian 11 machine. We checked the Cassandra service, inspected the cluster with nodetool, and connected to it from the command line with cqlsh.
At this point, you should know how to interact with your newly installed Cassandra cluster and be ready to start developing against it.
If you found this Apache Cassandra article helpful and want to learn more about Cassandra and its features, check out the official Apache Cassandra documentation.