Cassandra Installation on Ubuntu
Apache Cassandra is a highly scalable, fault-tolerant NoSQL database designed for handling large-scale data with high availability and no single point of failure.
It excels in environments where speed and horizontal scalability are key, such as big data and real-time analytics.
Running Cassandra on Ubuntu provides a stable, efficient platform with easy maintenance, making it ideal for scaling databases without performance loss.
Prerequisites to Cassandra Installation on Ubuntu
Before you jump into the Cassandra installation guide, make sure your system meets the following requirements:
- A Linux VPS running Ubuntu.
- A non-root user with sudo privileges.
- Access to a terminal/command line.
- Java OpenJDK 8 to run Cassandra, plus the apt-transport-https package to access the repositories over HTTPS.
Installing Apache Cassandra on Ubuntu: A Scalable NoSQL Database
Ubuntu’s lightweight and secure nature complements Cassandra’s peer-to-peer architecture, ensuring seamless management of growing data.
With a secured Linux VPS, let’s go through the step-by-step process for a successful Cassandra setup on Ubuntu, ensuring you get the most out of its powerful features.
Step 1: Install Required Packages
Before installing Cassandra, make sure your system has Java OpenJDK 8 and the apt-transport-https package.
First, update your package index:
sudo apt update
Then, run the command below to install OpenJDK 8:
sudo apt install openjdk-8-jdk -y
Once the installation is done, verify that Java is installed:
java -version
You should see output reporting version 1.8 (Java 8). This guide uses Java 8 because Cassandra 4.0 supports it; other Cassandra releases support different Java versions, so check the requirements of the release you plan to install.
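If you want to check the Java version from a script rather than by eye, the sketch below parses the output of java -version. The function name java_major_version is hypothetical, and it assumes the usual `version "..."` line that OpenJDK prints; it maps the legacy 1.8.0_xxx numbering to 8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Java major version from `java -version` output.
# Assumes a line such as: openjdk version "1.8.0_392"
java_major_version() {
  local ver
  # Extract the quoted version string from the first matching line
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_392 -> 8
    ?*)  echo "$ver" | cut -d. -f1 ;;  # modern scheme: 11.0.21 -> 11
    *)   return 1 ;;                   # no version line found
  esac
}

# Usage: java -version writes to stderr, so redirect it into the pipe
# major=$(java -version 2>&1 | java_major_version)
# [ "$major" = "8" ] && echo "Java 8 detected"
```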
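Finally, on older Ubuntu releases apt needs the apt-transport-https package to fetch Cassandra’s repositories over HTTPS (newer versions of apt support HTTPS natively, so installing it there is harmless). If it’s not installed, run the following command: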
sudo apt install apt-transport-https
This ensures your system is ready to access the Cassandra repositories securely.
Step 2: Add Cassandra Repository and Import GPG Key
In this step, you need to add the Apache Cassandra repository to the system and import its GPG key to ensure the packages are trusted.
To add the repository to your system’s sources list, run the following command:
sudo sh -c 'echo "deb http://www.apache.org/dist/cassandra/debian 40x main" > /etc/apt/sources.list.d/cassandra.list'
This adds the repository for Cassandra version 4.0. If you want an older release series, such as 3.9, replace 40x with 39x in the command.
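The series name is just the version without the dot plus an "x", so the substitution can be scripted. The sketch below is a hypothetical helper (cassandra_repo_line is not part of any package) that builds the same line as the command above and rejects malformed series names; it writes through sudo tee because a plain `>` redirect would run as your non-root user, which is also why the original command wraps echo in sudo sh -c.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the sources.list entry for a Cassandra release series.
# The series name is major+minor without the dot, plus "x" (4.0 -> 40x).
cassandra_repo_line() {
  local series="$1"
  # Reject anything that is not digits followed by a single "x"
  if ! printf '%s\n' "$series" | grep -Eq '^[0-9]+x$'; then
    echo "invalid series: '$series' (expected e.g. 40x)" >&2
    return 1
  fi
  echo "deb http://www.apache.org/dist/cassandra/debian ${series} main"
}

# Usage: write the entry for the 4.0 series (overwrites the file, like ">")
# cassandra_repo_line 40x | sudo tee /etc/apt/sources.list.d/cassandra.list
```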
Next, download the GPG keys for the Cassandra repository and add them to apt:
wget -q -O - https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
The command prints OK if the key was added successfully. On recent Ubuntu releases apt-key is deprecated, so you may also see a deprecation warning.
Step 3: Install Apache Cassandra
Now that the repository is in place, you can install Cassandra.
First, run the command below to update your package list to include the newly added Cassandra repository:
sudo apt update
With the repository updated, run the following command to install Cassandra:
sudo apt install cassandra -y
Once installation is complete, Cassandra will automatically start, and a dedicated Cassandra user is created to run the service.
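Startup can take a little while, and commands such as nodetool may fail with connection errors until the node is ready. The sketch below is a hypothetical wait_for_port helper that polls a TCP port using bash's /dev/tcp redirection and the coreutils timeout command. It assumes the default CQL port 9042, which Cassandra opens late in its startup sequence, so it is a reasonable (not guaranteed) readiness signal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a TCP port accepts connections, or give up.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
wait_for_port() {
  local host="$1" port="$2" max_wait="${3:-60}"
  local start=$SECONDS
  while (( SECONDS - start < max_wait )); do
    # Try to open a connection in a child shell; the fd closes when it exits
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage: wait up to 120 seconds for the CQL port, then check the node
# wait_for_port 127.0.0.1 9042 120 && nodetool status
```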
Step 4: Verify Apache Cassandra Installation on Ubuntu
After installation, verify that Cassandra is running correctly. The nodetool status command shows the status of each node in the cluster:
nodetool status
Each node should be listed with the status UN, which means the node is Up and in the Normal state.
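For scripts or health checks, the two-letter status code can be read from nodetool status output. The sketch below is a hypothetical all_nodes_up filter; it assumes node rows begin with a code made of U/D (up or down) followed by N/L/J/M (normal, leaving, joining, moving), as in the legend nodetool prints, and succeeds only when every listed node is UN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nodetool status` output on stdin, print how many
# nodes are UN (Up/Normal), and succeed only if every listed node is UN.
all_nodes_up() {
  awk '
    # Node rows start with a two-letter code: U/D (status) + N/L/J/M (state)
    $1 ~ /^[UD][NLJM]$/ { total++; if ($1 == "UN") up++ }
    END {
      printf "%d/%d nodes UN\n", up, total
      if (total > 0 && up == total) exit 0
      exit 1
    }'
}

# Usage:
# nodetool status | all_nodes_up || echo "some nodes are not up and normal"
```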
Note: To keep track of your cluster’s health and performance, nodetool provides commands for checking node status, cleaning up data, and more. Integrating with monitoring tools such as Prometheus and Grafana can help visualize Cassandra’s performance in real time.
Alternatively, you can check Cassandra’s service status by running the command below:
sudo systemctl status cassandra
If everything is set up correctly, the status should show active (running).
Step 5: Manage Cassandra Service
You can use the following commands to start, stop, or restart Cassandra manually:
Start Cassandra:
sudo systemctl start cassandra
Restart Cassandra:
sudo systemctl restart cassandra
Stop Cassandra:
sudo systemctl stop cassandra
Optionally, use the following command to make Cassandra start automatically when your system boots:
sudo systemctl enable cassandra
Step 6: Configure Apache Cassandra
By default, Cassandra is configured to run as a single node listening on localhost. If you’re setting up a cluster, you’ll need to modify some settings in the cassandra.yaml file.
Before making any changes, back up the cassandra.yaml file:
sudo cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.backup
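The command above always writes to the same .backup file, so a second run overwrites your first backup. A minimal alternative, assuming you want one backup per edit session, is a hypothetical backup_config helper that adds a timestamp (one-second resolution) to the file name and prints the path it created. The SUDO variable is an assumption of this sketch, used because /etc/cassandra is owned by root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a config file to a timestamped backup so a later
# run does not overwrite an earlier backup. Prints the backup path.
# Set SUDO=sudo when the target directory is only writable by root.
backup_config() {
  local file="$1" backup
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  backup="${file}.backup.$(date +%Y%m%d%H%M%S)"
  # cp -p keeps the original owner, mode, and timestamps where permitted
  ${SUDO:-} cp -p "$file" "$backup" || return 1
  echo "$backup"
}

# Usage:
# SUDO=sudo backup_config /etc/cassandra/cassandra.yaml
```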
To configure Cassandra, open the configuration file in a text editor:
sudo nano /etc/cassandra/cassandra.yaml
Edit the following settings:
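- Change the cluster name:
Find the cluster_name field and change it from the default Test Cluster to your preferred name. Every node in the cluster must use the same name. Because the package already started Cassandra with the default name, the node will typically refuse to start after this change, since the old name is saved in its system tables; on a new node that holds no data yet, stopping Cassandra and clearing the contents of /var/lib/cassandra before restarting avoids this.
- Add seed node IP addresses (for clusters):
In the seed_provider section, list the IP addresses of your seed nodes in the seeds parameter, separated by commas. Seeds are the nodes that a starting node contacts to discover the rest of the cluster, so you do not need to list every node. For a multi-node cluster, also set listen_address and rpc_address to this node’s own IP address instead of localhost.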
Once done, save and close the file, then restart Cassandra to apply the changes:
sudo systemctl restart cassandra
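If you configure several nodes, the cluster name and seed edits can be scripted instead of made in nano. The sketch below defines two hypothetical helpers, set_cluster_name and set_seeds, using GNU sed in-place editing. It assumes the stock file layout (a top-level cluster_name: line and a "- seeds:" line under seed_provider) and restricts input to characters that are safe inside the sed replacement; the seed addresses in the usage lines are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the cluster_name and seeds edits (GNU sed -i).
# Assumes a top-level "cluster_name:" line and a "- seeds:" line under
# seed_provider. /etc/cassandra is root-owned, so run these from a root shell.

set_cluster_name() {
  local file="$1" name="$2"
  # Allow only characters that are safe in the sed replacement and YAML quotes
  printf '%s\n' "$name" | grep -Eq '^[A-Za-z0-9 _.-]+$' || { echo "unsupported name" >&2; return 1; }
  sed -i "s/^cluster_name:.*/cluster_name: '${name}'/" "$file"
}

set_seeds() {
  local file="$1" seeds="$2"   # e.g. "10.0.0.1,10.0.0.2"
  printf '%s\n' "$seeds" | grep -Eq '^[A-Za-z0-9.:,-]+$' || { echo "unsupported seeds" >&2; return 1; }
  # Keep the original indentation and "- seeds:" prefix; replace only the value
  sed -i "s/^\([[:space:]]*- seeds:\).*/\1 \"${seeds}\"/" "$file"
}

# Usage (placeholder values):
# set_cluster_name /etc/cassandra/cassandra.yaml "Prod Cluster"
# set_seeds /etc/cassandra/cassandra.yaml "10.0.0.1,10.0.0.2"
```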
Step 7: Test Cassandra Command-Line Shell
Cassandra comes with a built-in command-line shell, cqlsh, which lets you run Cassandra Query Language (CQL) statements.
To start the shell, simply type:
cqlsh
By default, cqlsh connects to the local node at 127.0.0.1 on port 9042, and you can start interacting with your database.
Note: CQL is similar to SQL but optimized for Cassandra’s distributed architecture.
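As a first test of cqlsh, you can create a keyspace from the command line with cqlsh -e, which runs a statement and exits. The sketch below uses a hypothetical keyspace_cql helper that prints a CREATE KEYSPACE statement; the keyspace name demo, SimpleStrategy, and a replication factor of 1 are illustrative choices suited to a single-node test setup, not production settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a CREATE KEYSPACE statement for a single-datacenter
# test setup. SimpleStrategy and the replication factor are illustrative choices.
keyspace_cql() {
  local name="$1" rf="${2:-1}"
  # Unquoted CQL identifiers: a letter first, then letters, digits, underscores
  printf '%s\n' "$name" | grep -Eq '^[A-Za-z][A-Za-z0-9_]*$' || { echo "invalid keyspace name" >&2; return 1; }
  printf '%s\n' "$rf" | grep -Eq '^[1-9][0-9]*$' || { echo "invalid replication factor" >&2; return 1; }
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %s};\n" "$name" "$rf"
}

# Usage: run the statement through cqlsh, then list keyspaces
# cqlsh -e "$(keyspace_cql demo 1)"
# cqlsh -e "DESCRIBE KEYSPACES"
```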
You’re all done! Cassandra is widely used in applications requiring high-speed, scalable data processing, such as IoT systems, social media platforms, and recommendation engines for e-commerce.
With your system prepared, the next step is exploring Cassandra Query Language (CQL) and building powerful, high-availability applications.
How does Cassandra Work on Linux Ubuntu?
Apache Cassandra operates on Ubuntu using a distributed, peer-to-peer architecture that spreads large volumes of data across multiple nodes. Every node in a Cassandra cluster can accept read and write requests, so there is no single master whose failure takes the cluster down. Data is automatically replicated across nodes according to each keyspace’s replication factor, which provides fault tolerance and protects against data loss when individual nodes fail.
Ubuntu’s lightweight and secure environment complements Cassandra’s requirements, making deployment and maintenance straightforward. Additionally, Cassandra’s architecture is optimized for horizontal scalability, enabling users to easily add more nodes to accommodate growing data needs without sacrificing performance, making it a robust choice for modern data-driven applications.
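To make the fault-tolerance claim concrete: each keyspace’s replication factor (RF) sets how many copies of each row are stored. Assuming reads and writes use Cassandra’s QUORUM consistency level, a request needs floor(RF/2) + 1 replicas to respond, so RF minus that many replicas can be down while requests still succeed. The small arithmetic sketch below (function names are hypothetical) prints this for a few common RF values.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic: with replication factor RF, a QUORUM operation needs
# floor(RF/2) + 1 replicas to respond, so RF - quorum replicas can be down.
quorum_size()        { echo $(( $1 / 2 + 1 )); }
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for rf in 1 3 5; do
  echo "RF=$rf quorum=$(quorum_size "$rf") tolerated=$(tolerated_failures "$rf")"
done
```

This prints quorum sizes of 1, 2, and 3 with 0, 1, and 2 tolerated replica failures respectively, which is why RF=3 is a common choice.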
How to troubleshoot the “No hosts are reachable” error in cqlsh?
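This error usually means cqlsh cannot reach the node: either Cassandra is not running, or cqlsh is connecting to the wrong address or port.
First, confirm the service is active with sudo systemctl status cassandra. Then check the cassandra.yaml configuration file: rpc_address is the address clients such as cqlsh connect to, listen_address is the address other nodes use, and native_transport_port (9042 by default) is the CQL port cqlsh uses.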
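A quick way to see which address and port cqlsh should target is to print the active (uncommented) settings. The sketch below is a hypothetical show_client_settings helper that assumes the stock one-line "key: value" layout of cassandra.yaml; cqlsh itself accepts an optional host and port as positional arguments, and the IP address in the usage line is a documentation placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the uncommented address and port settings that
# decide where cqlsh must connect. Assumes top-level "key: value" lines.
show_client_settings() {
  local file="${1:-/etc/cassandra/cassandra.yaml}"
  grep -E '^(listen_address|rpc_address|native_transport_port):' "$file"
}

# Usage: inspect the settings, then point cqlsh at the matching address and port
# show_client_settings
# cqlsh 192.0.2.10 9042
```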
Why is Cassandra not starting after installation?
If Cassandra does not start, check the log file at /var/log/cassandra/system.log for error messages.
Common causes include insufficient memory and a missing or unsupported Java installation.
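The system log can be long, so it often helps to look only at the latest problems. The sketch below is a hypothetical recent_errors helper; it assumes the default log layout in which each entry starts with its level (for example "ERROR [main] ..."), and reading the file may require root. On systemd hosts, journalctl -u cassandra can show service-level startup messages as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent ERROR and WARN entries from
# Cassandra's log. Assumes each entry starts with its log level,
# e.g. "ERROR [main] 2024-01-01 ...". Reading the log may require root.
recent_errors() {
  local log="${1:-/var/log/cassandra/system.log}" count="${2:-20}"
  grep -E '^(ERROR|WARN)' "$log" | tail -n "$count"
}

# Usage:
# recent_errors
# sudo journalctl -u cassandra --no-pager | tail -n 50
```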
Conclusion
In this guide, we covered the complete process of installing and configuring Apache Cassandra on Ubuntu, from setting up prerequisites to verifying and managing the service.
Following these steps ensures that your database is up and running efficiently, with the flexibility to scale as your data grows.
Once Cassandra is installed, you can begin optimizing its configuration for your specific use case, whether it’s for a single-node setup or a distributed cluster.
To secure your Cassandra deployment, consider enabling password authentication with role-based access control, and encrypting traffic with TLS (SSL) both between nodes and between clients and the cluster.
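As a starting point for authentication, cassandra.yaml ships with permissive settings (AllowAllAuthenticator and AllowAllAuthorizer). The sketch below is a hypothetical enable_password_auth helper that switches them to PasswordAuthenticator and CassandraAuthorizer with GNU sed. It assumes the one-line authenticator:/authorizer: settings used by Cassandra 4.0, and it does not cover TLS. After a restart, you log in with the default superuser (cassandra/cassandra), which you should replace or re-password right away.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch cassandra.yaml from the permissive defaults to
# password authentication and role-based authorization. Assumes the
# one-line "authenticator:" / "authorizer:" settings used by Cassandra 4.0.
enable_password_auth() {
  local file="${1:-/etc/cassandra/cassandra.yaml}"
  sed -i \
    -e 's/^authenticator:.*/authenticator: PasswordAuthenticator/' \
    -e 's/^authorizer:.*/authorizer: CassandraAuthorizer/' \
    "$file"
}

# Usage (from a root shell), then restart and log in with the default superuser:
# enable_password_auth && systemctl restart cassandra
# cqlsh -u cassandra -p cassandra
```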