cassandra – starting seed nodes first and then other nodes

 

  • In the cassandra-rackdc.properties file, assign the data center and rack names you determined in the Prerequisites. For example:
    # indicate the rack and dc for this node
    dc=DC1
    rack=RAC1
  • After you have installed and configured Cassandra on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.
    Note: If the node was started automatically, you must first stop the node and clear the data directories, as described above.

    Package installations:

    $ sudo service cassandra start

    Tarball installations:

    $ cd install_location 
    $ bin/cassandra
  • To check that the ring is up and running, run:

    Package installations:

    $ nodetool status

    Tarball installations:

    $ cd install_location 
    $ bin/nodetool status

    Each node should be listed, and its status and state should be UN (Up/Normal).
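The start-then-verify steps above can be scripted. The following is a minimal bash sketch, not part of any Cassandra tooling. The helper names nodes_not_un and wait_for_un are hypothetical. The parsing assumes the usual `nodetool status` layout, in which each node row begins with a two-letter status/state code (for example UN or DN) followed by the node address.

```shell
#!/usr/bin/env bash
# Sketch: helpers for checking the ring with `nodetool status`.
# Assumption: nodetool is on PATH (package install) or install_location/bin
# has been added to PATH (tarball install).

# Read `nodetool status` output on stdin and print "<address> <code>" for
# every node row whose code is not UN.
# Exit status: 0 = all listed nodes are UN, 1 = some are not,
#              2 = no node rows were found at all.
nodes_not_un() {
  awk '
    # Node rows start with U/D (status) followed by N/L/J/M (state).
    $1 ~ /^[UD][NLJM]$/ {
      total++
      if ($1 != "UN") { print $2, $1; bad++ }
    }
    END {
      if (total == 0) exit 2
      exit (bad > 0 ? 1 : 0)
    }'
}

# Poll `nodetool status` every 5 seconds until ADDRESS is reported as UN,
# giving up after TIMEOUT seconds (default 300). Intended for starting seed
# nodes one at a time.
wait_for_un() {
  local address=$1 timeout=${2:-300} waited=0
  until nodetool status 2>/dev/null |
        awk -v a="$address" '$1 == "UN" && $2 == a { found = 1 } END { exit !found }'; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $address to reach UN" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For example, after starting a seed node you might run `wait_for_un 10.0.0.1` (a placeholder address) before starting the next one, and finish with `nodetool status | nodes_not_un`, which prints nothing and exits 0 when every listed node is UN.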

 

 

cassandra – yaml – commonly used properties

commit_failure_policy 
(Default: stop) Policy for commit disk failures:

  • die

    Shut down gossip and Thrift and kill the JVM, so the node can be replaced.

  • stop

    Shut down gossip and Thrift, leaving the node effectively dead but still available for inspection using JMX.

  • stop_commit

    Shut down the commit log, letting writes collect but continuing to service reads (as in pre-2.0.5 Cassandra).

  • ignore

    Ignore fatal errors and let the batches fail.
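To see which commit_failure_policy a node will actually use, a small bash sketch can read cassandra.yaml and fall back to the documented default. The helper name commit_failure_policy_of is hypothetical, and it assumes the setting, when present, is an uncommented top-level `commit_failure_policy: <value>` line; it does not parse YAML in general.

```shell
#!/usr/bin/env bash
# Sketch: print the commit_failure_policy selected by a cassandra.yaml file.
# Assumption: the setting, if present, is an uncommented top-level line
# "commit_failure_policy: <value>"; otherwise the default (stop) applies.
commit_failure_policy_of() {
  local value
  value=$(awk '/^commit_failure_policy:/ {
      sub(/^commit_failure_policy:[ \t]*/, "")
      # Drop a trailing comment and trailing whitespace.
      sub(/[ \t]*#.*/, "")
      sub(/[ \t]+$/, "")
      print
      exit
    }' "$1")
  value=${value:-stop}          # documented default
  case $value in
    die|stop|stop_commit|ignore) echo "$value" ;;
    *) echo "unknown commit_failure_policy: $value" >&2; return 1 ;;
  esac
}
```

For a package installation the file is typically /etc/cassandra/cassandra.yaml; for a tarball installation, install_location/conf/cassandra.yaml.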

disk_failure_policy
(Default: stop) Sets how Cassandra responds to disk failure. Recommended settings are stop or best_effort.

  • die

    Shut down gossip and Thrift and kill the JVM for any file system errors or single SSTable errors, so the node can be replaced.

  • stop_paranoid

    Shut down gossip and Thrift even for single SSTable errors.

  • stop

    Shut down gossip and Thrift, leaving the node effectively dead, but available for inspection using JMX.

  • best_effort

    Stop using the failed disk and respond to requests based on the remaining available SSTables. This means you can see obsolete data at consistency level ONE.

  • ignore

    Ignores fatal errors and lets the requests fail; all file system errors are logged but otherwise ignored. Cassandra acts as in versions prior to 1.2.
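Because only stop and best_effort are recommended, a pre-flight check can flag the other values. This hypothetical bash helper classifies a disk_failure_policy value passed as an argument; an empty argument is treated as the default, stop.

```shell
#!/usr/bin/env bash
# Sketch: classify a disk_failure_policy value.
# Exit status: 0 = recommended (stop, best_effort),
#              1 = valid but not recommended, 2 = unknown value.
check_disk_failure_policy() {
  local policy=${1:-stop}       # empty means the default, stop
  case $policy in
    stop|best_effort)
      return 0 ;;
    die|stop_paranoid|ignore)
      echo "warning: disk_failure_policy=$policy is not a recommended setting (use stop or best_effort)" >&2
      return 1 ;;
    *)
      echo "error: unknown disk_failure_policy: $policy" >&2
      return 2 ;;
  esac
}
```

For example, `check_disk_failure_policy "$(sed -n 's/^disk_failure_policy:[[:space:]]*//p' /etc/cassandra/cassandra.yaml)"` checks the value in a package installation's configuration file, assuming the setting is an uncommented top-level line.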

 

endpoint_snitch 

(Default: org.apache.cassandra.locator.SimpleSnitch) Set to a class that implements the IEndpointSnitch interface. Cassandra uses snitches for locating nodes and routing requests.

  • GossipingPropertyFileSnitch

    Recommended for production. The rack and datacenter for the local node are defined in the cassandra-rackdc.properties file and propagated to other nodes via gossip. To allow migration from the PropertyFileSnitch, it uses the cassandra-topology.properties file if it is present.

  • PropertyFileSnitch

    Determines proximity by rack and datacenter, which are explicitly configured in the cassandra-topology.properties file.

  • Ec2Snitch

    For EC2 deployments in a single region. Loads region and availability zone information from the EC2 API. The region is treated as the datacenter and the availability zone as the rack. Uses only private IPs. Consequently, it does not work across multiple regions.
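With GossipingPropertyFileSnitch, each node's datacenter and rack come from its cassandra-rackdc.properties file, so it can help to verify that file before starting the node. This is a minimal bash sketch with a hypothetical helper name, check_rackdc. It assumes simple `key=value` lines; other property-file syntaxes such as `key: value` are not handled.

```shell
#!/usr/bin/env bash
# Sketch: verify that a cassandra-rackdc.properties file defines dc and rack.
# Assumption: plain "key=value" lines; if a key repeats, the last value wins
# (as with java.util.Properties). Prints "dc=<dc> rack=<rack>" on success.
check_rackdc() {
  local dc rack
  dc=$(sed -n 's/^[[:space:]]*dc[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
  rack=$(sed -n 's/^[[:space:]]*rack[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
  if [ -z "$dc" ] || [ -z "$rack" ]; then
    echo "error: $1 must define both dc and rack" >&2
    return 1
  fi
  echo "dc=$dc rack=$rack"
}
```

Run against the example file from the startup steps (dc=DC1, rack=RAC1), the helper prints `dc=DC1 rack=RAC1`.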

 

Snitches

A snitch determines which datacenters and racks nodes belong to. Snitches inform Cassandra about the network topology so that requests are routed efficiently, and they allow Cassandra to distribute replicas by grouping machines into datacenters and racks. Specifically, the replication strategy places replicas based on the information the snitch provides. The snitch must report a consistent rack and datacenter for every node. Cassandra does its best not to place more than one replica on the same rack (which is not necessarily a physical location).

Note: If you change snitches, you may need to perform additional steps because the snitch affects where replicas are placed. See Switching snitches.

 

rpc_address 
(Default: localhost) The listen address for client connections (Thrift RPC service and native transport). Valid values are:

  • unset:

    Resolves the address using the hostname configuration of the node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.

  • 0.0.0.0:

    Listens on all configured interfaces, but you must set the broadcast_rpc_address to a value other than 0.0.0.0.

  • IP address
  • hostname

Related information: Network
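The 0.0.0.0 rule above lends itself to a pre-flight check. This bash sketch uses hypothetical helper names, assumes uncommented top-level `key: value` lines and does not parse YAML in general. It fails when rpc_address is 0.0.0.0 and broadcast_rpc_address is unset or also 0.0.0.0.

```shell
#!/usr/bin/env bash
# Sketch: flag rpc_address: 0.0.0.0 without a usable broadcast_rpc_address.
# Assumption: settings are uncommented top-level "key: value" lines.

# Print the value of a top-level key, minus any trailing "# comment".
top_level_value() {
  awk -v k="$2" '
    $0 ~ ("^" k ":") {
      sub("^" k ":[ \t]*", "")
      sub(/[ \t]*#.*/, "")
      sub(/[ \t]+$/, "")
      print
      exit
    }' "$1"
}

check_rpc_address() {
  local rpc bcast
  rpc=$(top_level_value "$1" rpc_address)
  bcast=$(top_level_value "$1" broadcast_rpc_address)
  if [ "$rpc" = "0.0.0.0" ] && { [ -z "$bcast" ] || [ "$bcast" = "0.0.0.0" ]; }; then
    echo "error: rpc_address is 0.0.0.0; set broadcast_rpc_address to another address" >&2
    return 1
  fi
}
```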

rpc_interface 
(Default: eth1) The interface that Cassandra binds to for client connections. Interfaces must correspond to a single address; IP aliasing is not supported. See rpc_address.
rpc_interface_prefer_ipv6
(Default: false) By default, if an interface has an IPv4 and an IPv6 address, the first IPv4 address is used. If set to true, the first IPv6 address is used.

 

 

cassandra – yaml – saved caches dir

data_file_directories 
The directory location where table data (SSTables) is stored. Cassandra distributes data evenly across the configured locations, subject to the granularity of the configured compaction strategy. Default locations:

  • Package installations: /var/lib/cassandra/data
  • Tarball installations: install_location/data/data

As a production best practice, use RAID 0 and SSDs.

saved_caches_directory 
The directory location where table key and row caches are stored. Default location:

  • Package installations: /var/lib/cassandra/saved_caches
  • Tarball installations: install_location/data/saved_caches
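To confirm that each configured data directory exists and is writable before starting a node, a bash sketch can list the data_file_directories entries and test them. The helper names are hypothetical. The sketch assumes the block-style list used in the stock cassandra.yaml, one `- /path` item per line directly under the key.

```shell
#!/usr/bin/env bash
# Sketch: list the entries of data_file_directories in a cassandra.yaml file
# and check that each one is an existing, writable directory.
# Assumption: block-style YAML list with one "- /path" item per line directly
# under the key; a comment or blank line inside the list ends the scan.

# Print one configured directory per line.
data_dirs() {
  awk '
    /^data_file_directories:/ { in_list = 1; next }
    in_list && /^[ \t]*-/     { sub(/^[ \t]*-[ \t]*/, ""); print; next }
    in_list                   { exit }
  ' "$1"
}

# Read directory paths on stdin; report any that are missing or not writable
# by the current user. Returns 1 if at least one path fails the check.
check_dirs_writable() {
  local dir status=0
  while IFS= read -r dir; do
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
      echo "not a writable directory: $dir" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `data_dirs /etc/cassandra/cassandra.yaml | check_dirs_writable`, run as the user that runs Cassandra. If data_file_directories is left commented out, data_dirs prints nothing and the default locations listed above apply.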

cassandra – commit log dir

Default directories

If you have changed any of the default directories during installation, make sure you have root access and set these properties:

commitlog_directory 
The directory where the commit log is stored. Default locations:

  • Package installations: /var/lib/cassandra/commitlog
  • Tarball installations: install_location/data/commitlog

For optimal write performance, place the commit log on a separate disk partition, or (ideally) a separate physical device from the data file directories. Because the commit log is append only, an HDD is acceptable for this purpose.
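A quick way to check the separate-partition recommendation is to compare the mount points that hold the two directories. This bash sketch uses the mount point column of POSIX `df -P` output; the helper names are hypothetical, and it only detects a shared partition, not two partitions on the same physical device.

```shell
#!/usr/bin/env bash
# Sketch: report whether two existing paths live on the same mounted
# filesystem, using the last column (mount point) of `df -P` output.
# Limitation: detects a shared partition, not a shared physical device.
mount_point_of() {
  df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Exit status: 0 = same filesystem, 1 = different, 2 = a path could not be resolved.
same_filesystem() {
  local a b
  a=$(mount_point_of "$1")
  b=$(mount_point_of "$2")
  if [ -z "$a" ] || [ -z "$b" ]; then
    return 2
  fi
  [ "$a" = "$b" ]
}
```

For example, `same_filesystem /var/lib/cassandra/commitlog /var/lib/cassandra/data && echo "commit log shares a filesystem with the data directory"` flags the package-installation defaults when both sit on one partition.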


cassandra – yaml – quick start properties – cluster name

cluster_name 
(Default: Test Cluster) The name of the cluster. This setting prevents nodes in one logical cluster from joining another. All nodes in a cluster must have the same value.
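Since every node must carry the same cluster_name, comparing copies of cassandra.yaml gathered from each node can catch a mismatch before a node is refused by the cluster. This is a bash sketch with a hypothetical helper name. It assumes cluster_name is an uncommented top-level line, ignores surrounding quotes (so 'Prod' and Prod compare equal), and treats a missing key as the default, Test Cluster.

```shell
#!/usr/bin/env bash
# Sketch: print the distinct cluster_name values found in the given
# cassandra.yaml files; succeed only if there is exactly one.
# Simplification: all quote characters are stripped from the value.
cluster_names_match() {
  local f names
  names=$(for f in "$@"; do
    awk '/^cluster_name:/ { sub(/^cluster_name:[ \t]*/, ""); print; found = 1; exit }
         END { if (!found) print "Test Cluster" }' "$f" | tr -d "\"'"
  done | sort -u)
  printf '%s\n' "$names"
  [ "$(printf '%s\n' "$names" | wc -l)" -eq 1 ]
}
```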
listen_address 
(Default: localhost) The IP address or hostname that Cassandra binds to for connecting to other Cassandra nodes. Set this parameter or listen_interface, not both. You must change the default setting for multiple nodes to communicate:

  • Generally set to empty. If the node is properly configured (host name, name resolution, and so on), Cassandra uses InetAddress.getLocalHost() to get the local address from the system.
  • For a single node cluster, you can use the default setting (localhost).
  • If Cassandra can’t find the correct address, you must specify the IP address or host name.
  • Never specify 0.0.0.0; it is always wrong.
listen_interface 
(Default: eth0) The interface that Cassandra binds to for connecting to other Cassandra nodes. Interfaces must correspond to a single address; IP aliasing is not supported. See listen_address.
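The listen_address rules above (set it or listen_interface but not both, and never 0.0.0.0) can be checked mechanically. This is a minimal bash sketch with hypothetical helper names, assuming uncommented top-level `key: value` lines in cassandra.yaml.

```shell
#!/usr/bin/env bash
# Sketch: check the listen_address / listen_interface rules in cassandra.yaml.
# Assumption: settings are uncommented top-level "key: value" lines.

# Print the value of a top-level key, minus any trailing "# comment".
top_level_value() {
  awk -v k="$2" '
    $0 ~ ("^" k ":") {
      sub("^" k ":[ \t]*", "")
      sub(/[ \t]*#.*/, "")
      sub(/[ \t]+$/, "")
      print
      exit
    }' "$1"
}

check_listen_settings() {
  local addr iface
  addr=$(top_level_value "$1" listen_address)
  iface=$(top_level_value "$1" listen_interface)
  if [ -n "$addr" ] && [ -n "$iface" ]; then
    echo "error: set listen_address or listen_interface, not both" >&2
    return 1
  fi
  if [ "$addr" = "0.0.0.0" ]; then
    echo "error: listen_address must not be 0.0.0.0" >&2
    return 1
  fi
}
```

An empty `listen_address:` line passes the check, matching the "generally set to empty" guidance.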

cassandra – yaml – configuration property groups

  • Quick start

    The minimal properties needed for configuring a cluster.

  • Commonly used

    Properties most frequently used when configuring Cassandra.

  • Performance tuning

    Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.

  • Advanced

    Properties for advanced users or properties that are less commonly used.

  • Security

    Server and client security settings.