Cassandra – data consistency

Configuring data consistency

Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas. Cassandra extends the concept of eventual consistency by offering tunable consistency. For any given read or write operation, the client application decides how consistent the requested data must be.

Tunable consistency for client requests 

Consistency levels in Cassandra can be configured to manage availability versus data accuracy. A tutorial in the CQL documentation compares consistency levels using cqlsh tracing. You can configure consistency on a cluster, data center, or individual I/O operation basis. Consistency among participating nodes can be set globally and also controlled on a per-operation basis (for example insert or update) using Cassandra’s drivers and client libraries.

cassandra – frozen in a collection

Using frozen in a collection 

A frozen value serializes multiple components into a single value. Non-frozen types allow updates to individual fields. Cassandra treats the value of a frozen type as a blob. The entire value must be overwritten.

Note: You cannot use non-frozen collections for primary key columns. However, you can use frozen collections for primary key columns.
column_name <collection_type><cql_type, frozen<column_name>>

For example:

CREATE TABLE mykeyspace.users (
  id uuid PRIMARY KEY,
  name frozen <fullname>,
  direct_reports set<frozen <fullname>>,     // a collection set
  addresses map<text, frozen <address>>     // a collection map
);

cassandra – collection

Collection type

A collection column is declared using the collection type, followed by another type, such as int or text, in angle brackets. For example, you can create a table having a list of textual elements, a list of integers, or a list of some other element types.

list<text>
list<int>

Collection types cannot currently be nested. For example, you cannot define a list within a list:

list<list<int>>     // not allowed

In Cassandra 2.1 and later, you can create an index on a column of type map, set, or list

cassandra – blob – ex. bigintAsBlob, blobAsBigint

Blob type

The Cassandra blob data type represents a constant hexadecimal number defined as 0[xX](hex)+ where hex is an hexadecimal character, such as [0-9a-fA-F]. For example, 0xcafe. The maximum theoretical size for a blob is 2GB. The practical limit on blob size, however, is less than 1 MB, ideally even smaller. A blob type is suitable for storing a small image or short string.

Blob conversion functions 

These functions convert the native types into binary data (blob):

  • typeAsBlob(type)
  • blobAsType

For every native, nonblob type supported by CQL, the typeAsBlob function takes a argument of type type and returns it as a blob. Conversely, the blobAsType function takes a 64-bit blob argument and converts it to a bigint value.

This example shows how to use bitintAsBlob:

CREATE TABLE bios ( user_name varchar PRIMARY KEY, 
   bio blob
 );

INSERT INTO bios (user_name, bio) VALUES ('fred', bigintAsBlob(3));


This example shows how to use blobAsBigInt.

ALTER TABLE bios ADD id bigint;

INSERT INTO bios (user_name, id) VALUES ('fred', blobAsBigint(0x0000000000000003));
 

cassandra – sharing a static column

Sharing a static column

In a table that uses clustering columns, non-clustering columns can be declared static in the table definition. Static columns are only static within a given partition.

CREATE TABLE t (
  k text,
  s text STATIC,
  i int,
  PRIMARY KEY (k, i)
);
INSERT INTO t (k, s, i) VALUES ('k', 'I''m shared', 0);
INSERT INTO t (k, s, i) VALUES ('k', 'I''m still shared', 1);
SELECT * FROM t;

Output is:

 k |                  s | i   
----------------------------
k  | "I'm still shared" | 0 
k  | "I'm still shared" | 1       

Restrictions

  • A table that does not define any clustering columns cannot have a static column. The table having no clustering columns has a one-row partition in which every column is inherently static.
  • A table defined with the COMPACT STORAGE directive cannot have a static column.
  • A column designated to be the partition key cannot be static.

cassandra – TTL for row data

Determining time-to-live for a column

This procedure creates a table, inserts data into two columns, and calls the TTL function to retrieve the date/time of the writes to the columns.

Procedure

  1. Create a users table named clicks in the excelsior keyspace.
    CREATE TABLE excelsior.clicks (
      userid uuid,
      url text,
      date timestamp,  //unrelated to WRITETIME discussed in the next section
      name text,
      PRIMARY KEY (userid, url)
    );
  2. Insert data into the table, including a date in yyyy-mm-dd format, and set that data to expire in a day (86400 seconds). Use the USING TTL clause to set the expiration period.
    INSERT INTO excelsior.clicks (
      userid, url, date, name)
      VALUES (
        3715e600-2eb0-11e2-81c1-0800200c9a66,
        'http://apache.org',
        '2013-10-09', 'Mary')
        USING TTL 86400;
  3. Wait for a while and then issue a SELECT statement to determine how much longer the data entered in step 2 has to live.
    SELECT TTL (name) from excelsior.clicks
      WHERE url = 'http://apache.org' ALLOW FILTERING;
    Output is, for example, 85908 seconds:

     ttl(name)
    -----------
     85908

cassandra – Determining the date/time of a write

Using the WRITETIME function in a SELECT statement returns the date/time in microseconds that the column was written to the database. This procedure continues the example from the previous procedure and calls the WRITETIME function to retrieve the date/time of the writes to the columns.

Procedure

  1. Insert more data into the table.
    INSERT INTO excelsior.clicks (
      userid, url, date, name)
      VALUES (
        cfd66ccc-d857-4e90-b1e5-df98a3d40cd6,
        'http://google.com',
        '2013-10-11', 'Bob'
    );
  2. Retrieve the date/time that the value Mary was written to the name column of the apache.org data. Use the WRITETIME function in a SELECT statement, followed by the name of a column in parentheses:
    SELECT WRITETIME (name) FROM excelsior.clicks
      WHERE url = 'http://apache.org' ALLOW FILTERING;
    The writetime output in microseconds converts to Sun, 14 Jul 2013 21:57:58 GMT or to 2:57 pm Pacific time.

    writetime(name)
    ------------------
    1373839078327001
  3. Retrieve the date/time of the last write to the date column for google.com data.
    SELECT WRITETIME (date) FROM excelsior.clicks
      WHERE url = 'http://google.com' ALLOW FILTERING;
    The writetime output in microseconds converts to Sun, 14 Jul 2013 22:03:15 GMT or 3:03 pm Pacific time.

    writetime(date)
    ------------------
    1373839395324001

cassandra – natural vs surrogate primary key

Using natural or surrogate primary keys

One consideration is whether to use surrogate or natural keys for a table. A surrogate key is a generated key (such as a UUID) that uniquely identifies a row, but has no relation to the actual data in the row.

For some tables, the data may contain values that are guaranteed to be unique and are not typically updated after a row is created. For example, the user name in a users table. This is called a natural key. Natural keys make the data more readable and remove the need for additional indexes or denormalization. However, unless your client application ensures uniqueness, it could potentially overwrite column data.

cassandra – counter type

To use counter types, see the DataStax blog about counters and Using a counter. Do not assign this type to a column that serves as the primary key. Also, do not use the counter type in a table that contains anything other than counter types (and primary key). To generate sequential numbers for surrogate keys, use the timeuuid type instead of the counter type. You cannot create an index on a counter column.

cassandra – table properties

Setting a table property 

Using the optional WITH clause and keyword arguments, you can configure caching, compaction, and a number of other operations that Cassandra performs on new table. You can use the WITH clause to specify the properties of tables listed in CQL table properties. Enclose a string property in single quotation marks. For example:

CREATE TABLE MonkeyTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
)
WITH comment='Important biological records'
AND read_repair_chance = 1.0;

CREATE TABLE DogTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
) WITH compression =
    { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 }
  AND compaction =
    { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6 };