kafka – use cases

1.2 Use Cases

Here is a description of a few of the popular use cases for Apache Kafka™. For an overview of a number of these areas in action, see this blog post.

Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc.). In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications.

In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.

Activity tracking is often very high volume as many activity messages are generated for each user page view.
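
A minimal sketch of the publish side of such a pipeline, written in Scala against the standard Kafka producer API. The topic name "page-views", the broker address, and the JSON payload are assumptions made up for illustration, not anything prescribed by Kafka:

object PageViewProducer extends App {

  import java.util.Properties
  import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)

  // One topic per activity type: page views go to "page-views",
  // keyed by user id so one user's events land in the same partition.
  producer.send(new ProducerRecord[String, String]("page-views", "user-42", """{"page": "/home"}"""))

  producer.close()
}

Consumers for real-time monitoring or for loading into Hadoop would simply subscribe to the same topic.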

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.

Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an “articles” topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a lightweight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza.
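
As a rough sketch of the middle (cleansing) stage of such a pipeline, using the current Kafka Streams StreamsBuilder API from Scala. The topic names and the trivial "normalization" step are invented for illustration:

object ArticleCleanser extends App {

  import java.util.Properties
  import org.apache.kafka.common.serialization.Serdes
  import org.apache.kafka.streams.kstream.ValueMapper
  import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-cleanser")   // assumed application id
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")  // assumed broker address
  props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
  props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

  val builder = new StreamsBuilder()

  // Consume raw articles, normalize them (a trim/lower-case stand-in here),
  // and publish the cleansed content to a new topic for the next stage.
  builder
    .stream[String, String]("articles")
    .mapValues(new ValueMapper[String, String] {
      override def apply(body: String): String = body.trim.toLowerCase
    })
    .to("cleansed-articles")

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
}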

Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka’s support for very large stored log data makes it an excellent backend for an application built in this style.

Commit Log

Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to the Apache BookKeeper project.

cassandra – write consistency

The consistency level determines the number of replicas on which the write must succeed before returning an acknowledgment to the client application.

Write Consistency Levels

ALL
Description: A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition key.
Usage: Provides the highest consistency and the lowest availability of any other level.

EACH_QUORUM
Description: Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in all data centers.
Usage: Used in multiple data center clusters to strictly maintain consistency at the same level in each data center. For example, choose this level if you want a read to fail when a data center is down and the QUORUM cannot be reached on that data center.

QUORUM
Description: A write must be written to the commit log and memtable on a quorum of replica nodes.
Usage: Provides strong consistency if you can tolerate some level of failure.

LOCAL_QUORUM
Description: Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same data center as the coordinator node. Avoids the latency of inter-data-center communication.
Usage: Used in multiple data center clusters with a rack-aware replica placement strategy, such as NetworkTopologyStrategy, and a properly configured snitch. Use to maintain consistency locally (within the single data center). Can be used with SimpleStrategy.

ONE
Description: A write must be written to the commit log and memtable of at least one replica node.
Usage: Satisfies the needs of most users because consistency requirements are not stringent.

TWO
Description: A write must be written to the commit log and memtable of at least two replica nodes.
Usage: Similar to ONE.

THREE
Description: A write must be written to the commit log and memtable of at least three replica nodes.
Usage: Similar to TWO.

LOCAL_ONE
Description: A write must be sent to, and successfully acknowledged by, at least one replica node in the local data center.
Usage: In multiple data center clusters, a consistency level of ONE is often desirable, but cross-DC traffic is not. LOCAL_ONE accomplishes this. For security and quality reasons, you can use this consistency level in an offline data center to prevent automatic connection to online nodes in other data centers if an offline node goes down.

ANY
Description: A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
Usage: Provides low latency and a guarantee that a write never fails. Delivers the lowest consistency and highest availability.

SERIAL
Description: Achieves linearizable consistency for lightweight transactions by preventing unconditional updates.
Usage: You cannot configure this level as a normal consistency level (the consistency level field at the driver level); you configure it using the serial consistency field as part of the native protocol operation. See failure scenarios.

LOCAL_SERIAL
Description: Same as SERIAL, but confined to the data center. A write must be written conditionally to the commit log and memtable on a quorum of replica nodes in the same data center.
Usage: Same as SERIAL. Used for disaster recovery. See failure scenarios.

Even at low consistency levels, the write is still sent to all replicas for the written key, even replicas in other data centers. The consistency level just determines how many replicas are required to respond that they received the write.
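
A minimal sketch of setting the level per write, using the DataStax Java driver 4.x from Scala; the keyspace, table, and values are made up for illustration:

object QuorumWrite extends App {

  import com.datastax.oss.driver.api.core.{ConsistencyLevel, CqlSession}
  import com.datastax.oss.driver.api.core.cql.SimpleStatement

  // Connects to 127.0.0.1:9042 by default; point it at a real cluster as needed.
  val session = CqlSession.builder().build()

  // The coordinator acknowledges this write only after a quorum of replicas
  // in the local data center have written it to their commit log and memtable.
  val insert = SimpleStatement
    .builder("INSERT INTO demo.users (id, name) VALUES (?, ?)")
    .addPositionalValues(java.util.UUID.randomUUID(), "alice")
    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
    .build()

  session.execute(insert)
  session.close()
}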


cassandra – writes

About writes

To manage and access data in Cassandra, it is important to understand how Cassandra writes and reads data, the hinted handoff feature, and the areas of conformance and non-conformance to the ACID (atomic, consistent, isolated, durable) database properties. In Cassandra, consistency refers to how up-to-date and synchronized a row of data is on all of its replicas.

Cassandra includes client utilities and application programming interfaces (APIs) for developing applications for data storage and retrieval.

scala – List – apply, indices

One reason why random element selection is less popular for lists than for arrays is that xs(n) takes time proportional to the index n. In fact, apply is simply defined by a combination of drop and head:

xs apply n equals (xs drop n).head
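
For example, with a small throwaway list, the two expressions pick out the same element:

val xs = List('a', 'b', 'c', 'd')

println(xs(2))              // c
println((xs drop 2).head)   // c – the same element, reached by dropping the first two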


The indices method returns a list consisting of all valid indices of a given list:

object Lists_Random extends App {

  val lst = List(1, 2, 3, 4)

  println(lst.apply(1))
  println(lst.indices)

}


2
Range(0, 1, 2, 3)

The value returned by indices is a scala.collection.immutable.Range.

scala – List – take, drop, splitAt

The expression “xs take n” returns the first n elements of the list xs.

If n is greater than xs.length, the whole list xs is returned.


The operation “xs drop n” returns all elements of the list xs except the first n ones.

If n is greater than xs.length, the empty list is returned.


The splitAt operation splits the list at a given index, returning a pair of two lists. It is defined by the equality:
xs splitAt n equals (xs take n, xs drop n)


object List_TakeDropSplitAt extends App {

  val v1 = List(1, 2, 3)

  println(v1.take(1))
  println(v1.take(10))

  println(v1.drop(1))

  println(v1.splitAt(1))

}

List(1)
List(1, 2, 3)
List(2, 3)
(List(1),List(2, 3))

scala – List patterns

List(…) is an instance of a library-defined extractor pattern.

The “cons” pattern x :: xs is a special case of an infix operation pattern. You know already that, when seen as an expression, an infix operation is equivalent to a method call.


When seen as a pattern, an infix operation such as p op q is equivalent to op(p, q). That is, the infix operator op is treated as a constructor pattern. In particular, a cons pattern such as x :: xs is treated as ::(x, xs).

There is a class named :: that corresponds to the pattern constructor. It is named scala.:: and is exactly the class that builds nonempty lists.

So :: exists twice in Scala, once as a name of a class in package scala, and again as a method in class List. The effect of the method :: is to produce an instance of the class scala.::.
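
A small sketch pulling both pattern forms together (the list and the printed messages are made up for illustration):

object List_Patterns extends App {

  val xs = List(1, 2, 3)

  // The cons pattern x :: rest is shorthand for the constructor pattern ::(x, rest).
  xs match {
    case Nil       => println("empty list")
    case x :: rest => println(s"head $x, tail $rest")   // prints: head 1, tail List(2, 3)
  }

  // List(...) is a library-defined extractor that matches lists of exactly that length.
  val List(a, b, c) = xs
  println(a + b + c)                                     // prints: 6

}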