Kafka is a distributed streaming platform that lets you publish and subscribe to streams of records. Under the hood it's less a traditional message queue than a partitioned, replicated commit log, and it can handle trillions of events per day, making it a natural fit for real-time data pipelines and streaming apps that power everything from fraud detection to ad targeting.
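To make the publish/subscribe model concrete, here's a minimal sketch using the confluent-kafka Python client. The broker address, topic name, and consumer group id are illustrative assumptions, not values tied to any particular setup.

```python
# Minimal Kafka publish/subscribe sketch (confluent-kafka client).
# Broker ("localhost:9092"), topic ("events"), and group id are assumptions.
from confluent_kafka import Consumer, Producer

# Producer: publish one record to the "events" topic.
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("events", key="user-123", value='{"action": "click"}')
producer.flush()  # block until the broker acknowledges the record

# Consumer: subscribe to the same topic and read records.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-consumers",   # consumers in a group split partitions
    "auto.offset.reset": "earliest",   # start from the beginning if no offset
})
consumer.subscribe(["events"])

msg = consumer.poll(timeout=5.0)       # wait up to 5 seconds for one record
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```

Because records are retained on disk rather than deleted once consumed, multiple consumer groups can each read the full stream independently.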
I was going to spend the weekend hiking, but then I remembered I still need to finish setting up our new Kafka cluster. Guess I'll be spending my Saturday night partitioning topics instead of enjoying the great outdoors.
Our latest feature is so popular that it's pushing over 100,000 events per second through Kafka. Looks like I'll be pulling another all-nighter to keep this thing from falling over like a house of cards.
Kafka, Samza, and the Unix Philosophy of Distributed Data examines how Kafka and Samza embody Unix principles, small composable tools connected through a uniform interface, in building robust, scalable data infrastructure.
Turning the database inside-out with Apache Samza explains how stream processing with Kafka and Samza can maintain materialized views and enable new application architectures.
Using logs to build a solid data infrastructure (or: why dual writes are a bad idea) discusses why writing the same data independently to several systems lets them drift out of sync, and how routing every change through Kafka's ordered log instead provides a solid foundation for data integration and real-time processing; a minimal sketch of this pattern follows below.
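To make the last two links concrete, here's a hedged sketch of the log-based pattern they describe, again using the confluent-kafka Python client. The topic name, group id, and the in-memory dict standing in for a materialized view are all illustrative assumptions, not code from the articles.

```python
# Sketch of the log-based alternative to dual writes: the application appends
# each change once to a Kafka topic, and every downstream system rebuilds its
# own view by replaying that log in order. Topic ("user-changes"), group id,
# and the dict used as a materialized view are assumptions.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "profile-view-builder",
    "auto.offset.reset": "earliest",   # replay the topic from the start
})
consumer.subscribe(["user-changes"])

profiles: dict[str, dict] = {}         # materialized view keyed by user id

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error() is not None:
        continue
    change = json.loads(msg.value())
    # Every consumer group sees the same per-partition order, so a separate
    # group updating a database from this topic converges to the same state;
    # the application never has to dual-write to both systems.
    profiles[change["user_id"]] = change["profile"]
```

A crashed consumer simply resumes from its last committed offset, which is what makes the log, rather than the application's dual writes, the source of truth.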
Note: the Developer Dictionary is in Beta. Please direct feedback to skye@statsig.com.