Data streaming is the process of continuously sending data from a source to a destination in real time, as it is produced, like a never-ending conveyor belt of information. It's the opposite of batch processing, which is like your grandma making a huge batch of cookies all at once and then doling them out to you and your 27 cousins over the next week.
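To make the conveyor-belt-versus-cookie-batch contrast concrete, here is a minimal Python sketch. It is only an illustration: `sensor_feed`, `batch_average`, and `streaming_average` are hypothetical names invented for this example, and a hard-coded list stands in for a real source such as a message queue.

```python
from typing import Iterable, Iterator


def batch_average(readings: list[float]) -> float:
    """Batch: wait until all the data has been collected, then compute once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated average as soon as each reading arrives."""
    total = 0.0
    count = 0
    for value in readings:
        total += value
        count += 1
        yield total / count


def sensor_feed() -> Iterator[float]:
    """Hypothetical source; a real pipeline would read from a socket or queue."""
    yield from [10.0, 20.0, 30.0, 40.0]


# Batch: nothing is known until the whole feed has been materialized.
batch_result = batch_average(list(sensor_feed()))

# Streaming: a fresh result is available after every single reading.
stream_results = list(streaming_average(sensor_feed()))

print(batch_result)    # 25.0
print(stream_results)  # [10.0, 15.0, 20.0, 25.0]
```

Both approaches land on the same final answer. The streaming version just has something to show on the dashboard after the first reading instead of making everyone wait for the last one.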
I was trying to build a real-time dashboard for our CEO's vanity metrics, but the data streaming pipeline kept crashing because someone forgot to add more EC2 instances again.
My startup is pivoting to data streaming because some Forbes article said it's the next big thing, even though none of us really knows what it means.
Stream processing, Event sourcing, Reactive, CEP… and making sense of it all - This article breaks down the reasons for using event streams in your system design, like loose coupling, fast reads/writes, scalability, and simplified error handling. It also covers some popular tools like Kafka and Samza.
Real-time full-text search with Luwak and Samza - Want to build something like Twitter's real-time search or Google Alerts? This post dives into the challenges of full-text search on data streams and how tools like Luwak and Samza can help.
Designing Data-Intensive Applications - If you really want to geek out on data streaming and other data engineering concepts, check out this O'Reilly book by Martin Kleppmann. It's a beast, but you'll be the smartest person in the daily stand-up afterwards.