Building Data Pipelines

Problem Statement

There are various use cases that consume data from multiple source streams (Kafka, SQS, RMQ) and populate stores such as Redis, DDB, MySQL, S3, HTTP REST endpoints and Kafka with minimal processing (no heavy business logic). The data copy can be real-time streaming or batch. So we need a generic framework…

Domain Driven Design Artifacts

Value Objects

“An object that represents a descriptive aspect of the domain with no conceptual identity is called a Value Object”
Identity: They don’t have their own identity.
Immutability: Value objects are immutable; they are treated as snapshots of some state.
Lifetime: There is no lifecycle for a value object. This means that they can…
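
The identity and immutability properties above can be illustrated with a minimal Python sketch. The `StreamPosition` class and its fields are hypothetical names chosen for illustration and do not come from the original article; a frozen dataclass is only one possible way to model a value object.

```python
from dataclasses import dataclass, FrozenInstanceError, replace


@dataclass(frozen=True)
class StreamPosition:
    """Hypothetical value object describing a position in a stream."""
    topic: str
    partition: int
    offset: int


a = StreamPosition("events", 0, 42)
b = StreamPosition("events", 0, 42)

# No identity: two instances with the same attributes are equal,
# even though they are separate objects in memory.
same_value = (a == b)
separate_objects = (a is not b)

# Immutability: assigning to a field of a frozen dataclass raises.
try:
    a.offset = 43
    mutated = True
except FrozenInstanceError:
    mutated = False

# A "change" is expressed by creating a new snapshot instead.
c = replace(a, offset=43)
```

Because equality is derived from the attributes alone, such objects can be shared and compared freely without tracking which instance is which.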

Kafka-S3-DataCopy-ExactlyOnce (?)

Background

All signals required for aggregation come from multiple Kafka topics. We run Spark jobs to aggregate the data and store the aggregated results in DynamoDB. Since these metrics are critical, we cannot afford data loss or duplication while copying data from Kafka to S3. …

Sunil Kalva
