Sunil Kalva
2 min read · Jun 21, 2021


Kafka-S3-DataCopy-ExactlyOnce (?)

Background

All the signals we need to aggregate come from multiple Kafka topics. We run Spark jobs to aggregate the data and store the aggregated results in DynamoDB. Since these metrics are critical, we cannot afford data loss or duplication while copying data from Kafka to S3. We can use an Apache Flume based framework (refer here for why Flume) to copy data from Kafka/RMQ to S3 (with new adapters) with exactly-once semantics (?).

We also want to copy data based on event time, and the event-time field should be configurable. All late events should be attributed to the configured event time.

Pseudocode
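
The original pseudocode did not survive the export. A rough reconstruction, inferred from the flow description below and the step numbers referenced under Failure points and Recovery, might look like this:

  1. Consume a batch of events from Kafka/RMQ.
  2. Append each event as an (offset, data) line to the staged file for its partition (on EFS).
  3. Check whether a configured roll condition (count, size, or time) has been met.
  4. If a condition is met, commit the Kafka offset.
  5. Prefix (rename) the staged file with the committed offset.
  6. Upload the staged file to S3.
  7. If no condition is met, commit the offset once the entire batch has been written to the staged file.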

Approach

Flow

The regular flow consumes events from Kafka/RMQ in batches and writes them to a staged file (on AWS EFS). While a batch is being written to the staged file, if one of the following conditions is met (depending on the configured partitioner): 1. count, 2. size, 3. time, the offset is committed and the file is uploaded to S3 (with the file name prefixed by the committed offset). If none of the conditions is met, the offset is committed after the entire batch has been written to the local file.
On service restart or partition reassignment, scan all the local files, find the latest offset written to disk, and start reading from Kafka at that offset.

Each partition writes data to its own folder and its own file.
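
A minimal sketch of this loop in Python, assuming the confluent-kafka client and boto3. The topic, bucket, paths, and roll threshold are placeholders rather than the framework's actual configuration, and only the count condition is shown (size and time work the same way):

    import os, time
    import boto3
    from confluent_kafka import Consumer, TopicPartition

    TOPIC, BUCKET = "clicks", "my-raw-bucket"   # illustrative placeholders
    STAGING = "/data/staging/clicks"            # per-stream staging root on EFS
    ROLL_COUNT = 10000                          # roll after this many events per partition

    consumer = Consumer({"bootstrap.servers": "kafka:9092",
                         "group.id": "s3-copier",
                         "enable.auto.commit": False})
    consumer.subscribe([TOPIC])
    s3 = boto3.client("s3")
    written = {}                                # events written per partition since the last roll

    while True:
        batch = consumer.consume(num_messages=500, timeout=1.0)   # step 1: read a batch
        for m in batch:
            if m.error():
                continue
            part_dir = f"{STAGING}/{m.partition()}"
            os.makedirs(part_dir, exist_ok=True)
            staged = f"{part_dir}/staged.log"
            with open(staged, "a") as f:                          # step 2: append (offset, data)
                f.write(f"{m.offset()}\t{m.value().decode()}\n")
            written[m.partition()] = written.get(m.partition(), 0) + 1
            if written[m.partition()] >= ROLL_COUNT:              # step 3: roll condition met
                consumer.commit(offsets=[TopicPartition(TOPIC, m.partition(), m.offset() + 1)],
                                asynchronous=False)               # step 4: commit the offset first
                rolled = f"{part_dir}/{m.offset()}-staged.log"    # step 5: prefix the file with the offset
                os.rename(staged, rolled)
                hour = time.strftime("%Y/%m/%d/%H")               # the real framework keys this by the configured event time
                s3.upload_file(rolled, BUCKET, f"raw/{TOPIC}/{hour}/{m.offset()}.log")   # step 6: upload to S3
                written[m.partition()] = 0
        if batch:
            consumer.commit(asynchronous=False)                   # no roll: commit after the whole batch is on disk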

Note:

  1. File data format = Each line is a tuple containing (offset, data).
  2. Staging folder format = /data/staging/<stream-name>/<partition-number>/P-<partitionid>-event-time_processing-time.latestoffset
  3. S3 folder format = /raw/<stream-name>/<event-year>/<event-month>/<event-day>/<event-hour>/files
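
For example, for a hypothetical stream clicks, partition 3, latest offset 4500, and an event hour of 2021-06-21 10:00, the two locations would look roughly like this (the timestamp fields are left as placeholders since their exact format is not shown above):

    /data/staging/clicks/3/P-3-<event-time>_<processing-time>.4500
    /raw/clicks/2021/06/21/10/<files>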

Failure points and Recovery

  1. A node can crash while writing to the local disk (step 2 in the pseudocode above). The same partition(s) will be assigned to another node; since the staging area is on EFS, that node can scan the same location, upload the staged files to S3, and start reading from the last offset written to the file (see the recovery sketch after this list).
  2. A node can crash after committing the offset and before uploading the local file to S3 (steps 4 and 5 in the pseudocode above); the node that is then assigned these partitions recovers in the same way as above.
  3. A node can crash after uploading the file to S3 (step 6 in the pseudocode above); this is not an issue because the offset is committed before the file is uploaded.
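
A sketch of that recovery path, again in hedged Python: the on_assign hook and helper names are illustrative assumptions, not the framework's actual code, and the S3 key used for recovered files is a placeholder.

    import os
    import boto3
    from confluent_kafka import Consumer

    TOPIC, BUCKET = "clicks", "my-raw-bucket"   # illustrative placeholders
    STAGING = "/data/staging/clicks"
    s3 = boto3.client("s3")

    def last_staged_offset(part_dir):
        # Each staged line is "<offset>\t<data>"; return the highest offset persisted to disk.
        latest = -1
        for name in os.listdir(part_dir):
            with open(os.path.join(part_dir, name)) as f:
                for line in f:
                    latest = max(latest, int(line.split("\t", 1)[0]))
        return latest

    def on_assign(consumer, partitions):
        # Runs on startup or rebalance: flush whatever the crashed node left behind,
        # then resume reading right after the last offset written to EFS.
        for p in partitions:
            part_dir = f"{STAGING}/{p.partition}"
            if not os.path.isdir(part_dir):
                continue                                        # nothing staged: keep the committed offset
            latest = last_staged_offset(part_dir)
            for name in os.listdir(part_dir):                   # upload leftover staged files to S3
                s3.upload_file(os.path.join(part_dir, name), BUCKET,
                               f"raw/{TOPIC}/recovered/{name}") # placeholder key; real code re-derives the event-time path
            if latest >= 0:
                p.offset = latest + 1                           # start reading Kafka from the next offset
        consumer.assign(partitions)

    consumer = Consumer({"bootstrap.servers": "kafka:9092",
                         "group.id": "s3-copier",
                         "enable.auto.commit": False})
    consumer.subscribe([TOPIC], on_assign=on_assign)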
