ksingh7 commented on issue #251: URL: https://github.com/apache/camel-kafka-connector/issues/251#issuecomment-643627558
@oscerd First of all, thanks for building this connector. I'm happy that it works \o/

I'd like to share a use case: Kafka messages ===batching===> S3 (inspired by the RHT Open Hybrid Edge Computing project).

Suppose we want to ingest live sensor data from a fleet of thousands of IoT devices (e.g., connected smart cars, industrial sensors) into Kafka. We want to persist the data for long-term processing and big data analytics, so it should be moved to object storage (S3).

Say we have 10K devices and each device sends 1 message every 10 seconds. This keeps the calculation simple; in reality the rate is much higher. That is 86400 / 10 = 8640 messages per device per day, so in 24 hours the fleet generates:

10000 x 8640 = 86400000 messages/day

If each of these ~86.4 million messages is stored as its own object, we end up with ~86.4 million new objects in S3 every day, which is overwhelming.

Ideally, we could batch this so that, for example, the messages generated in each 10-minute window are stored in a single S3 object (file). A day has 86400 s / 600 s = 144 such windows, which drastically reduces the number of objects in the bucket: 144 objects/day instead of 86.4M. Even if each device got its own file per window, that would still only be 10000 x 144 = 1.44M objects/day.

Answering the point from your previous comment:

>> Also batch operation in what sense? Writing multiple lines on the same file? Or write multiple files in one shot?

By batch, we mean writing multiple lines (messages) to the same file.

BTW, this is a feature request, not a bug :) Happy to discuss this topic further.
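To make the batching idea and the numbers concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how camel-kafka-connector works or would implement this. It groups messages into 10-minute windows by timestamp and builds one newline-delimited object body per window, and it re-derives the daily counts. The `Message` type, `window_start`, `batch_by_window`, `objects_per_day` and the `sensor-data/<window>.jsonl` key layout are all hypothetical; consuming from Kafka and uploading to S3 are left out.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable

WINDOW_SECONDS = 600  # 10-minute batching window, as in the use case above


@dataclass
class Message:
    """Hypothetical sensor message; timestamp is assumed to be epoch seconds."""
    device_id: str
    timestamp: int
    payload: str


def window_start(ts: int, window: int = WINDOW_SECONDS) -> int:
    """Align a timestamp to the start of the window it falls into."""
    return ts - ts % window


def batch_by_window(messages: Iterable[Message], window: int = WINDOW_SECONDS) -> dict[str, str]:
    """Group messages into one newline-delimited object body per time window.

    Returns a mapping of (hypothetical) object key -> object body, so every
    message in the same window ends up as one line of the same file.
    """
    batches: dict[int, list[str]] = defaultdict(list)
    for msg in messages:
        batches[window_start(msg.timestamp, window)].append(msg.payload)
    # Hypothetical key layout; a real sink would choose its own naming scheme.
    return {
        f"sensor-data/{start}.jsonl": "\n".join(lines)
        for start, lines in sorted(batches.items())
    }


def objects_per_day(devices: int, interval_s: int, window_s: int) -> tuple[int, int, int]:
    """Return (messages/day, objects/day with one file per window,
    objects/day with one file per device per window)."""
    seconds_per_day = 86_400
    messages = devices * (seconds_per_day // interval_s)
    windows = seconds_per_day // window_s
    return messages, windows, devices * windows


if __name__ == "__main__":
    msgs = [
        Message("car-1", 0, '{"t":0}'),
        Message("car-2", 5, '{"t":5}'),
        Message("car-1", 610, '{"t":610}'),
    ]
    for key, body in batch_by_window(msgs).items():
        print(key, repr(body))
    print(objects_per_day(10_000, 10, 600))
```

With the figures from the comment, `objects_per_day(10_000, 10, 600)` gives `(86400000, 144, 1440000)`. Keying batches by aligned window start (rather than by arrival order) keeps late or out-of-order messages in the file for the window they belong to, at the cost of having to decide when a window is "closed" and safe to upload.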