ksingh7 commented on issue #251:
URL: 
https://github.com/apache/camel-kafka-connector/issues/251#issuecomment-643627558


   @oscerd First of all, thanks for building this connector; I am happy that it works \o/
   
   I'd like to share a use case: Kafka messages ===Batching===> S3 (inspired by the RHT Open Hybrid Edge Computing project).
   
   Suppose we want to ingest live sensor data from a fleet of thousands of IoT devices (e.g. connected smart cars, industrial sensors) into Kafka. We want to persist that data for long-term processing and big data analytics, so it should be moved to object storage (S3).
   
   Given this, suppose we have 10K devices and each device sends one message every 10 seconds (this keeps the calculation simple; in reality the rate is much higher). A day has 86,400 seconds, so each device sends 86,400 / 10 = 8,640 messages per day, and in 24 hours the fleet generates
   
   10,000 x 8,640 = 86,400,000 messages/day
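   The daily volume can be double-checked with a few lines of Python. The 10,000-device fleet and the 10-second interval are the example values above; the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def messages_per_day(devices: int, interval_seconds: int) -> int:
    """Total messages a fleet produces in 24 hours, assuming each device
    sends exactly one message every `interval_seconds`."""
    per_device = SECONDS_PER_DAY // interval_seconds  # 8,640 for a 10 s interval
    return devices * per_device


print(messages_per_day(10_000, 10))  # 86400000
```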
   
   If each of these 86.4 million messages is stored as its own object, we end up with 86.4 million new objects in S3 every day, which is overwhelming. Ideally, we could batch them so that the messages generated in a 10-minute window are stored in a single S3 object (file). A day contains 86,400 / 600 = 144 ten-minute windows, so this drastically reduces the number of objects in the bucket:
   
   1 object per message: 86,400,000 S3 objects/day
   1 object per device per 10 minutes: 10,000 x 144 = 1,440,000 S3 objects/day
   1 object per 10 minutes for the whole fleet: 144 S3 objects/day
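   The object counts can be reproduced as below. The two batching granularities (one batch per device vs. one fleet-wide batch) are assumptions about how batches would be keyed, since the request does not pin that down. The sketch also assumes every batch receives at least one message per window, which holds here because every device sends every 10 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MESSAGES_PER_DAY = 10_000 * (SECONDS_PER_DAY // 10)  # 86,400,000, from above


def objects_per_day(window_seconds: int, batch_keys: int) -> int:
    """S3 objects written per day when each of `batch_keys` independent
    batches is flushed as one object every `window_seconds`.

    batch_keys=1      -> one fleet-wide batch per window
    batch_keys=10_000 -> one batch per device per window
    """
    windows = SECONDS_PER_DAY // window_seconds  # 144 for 10-minute windows
    return windows * batch_keys


print(MESSAGES_PER_DAY)              # 86400000 (one object per message)
print(objects_per_day(600, 10_000))  # 1440000
print(objects_per_day(600, 1))       # 144
```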
   
   To answer the point from your previous comment:
   >> Also batch operation in what sense? Writing multiple lines on the same file? Or write multiple files in one shot?
   
   By batching, we mean writing multiple lines (messages) to the same file (S3 object).
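   To make "multiple lines in the same file" concrete, here is a minimal, library-agnostic sketch of time-window batching. It does not use the Camel or AWS APIs: the class name, the `<prefix>/<window_start>.jsonl` key layout and the newline-delimited JSON body are all hypothetical choices for illustration, and in a real connector each flushed body would be handed to an S3 upload instead of collected in a list.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


class TimeWindowBatcher:
    """Hypothetical sketch: buffers messages and emits one object body per
    time window, as newline-delimited JSON (one message per line).
    Assumes timestamps arrive in non-decreasing order."""

    def __init__(self, window_seconds: int = 600, prefix: str = "sensor-data"):
        self.window_seconds = window_seconds
        self.prefix = prefix
        self._buffers: Dict[int, List[str]] = defaultdict(list)

    def add(self, timestamp: int, message: dict) -> List[Tuple[str, str]]:
        """Buffer one message; return (key, body) for windows now complete."""
        window = timestamp // self.window_seconds
        # A message from a newer window means all older windows are closed.
        ready = sorted(w for w in self._buffers if w < window)
        flushed = [self._emit(w) for w in ready]
        self._buffers[window].append(json.dumps(message, sort_keys=True))
        return flushed

    def flush(self) -> List[Tuple[str, str]]:
        """Emit every remaining window, e.g. on shutdown."""
        return [self._emit(w) for w in sorted(list(self._buffers))]

    def _emit(self, window: int) -> Tuple[str, str]:
        lines = self._buffers.pop(window)
        start = window * self.window_seconds
        key = f"{self.prefix}/{start}.jsonl"  # hypothetical object-key layout
        return key, "\n".join(lines) + "\n"


# One device sending a message every 10 s for 20 minutes -> two objects.
batcher = TimeWindowBatcher(window_seconds=600)
objects = []
for t in range(0, 1200, 10):
    objects += batcher.add(t, {"device": "car-1", "ts": t})
objects += batcher.flush()

print(len(objects))               # 2
print(objects[0][0])              # sensor-data/0.jsonl
print(objects[0][1].count("\n"))  # 60
```

   Keying the buffers by `(device, window)` instead of the window alone would give the per-device variant; a production sink would probably also flush on a size limit, not only on time.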
   
   BTW, this is a feature request, not a bug :) Happy to discuss this topic further.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

