lnbest0707-uber opened a new pull request, #17031:
URL: https://github.com/apache/pinot/pull/17031

   `feature` `ingestion`
    
   Issue: https://github.com/apache/pinot/issues/16643
   
   Add the initial version of the Arrow decoder to Pinot. With the decoder, Pinot 
can decode stream data in the basic Apache Arrow format. This is part of 
the first-stage delivery of the proposal above.
   
   Performance and Improvements:
   
   - With around 200 messages per batch, the Kafka data volume could be reduced 
by 20-30%.
   - If the data can benefit from Arrow dictionary encoding, the data volume 
could be reduced by up to 70-80%.
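
   To illustrate why dictionary encoding can shrink repetitive columns, here is a minimal plain-Python sketch of the idea. It is not Arrow's implementation and not the decoder in this PR; the function names and sample values are hypothetical and chosen only for illustration.

```python
def dictionary_encode(values):
    """Split a column into (dictionary, indices).

    Each distinct value is stored once in `dictionary`; `indices` holds
    one small integer per row pointing into that dictionary.
    """
    dictionary = []
    positions = {}  # value -> index in `dictionary`
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and indices."""
    return [dictionary[i] for i in indices]


# Hypothetical low-cardinality column: only two distinct strings.
column = ["us-east", "us-west", "us-east", "us-east", "us-west"]
dictionary, indices = dictionary_encode(column)
restored = dictionary_decode(dictionary, indices)
```

   The saving grows with the number of rows per distinct value, which is why the benefit depends on the data: a column of mostly unique values gains little from this encoding.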
   
   Some limitations and TODOs:
   
   - Arrow is a columnar data format, so it represents rows of data efficiently 
only when they are batched together. Sending a single message in Arrow format 
would therefore be very inefficient. Batching, in turn, means slightly higher 
latency and larger sent messages (the average size per original message is 
smaller, but each sent batch might be bigger).
   - The first version handles common data structures, with dictionary support 
on flat key/values. It does not yet support nested dictionary encoding.
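
   The batching trade-off in the first point can be sketched with a rough stand-in. The example below uses JSON rather than the Arrow IPC format, so the byte counts say nothing about real Arrow sizes; it only shows the shape of the trade-off, where a columnar batch writes the field names once instead of once per row. The field names, the sample values, and the `rows_to_columns` helper are all hypothetical.

```python
import json


def rows_to_columns(rows):
    """Pivot a list of flat row dicts into one dict of column lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# 200 hypothetical flat rows, mirroring the batch size mentioned above.
rows = [{"id": i, "region": "us-east", "latency_ms": 12} for i in range(200)]

# Row-wise: every message repeats the field names.
row_sizes = [len(json.dumps(row).encode("utf-8")) for row in rows]
total_row_bytes = sum(row_sizes)

# Columnar batch: field names appear once for the whole batch.
batch = rows_to_columns(rows)
batch_bytes = len(json.dumps(batch).encode("utf-8"))

# Average cost per original row when sent as one batch.
avg_bytes_per_row_in_batch = batch_bytes / len(rows)
```

   In this sketch the single batched payload is larger than any one row-wise message, while the average bytes per row are lower, which matches the latency and message-size trade-off described above.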


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

