lnbest0707-uber opened a new pull request, #17031:
URL: https://github.com/apache/pinot/pull/17031
`feature` `ingestion`
Issue: https://github.com/apache/pinot/issues/16643
Add an initial version of an Arrow decoder to Pinot. With this decoder, Pinot
can decode stream data in the basic Apache Arrow format. This is part of the
first-stage delivery of the proposal in the issue above.
Performance and Improvements:
- With around 200 messages per batch, the Kafka data volume could be reduced
by 20-30%.
- If the data benefits from Arrow dictionary encoding, the data volume could
be reduced by up to 70-80%.
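
To illustrate why dictionary encoding can shrink repetitive columns, here is a minimal conceptual sketch in plain Python. It does not use the Arrow library or the decoder in this PR. The function names, the sample region values, and the 4-byte index width are assumptions made for illustration, and the size estimate ignores Arrow's real buffer layout (offsets, validity bitmaps, padding).

```python
def dictionary_encode(values):
    """Split a column into (dictionary of unique values, per-row indices)."""
    dictionary = []
    positions = {}  # value -> index in dictionary
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and indices."""
    return [dictionary[i] for i in indices]


def plain_size(values):
    """Rough payload size in bytes when every string is stored in full."""
    return sum(len(v.encode("utf-8")) for v in values)


def encoded_size(dictionary, indices, index_width=4):
    """Rough size in bytes: each unique string once, plus one fixed-width
    index per row. index_width=4 is an assumption for this sketch."""
    return plain_size(dictionary) + index_width * len(indices)


# Hypothetical low-cardinality column: 200 rows, 3 distinct values.
regions = ["us-east-1", "us-west-2", "eu-central-1"]
column = [regions[i % 3] for i in range(200)]

dictionary, indices = dictionary_encode(column)
print("plain bytes:  ", plain_size(column))
print("encoded bytes:", encoded_size(dictionary, indices))
```

The saving depends entirely on how often values repeat. A column of mostly unique strings would gain nothing, and could even grow, under this scheme.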
Some limitations and TODOs:
- Arrow is a columnar data format, so it represents rows efficiently only when
they are batched together. Sending a single message in Arrow format would
therefore be very inefficient. Batching also means slightly higher latency and
larger sent messages (the average size per message is smaller, but each sent
message may be bigger).
- The first version handles common data structures, with dictionary support
on flat key/values. It does not yet support nested dictionary encoding.
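
The batching trade-off described in the first limitation can be sketched with a stand-in encoding. The example below uses JSON from the Python standard library, not Arrow IPC, purely to show the shape of the effect: a columnar batch writes field names once, so the average size per row drops while the single payload sent over the wire grows. The row schema and helper names are hypothetical. The batch size of 200 mirrors the figure quoted above.

```python
import json


def encode_per_row(rows):
    """One payload per row: field names are repeated in every message."""
    return [json.dumps(row).encode("utf-8") for row in rows]


def encode_columnar_batch(rows):
    """One payload per batch: field names appear once, values are stored
    column by column. Assumes all rows share the keys of the first row."""
    field_names = list(rows[0].keys())
    columns = {name: [row[name] for row in rows] for name in field_names}
    return json.dumps(columns).encode("utf-8")


# Hypothetical rows, for illustration only.
rows = [{"user_id": i, "country": "US", "clicks": i % 5} for i in range(200)]

per_row_payloads = encode_per_row(rows)
batch_payload = encode_columnar_batch(rows)

total_per_row = sum(len(p) for p in per_row_payloads)
largest_single = max(len(p) for p in per_row_payloads)
average_batched = len(batch_payload) / len(rows)

print("total bytes, one message per row:", total_per_row)
print("largest single row message:      ", largest_single)
print("bytes in one batched message:    ", len(batch_payload))
print("average bytes per row in batch:  ", average_batched)
```

A producer that batches this way has to wait until enough rows accumulate, which is where the extra latency comes from.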
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]