navina opened a new pull request, #9224:
URL: https://github.com/apache/pinot/pull/9224

   This is an extension of PR #9096 
   
   Most stream systems provide a message envelope, which encapsulates the 
record payload along with record headers, keys, and other system-specific 
metadata. For example:
   1. Kafka allows keyed records and additionally provides headers.
   2. Kinesis requires keyed records and includes some additional metadata, 
such as sequenceId.
   3. Pulsar also supports keyed records and allows including arbitrary 
properties.
   4. Pubsub supports keyed messages, along with user-defined attributes and 
message metadata.
   
   Today, Pinot drops everything in the message envelope except the record 
value itself. Hence, there needs to be a way to extract these values and present 
them in the Pinot table as regular columns (provided they are defined in the 
Pinot schema).
   
   This PR attempts to extract the key, headers, and other metadata from any 
supported streaming connector. This can be very useful for Pinot users, as 
they don't have to "pre-process" the stream to make the record metadata 
available in the data payload. It also removes the need for custom solutions 
(such as [this](https://github.com/startreedata/startree-pinot/pull/484/files)).
   
   For Reviewers, please note:
   1. In the current patch, the record key (when available) is extracted as 
the `__key` column, whereas headers are extracted as `header$<HEADER_KEY_NAME>`. 
Does this sound like a good convention to follow for all stream connectors? 
That is, header columns will always be prefixed with `header$`, and any other 
metadata such as key or offset will be prefixed with `__`.
   2. I am in the process of adding some unit tests. I have tested with a Pinot 
realtime quickstart. I still need to do some more cleanup.
   3. In `MessageBatch`, I have marked one of the methods as `@Deprecated`, as I 
am hoping to eventually eliminate the need for a typed interface there. The 
current changes are backwards compatible. Let me know if there is a better way.
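
   To make the naming convention from point 1 concrete, here is a minimal 
sketch of how a record key and headers could be flattened into column names. 
The class `StreamMetadataColumns` and the method `toColumns` are hypothetical 
names used only for illustration; they are not the classes touched by this 
patch, and representing header values as strings is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Pinot: it only illustrates the proposed
// column naming (`__key` for the record key, `header$<name>` for headers).
class StreamMetadataColumns {
  static final String KEY_COLUMN = "__key";
  static final String HEADER_PREFIX = "header$";

  /** Flattens a record key and its headers into column-name -> value pairs. */
  static Map<String, Object> toColumns(String recordKey, Map<String, String> headers) {
    Map<String, Object> columns = new LinkedHashMap<>();
    if (recordKey != null) {
      // The key column is emitted only when the stream provides a key.
      columns.put(KEY_COLUMN, recordKey);
    }
    for (Map.Entry<String, String> header : headers.entrySet()) {
      // Each header becomes its own column, prefixed with `header$`.
      columns.put(HEADER_PREFIX + header.getKey(), header.getValue());
    }
    return columns;
  }
}
```

   With this sketch, a record with key `user-42` and a single header `traceId` 
would yield the columns `__key` and `header$traceId`, both of which would 
still need to be declared in the Pinot schema to show up in the table.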
   


