navina opened a new pull request, #9224: URL: https://github.com/apache/pinot/pull/9224
This is an extension of PR #9096.

Most stream systems provide a message envelope, which encapsulates the record payload along with record headers, keys, and other system-specific metadata. For example:

1. Kafka allows keyed records and additionally provides headers.
2. Kinesis requires keyed records and includes some additional metadata such as sequenceId.
3. Pulsar also supports keyed records and allows including arbitrary properties.
4. Pubsub supports keyed messages, along with user-defined attributes and message metadata.

Today, Pinot drops everything in the message except the record value itself. Hence, there needs to be a way to extract these values and present them in the Pinot table as regular columns (of course, they have to be defined in the Pinot schema).

This PR attempts to extract the key, headers, and other metadata from any supported streaming connector. This can be very useful for Pinot users, as they don't have to "pre-process" the stream to make the record metadata available in the data payload. It also prevents custom solutions (such as [this](https://github.com/startreedata/startree-pinot/pull/484/files)).

For reviewers, please note:

1. In the current patch, the record key (when available) is extracted as the `__key` column, whereas headers are extracted as `header$<HEADER_KEY_NAME>`. Does this sound like a good convention to follow for all stream connectors? That is, header columns will always be prefixed with `header$`, and any other metadata such as key or offset will be prefixed with `__`.
2. I am in the process of adding some unit tests. I have tested with a Pinot realtime quickstart. I still need to do some more cleanup.
3. In `MessageBatch`, I have marked one of the methods as `@Deprecated`, as I am hoping to eventually eliminate the need for the typed interface there. The current changes are backwards compatible. Let me know if there is a better way.

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org