rseetham opened a new issue, #12521: URL: https://github.com/apache/pinot/issues/12521
[StreamMessageDecoder's](https://github.com/apache/pinot/blob/ac13a191b945a80084f0a2794391e4be2f463252/pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamMessageDecoder.java#L49) `init` is currently declared as `void init(Map<String, String> props, Set<String> fieldsToRead, String topicName)`. It would be great if the decoder had access to the Pinot schema as well.

At Uber, we have our own internal decoder for Avro messages. It uses the `AvroRecordExtractor` at the end, but we need access to the Pinot schema to do some custom processing.

Originally, this class had access to the Pinot schema, but that was [removed in 2020](https://github.com/apache/pinot/pull/5309). The reason given was:

> RecordReader and StreamMessageDecoder is the entry point for batch and streaming data ingestion. They are expected to be implemented and plugged to provide customized format support. To make the abstraction more crispy and easier to understand, remove the Schema and replace it with fields to read so that users do not need to worry about extracting fields from the Pinot schema when adding a new format.

`fieldsToRead` is generated [here](https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java#L1477) using `Set<String> fieldsToRead = IngestionUtils.getFieldsForRecordExtractor(_tableConfig.getIngestionConfig(), _schema);`

In the [implementation](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/IngestionUtils.java#L310), if `SchemaConformingTransformerConfig` is present, an empty `fieldsToRead` is returned. When `fieldsToRead` is empty, other parts of the decoder code assume that all fields in the input schema have to be extracted anyway ([example](https://github.com/apache/pinot/blob/master/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java#L52)).
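To illustrate the "empty `fieldsToRead` means extract everything" convention described above, here is a minimal sketch. `FieldFilteringExtractor` is a hypothetical class written for this illustration, not a Pinot class, and it works on plain maps rather than Avro records; it only mirrors the general pattern.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical extractor for illustration only; not part of Pinot.
final class FieldFilteringExtractor {
  private final Set<String> fields;
  private final boolean extractAll;

  FieldFilteringExtractor(Set<String> fieldsToRead) {
    // A null or empty set is treated as "no filter": extract every field.
    extractAll = fieldsToRead == null || fieldsToRead.isEmpty();
    fields = extractAll ? Collections.<String>emptySet() : new HashSet<>(fieldsToRead);
  }

  Map<String, Object> extract(Map<String, Object> input) {
    if (extractAll) {
      // Copy all fields present in the input record.
      return new HashMap<>(input);
    }
    Map<String, Object> out = new HashMap<>();
    for (String field : fields) {
      // Fields missing from the input are stored as null.
      out.put(field, input.get(field));
    }
    return out;
  }
}
```

With this convention, a decoder that receives an empty set cannot tell which columns the table actually needs, which is one reason direct access to the schema would help.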
The request here is to add the schema to the initializer of `StreamMessageDecoder`. `fieldsToRead` would still be there and used for its existing purposes, but the schema would be nice to have in the decoder (in our case, we want to know which column is the time column). More generally, if a decoder wants to do something specific based on the Pinot schema, it would be useful to have access to it.
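One possible shape for this request, sketched under assumptions: a new `init` overload that also receives the schema and, by default, delegates to the existing three-argument `init`, so current decoders keep working unchanged. All names below (`PinotSchemaStub`, `SchemaAwareDecoder`, `LegacyDecoder`, `TimeAwareDecoder`) are hypothetical stand-ins, not Pinot's actual API, and the stub schema exposes only a time column name.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Pinot schema; only carries a time column name.
final class PinotSchemaStub {
  final String timeColumnName;

  PinotSchemaStub(String timeColumnName) {
    this.timeColumnName = timeColumnName;
  }
}

// Hypothetical decoder interface mirroring the existing init signature.
interface SchemaAwareDecoder {
  void init(Map<String, String> props, Set<String> fieldsToRead, String topicName);

  // Proposed overload (assumption): ignores the schema unless overridden.
  default void init(Map<String, String> props, Set<String> fieldsToRead, String topicName,
      PinotSchemaStub schema) {
    init(props, fieldsToRead, topicName);
  }
}

// An existing decoder that knows nothing about the schema keeps compiling.
final class LegacyDecoder implements SchemaAwareDecoder {
  boolean initialized;

  @Override
  public void init(Map<String, String> props, Set<String> fieldsToRead, String topicName) {
    initialized = true;
  }
}

// A custom decoder that opts in and records the time column from the schema.
final class TimeAwareDecoder implements SchemaAwareDecoder {
  boolean initialized;
  String timeColumn;

  @Override
  public void init(Map<String, String> props, Set<String> fieldsToRead, String topicName) {
    initialized = true;
  }

  @Override
  public void init(Map<String, String> props, Set<String> fieldsToRead, String topicName,
      PinotSchemaStub schema) {
    init(props, fieldsToRead, topicName);
    timeColumn = schema.timeColumnName;
  }
}
```

The default method is a design choice for backward compatibility: the caller could always invoke the four-argument overload, and only decoders that care about the schema would need to override it.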