navina commented on code in PR #9511: URL: https://github.com/apache/pinot/pull/9511#discussion_r985365055
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -543,23 +543,32 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
       // Decode message
       StreamDataDecoderResult decodedRow = _streamDataDecoder.decode(messagesAndOffsets.getStreamMessage(index));
       RowMetadata msgMetadata = messagesAndOffsets.getStreamMessage(index).getMetadata();
+      GenericRow decoderResult;
       if (decodedRow.getException() != null) {

Review Comment:
qq: Currently, the stream data decoder decodes the key, value, headers, and metadata (see `StreamDataDecoderImpl`). Should Pinot treat a failure to decode each of these parts the same way? Previously, a decode failure in Pinot pertained only to value decoding errors. If the user is not interested in the header/metadata/key fields, I wonder whether there is a case where the partially decoded result is still needed and should not be considered a failure. What are your thoughts on that?
I think it might be cleaner to wrap this `continueOnError` logic within `StreamDataDecoderImpl`, or to extend that class and handle it there.

##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ingestion/IngestionConfig.java:
##########
@@ -51,6 +51,10 @@ public class IngestionConfig extends BaseJsonConfig {
   @JsonPropertyDescription("Configs related to skip any row which has error and continue during ingestion")
   private boolean _continueOnError;
+  @JsonPropertyDescription("If set to true, the records with GenericRow.INCOMPLETE_RECORD_KEY will not be consumed."
+      + "This can be helpful if user only wants to see correct data in the table")

Review Comment:
Correct me if I am mistaken: we are not really storing the partially decoded record. When a decode failure happens, we only store an empty `GenericRow` with the field `INCOMPLETE_RECORD_KEY`. So calling this `skipPartialRecords` is confusing. Something like `DECODE_FAILED_KEY` seems more appropriate.
Can you clarify the description: "If set to true, the records with GenericRow.INCOMPLETE_RECORD_KEY will not be **consumed**." Does "consumed" here mean consumed for query?
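
To make the first comment's question concrete, here is a rough sketch of a decoder wrapper that treats only a value decode failure as a failure and tolerates a failed key decode. Every name in it (`SketchDecodeResult`, `TolerantDecoderSketch`, the `"key"` field, the `keyMissing` flag) is hypothetical and chosen for illustration; none of it is Pinot's actual `StreamDataDecoderImpl` or `GenericRow` API, and headers and metadata are left out for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical result type; it only mirrors the idea of a decoder result, not Pinot's class. */
final class SketchDecodeResult {
  final Map<String, Object> row;  // null when the value itself could not be decoded
  final Exception exception;      // non-null only for a value decode failure
  final boolean keyMissing;       // true when only the optional key failed to decode

  private SketchDecodeResult(Map<String, Object> row, Exception exception, boolean keyMissing) {
    this.row = row;
    this.exception = exception;
    this.keyMissing = keyMissing;
  }

  static SketchDecodeResult ok(Map<String, Object> row, boolean keyMissing) {
    return new SketchDecodeResult(row, null, keyMissing);
  }

  static SketchDecodeResult failed(Exception e) {
    return new SketchDecodeResult(null, e, false);
  }
}

/** Hypothetical decoder wrapper: a value failure is fatal, a key failure is tolerated. */
final class TolerantDecoderSketch {
  private final Function<byte[], Map<String, Object>> valueDecoder;
  private final Function<byte[], Object> keyDecoder;

  TolerantDecoderSketch(Function<byte[], Map<String, Object>> valueDecoder,
      Function<byte[], Object> keyDecoder) {
    this.valueDecoder = valueDecoder;
    this.keyDecoder = keyDecoder;
  }

  SketchDecodeResult decode(byte[] key, byte[] value) {
    Map<String, Object> row;
    try {
      // The value is the part users always care about, so its failure fails the record.
      row = new HashMap<>(valueDecoder.apply(value));
    } catch (RuntimeException e) {
      return SketchDecodeResult.failed(e);
    }
    boolean keyMissing = false;
    try {
      // The key is optional in this sketch: record the problem but keep the row.
      row.put("key", keyDecoder.apply(key));
    } catch (RuntimeException e) {
      keyMissing = true;
    }
    return SketchDecodeResult.ok(row, keyMissing);
  }

  public static void main(String[] args) {
    TolerantDecoderSketch decoder = new TolerantDecoderSketch(
        value -> {
          Map<String, Object> fields = new HashMap<>();
          fields.put("length", value.length);  // throws NullPointerException on a null payload
          return fields;
        },
        key -> new String(key, StandardCharsets.UTF_8));  // throws on a null key

    SketchDecodeResult partial = decoder.decode(null, new byte[] {1, 2, 3});
    System.out.println("row=" + partial.row + ", keyMissing=" + partial.keyMissing);
  }
}
```

With this shape, the caller in the consume loop would only need to check `exception` to decide whether `continueOnError` applies, and the decision about which parts are allowed to fail stays inside the decoder. Whether a key, header, or metadata failure should really be tolerated is exactly the open question above.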