[GitHub] [pinot] navina commented on a diff in pull request #9511: Handle exception in realtime decoder gracefully

GitBox Wed, 05 Oct 2022 13:07:30 -0700


navina commented on code in PR #9511:
URL: https://github.com/apache/pinot/pull/9511#discussion_r985365055



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -543,23 +543,32 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
       // Decode message
       StreamDataDecoderResult decodedRow = _streamDataDecoder.decode(messagesAndOffsets.getStreamMessage(index));
       RowMetadata msgMetadata = messagesAndOffsets.getStreamMessage(index).getMetadata();
+      GenericRow decoderResult;
       if (decodedRow.getException() != null) {

Review Comment:
   qq: Currently, the stream data decoder decodes the key, value, headers, and metadata (see `StreamDataDecoderImpl`). Should Pinot treat a failure to decode any one of these parts the same way?
   Previously, a decode failure in Pinot referred only to value decoding errors. If the user is not interested in the header/metadata/key fields, I wonder whether there are cases where the partially decoded result is still needed and should not be considered a failure. What are your thoughts on that?
   
   I think it might be cleaner to wrap this `continueOnError` logic within `StreamDataDecoderImpl`. Or extend that class and handle it there?
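   To make the suggestion above concrete, here is a minimal sketch of a decoder that records which part failed and only reports a failure when a part the caller requires could not be decoded. Every name in it (`TolerantDecoderSketch`, `Part`, `PartDecoder`, `Result`, `requiredParts`) is hypothetical and made up for illustration; it does not reflect the actual `StreamDataDecoderImpl` API.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these types are Pinot classes. */
final class TolerantDecoderSketch {

  /** The parts of a stream message that the decoder extracts. */
  enum Part { KEY, VALUE, HEADERS, METADATA }

  /** Hypothetical per-part decoder; throws when a part cannot be decoded. */
  interface PartDecoder {
    Object decode(Part part, Map<Part, byte[]> message) throws Exception;
  }

  /** Decoded fields, the parts that failed, and the overall verdict. */
  static final class Result {
    final Map<Part, Object> fields = new EnumMap<>(Part.class);
    final Set<Part> failedParts = EnumSet.noneOf(Part.class);
    boolean failed;
  }

  /**
   * Decodes every part, remembering failures instead of aborting.
   * The result counts as failed only if a required part failed.
   */
  static Result decode(PartDecoder decoder, Map<Part, byte[]> message, Set<Part> requiredParts) {
    Result result = new Result();
    for (Part part : Part.values()) {
      try {
        result.fields.put(part, decoder.decode(part, message));
      } catch (Exception e) {
        result.failedParts.add(part);
      }
    }
    result.failed = !Collections.disjoint(result.failedParts, requiredParts);
    return result;
  }
}
```

   With `requiredParts` set to only `VALUE`, a header decode error would leave `failed` as false, which matches the earlier value-only behavior; adding `HEADERS` to the set would turn the same error into a failure.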
   
   
   
    



##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ingestion/IngestionConfig.java:
##########
@@ -51,6 +51,10 @@ public class IngestionConfig extends BaseJsonConfig {
   @JsonPropertyDescription("Configs related to skip any row which has error and continue during ingestion")
   private boolean _continueOnError;
 
+  @JsonPropertyDescription("If set to true, the records with GenericRow.INCOMPLETE_RECORD_KEY will not be consumed."
+      + "This can be helpful if user only wants to see correct data in the table")

Review Comment:
   Correct me if I am mistaken: we are not actually storing the partially decoded record. When a decode failure happens, we only store an empty `GenericRow` with the field `INCOMPLETE_RECORD_KEY`. So calling this `skipPartialRecords` is confusing. Something like `DECODE_FAILED_KEY` seems more appropriate.
   
   Can you also clarify the description: "If set to true, the records with GenericRow.INCOMPLETE_RECORD_KEY will not be **consumed**." Does "consumed" here mean consumed for query?
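   For illustration, one possible reading of "not consumed" is that marker rows are dropped before indexing and therefore never become visible to queries. The sketch below assumes that reading. The class, the `skipDecodeFailures` flag, and the marker value are hypothetical, and a plain `Map` stands in for `GenericRow`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; a plain Map stands in for Pinot's GenericRow. */
final class DecodeFailureFilterSketch {

  /** Hypothetical marker field, named as suggested in the review. */
  static final String DECODE_FAILED_KEY = "$DECODE_FAILED$";

  /** Builds the empty placeholder row stored when decoding fails. */
  static Map<String, Object> markerRow() {
    Map<String, Object> row = new HashMap<>();
    row.put(DECODE_FAILED_KEY, Boolean.TRUE);
    return row;
  }

  /**
   * Returns the rows that would be indexed. Marker rows are dropped
   * only when skipDecodeFailures is true; otherwise they pass through.
   */
  static List<Map<String, Object>> rowsToIndex(
      List<Map<String, Object>> decodedRows, boolean skipDecodeFailures) {
    List<Map<String, Object>> indexed = new ArrayList<>();
    for (Map<String, Object> row : decodedRows) {
      boolean decodeFailed = Boolean.TRUE.equals(row.get(DECODE_FAILED_KEY));
      if (decodeFailed && skipDecodeFailures) {
        continue; // never indexed, so never visible to queries
      }
      indexed.add(row);
    }
    return indexed;
  }
}
```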
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


