npawar commented on a change in pull request #8309:
URL: https://github.com/apache/pinot/pull/8309#discussion_r821287802



##########
File path: pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/KafkaPartitionLevelConnectionHandler.java
##########
@@ -56,6 +57,7 @@ public KafkaPartitionLevelConnectionHandler(String clientId, StreamConfig stream
     consumerProp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, _config.getBootstrapHosts());
     consumerProp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
     consumerProp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, BytesDeserializer.class.getName());
+    consumerProp.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, OffsetResetStrategy.EARLIEST.toString().toLowerCase());

Review comment:
   Actually, we do know what caused the offset-out-of-range (OOR) condition: 
the server was down for a while, and when it came back up, the offsets had 
expired. This would be the most common case (along with any pause we add in 
the future). A user in this situation would always want consumption to resume 
with minimal data loss.
   I would treat this as a bug rather than a configurable behavior. By default 
we reset to latest, which causes a lot more data loss even though rows are 
still present. With this change, we move forward to the next point from which 
we can consume. The behavior is similar to ValidationManager.
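
   To make the data-loss argument concrete, here is a minimal sketch of the 
two reset policies. It is not Pinot or Kafka code: the class 
`OffsetResetSketch`, the helpers `resumeOffset` and `rowsSkipped`, and all 
offset values are hypothetical and chosen only for illustration. It assumes 
the requested offset expired while the server was down.

```java
final class OffsetResetSketch {

    /** Returns the offset consumption resumes from, given a reset policy. */
    static long resumeOffset(long requested, long earliest, long latest, String resetPolicy) {
        // The requested offset is still retained: no reset is needed.
        if (requested >= earliest && requested <= latest) {
            return requested;
        }
        // Out of range: the reset policy decides where to jump.
        return "earliest".equals(resetPolicy) ? earliest : latest;
    }

    /** Rows skipped between the offset we wanted and the offset we resumed from. */
    static long rowsSkipped(long requested, long resumed) {
        return Math.max(0, resumed - requested);
    }

    public static void main(String[] args) {
        long requested = 100;  // hypothetical: next offset to consume before the outage
        long earliest = 500;   // hypothetical: offsets 100..499 expired during the outage
        long latest = 1000;    // hypothetical: log end offset when the server comes back

        for (String policy : new String[] {"latest", "earliest"}) {
            long resumed = resumeOffset(requested, earliest, latest, policy);
            System.out.println(policy + ": resume at " + resumed
                + ", rows skipped = " + rowsSkipped(requested, resumed));
        }
    }
}
```

   With these made-up numbers, "latest" skips 900 rows while "earliest" skips 
only the 400 rows that are already gone, which is the minimal loss possible.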
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


