npawar commented on a change in pull request #8309: URL: https://github.com/apache/pinot/pull/8309#discussion_r821287802
##########
File path: pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/KafkaPartitionLevelConnectionHandler.java
##########
@@ -56,6 +57,7 @@ public KafkaPartitionLevelConnectionHandler(String clientId, StreamConfig stream
     consumerProp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, _config.getBootstrapHosts());
     consumerProp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
     consumerProp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, BytesDeserializer.class.getName());
+    consumerProp.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, OffsetResetStrategy.EARLIEST.toString().toLowerCase());

Review comment:
Actually, we do know what caused the OOR (offset out of range): the server was down for a while, and by the time it came back up, the offsets it needed had expired. This would be the most common case (along with any pause we add in the future). Such a user would always want consumption to resume with minimal data loss.

I would treat this as a bug rather than a configurable behavior. By default we reset to latest, which causes a lot more data loss even though rows are still present. With this change, we skip forward only to the next point from which we can consume. The behavior is similar to ValidationManager.
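
To make the data-loss argument in the review comment concrete, here is a minimal sketch of how the two reset policies differ once the requested offset has expired. It is not Pinot or Kafka code: the class `OffsetResetSketch`, its methods, and the offsets in `main` are hypothetical and chosen only for illustration. The only names taken from Kafka are the consumer config keys `bootstrap.servers` and `auto.offset.reset` and the values `earliest` and `latest`.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical illustration only; not part of Pinot or the Kafka client.
public class OffsetResetSketch {

    /** Mirrors the two auto.offset.reset values discussed in the review. */
    public enum ResetPolicy { EARLIEST, LATEST }

    /** Builds consumer properties with the reset policy set as in the diff. */
    public static Properties consumerProps(String bootstrapHosts, ResetPolicy policy) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapHosts);
        // Kafka expects the lowercase strings "earliest" or "latest".
        props.put("auto.offset.reset", policy.name().toLowerCase(Locale.ROOT));
        return props;
    }

    /**
     * Simplified model of where consumption resumes.
     * earliestRetained is the first offset still on the broker,
     * logEnd is the offset the next produced row would receive.
     */
    public static long resumeOffset(long requested, long earliestRetained,
                                    long logEnd, ResetPolicy policy) {
        boolean inRange = requested >= earliestRetained && requested <= logEnd;
        if (inRange) {
            return requested; // no reset needed
        }
        return policy == ResetPolicy.EARLIEST ? earliestRetained : logEnd;
    }

    /** Rows between the requested offset and the resume point are never consumed. */
    public static long rowsSkipped(long requested, long resume) {
        return Math.max(0, resume - requested);
    }

    public static void main(String[] args) {
        // Assumed scenario: server was down, offsets 100..499 expired meanwhile.
        long requested = 100, earliestRetained = 500, logEnd = 2000;
        for (ResetPolicy policy : ResetPolicy.values()) {
            long resume = resumeOffset(requested, earliestRetained, logEnd, policy);
            System.out.println(policy + ": resume at " + resume
                + ", rows skipped " + rowsSkipped(requested, resume));
        }
    }
}
```

In this assumed scenario, `EARLIEST` resumes at offset 500 and skips the 400 expired rows, while `LATEST` resumes at 2000 and skips 1900 rows, 1500 of which were still available on the broker.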