wypoon commented on PR #12260: URL: https://github.com/apache/iceberg/pull/12260#issuecomment-2888536801
Hi @huaxingao, thank you for reviewing this! You are correct that the tests would pass with the original implementation, except for `testReadStreamWithCompositeReadLimit`, which would fail due to the bug in `SparkMicroBatchStream::getDefaultReadLimit()`. As I understand it, in `SparkMicroBatchStream`, we implement `getDefaultReadLimit()` (using what is set by configuration options), and Spark calls `latestOffset(Offset, ReadLimit)` with that `ReadLimit`. In principle, Spark can call `latestOffset(Offset, ReadLimit)` with any `ReadLimit` and `SparkMicroBatchStream` should respond according to that `ReadLimit`, but if Spark calls `latestOffset(Offset, ReadLimit)` with the `ReadLimit` given by `getDefaultReadLimit()`, then there is no difference in behavior between the original implementation (where we ignore the `ReadLimit` passed in and just use the `ReadLimit` corresponding to the configuration options) and the one in this PR. However, technically, we should not assume that and instead use the `ReadLimit` passed in. Do you know if and how Spark would pass in a different `ReadLimit` than what is returned by `getDefaultReadLimit()`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org