zhuqi-lucas commented on PR #21828: URL: https://github.com/apache/datafusion/pull/21828#issuecomment-4342086696
Thanks @alamb for the thoughtful feedback! I agree with your suggestion to simplify. **Plan: start with Case 1 only (no filter, single file)** This PR already works for Case 1 — when there's no predicate, `with_offset` accepts, `GlobalLimitExec` is eliminated, and parquet handles all offset skipping via RG prune + RowSelection. The semantics are unchanged from current behavior (physical row order, same as today's `GlobalLimitExec` skip). I'll simplify by: 1. Remove `offset_fully_handled` — `with_offset` returns `Some` only when no filter (already the case after latest changes) 2. Remove the Case 2 (filter + fully matched) complexity — tracked as follow-up 3. Move RowSelection logic into a method on `PreparedAccessPlan` 4. Update docs/naming per your suggestions 5. Mark as API change Re: the semantics discussion — this PR doesn't change any OFFSET semantics. It produces the exact same rows as the current `GlobalLimitExec` skip, just faster. The semantic questions (ORDER BY requirement, multi-file ordering) are orthogonal and can be discussed separately. Will push the simplified version shortly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
