zhuqi-lucas commented on PR #21828:
URL: https://github.com/apache/datafusion/pull/21828#issuecomment-4342086696

   Thanks @alamb for the thoughtful feedback! I agree with your suggestion to 
simplify.
   
   **Plan: start with Case 1 only (no filter, single file)**
   
   This PR already works for Case 1 — when there's no predicate, `with_offset` 
accepts, `GlobalLimitExec` is eliminated, and parquet handles all offset 
skipping via RG prune + RowSelection. The semantics are unchanged from current 
behavior (physical row order, same as today's `GlobalLimitExec` skip).
   
   I'll simplify by:
   1. Remove `offset_fully_handled` — `with_offset` returns `Some` only when no 
filter (already the case after latest changes)
   2. Remove the Case 2 (filter + fully matched) complexity — tracked as 
follow-up
   3. Move RowSelection logic into a method on `PreparedAccessPlan` 
   4. Update docs/naming per your suggestions
   5. Mark as API change
   
   Re: the semantics discussion — this PR doesn't change any OFFSET semantics. 
It produces the exact same rows as the current `GlobalLimitExec` skip, just 
faster. The semantic questions (ORDER BY requirement, multi-file ordering) are 
orthogonal and can be discussed separately.
   
   Will push the simplified version shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to