[I] Add server-side record batch filter execution [fluss]

via GitHub Sat, 28 Mar 2026 07:22:43 -0700


platinumhamburg opened a new issue, #2950:
URL: https://github.com/apache/fluss/issues/2950


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and 
found nothing similar.
   
   
   ### Description
   
   Introduce server-side record batch filtering using batch-level statistics 
(min/max values, null counts) that are already available in the V1 log batch 
format. When a client sends a fetch request with a filter predicate, the server 
evaluates the predicate against each batch's statistics and skips batches that 
cannot contain matching records.
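   As a rough illustration of how min/max and null-count statistics can prove that a batch holds no matching rows, here is a minimal Java sketch. `ColumnStats` (including its `rowCount` field), the `Pred` predicate records, and `BatchPruner.mightMatch` are hypothetical names invented for this example. They are not Fluss classes, and column values are simplified to `long`. The method is conservative: it returns `false` only when the statistics rule out every row in the batch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical per-column statistics for one batch, loosely modeled on the
// min/max/null-count values the issue attributes to V1 log batches.
// Values are simplified to long for illustration.
record ColumnStats(long min, long max, long nullCount, long rowCount) {}

// Hypothetical predicate tree (not Fluss's predicate API).
interface Pred {}
record Eq(String column, long value) implements Pred {}
record Gt(String column, long value) implements Pred {}
record Lt(String column, long value) implements Pred {}
record IsNull(String column) implements Pred {}
record And(List<Pred> children) implements Pred {}
record Or(List<Pred> children) implements Pred {}

final class BatchPruner {
    private BatchPruner() {}

    /**
     * Returns false only when the statistics prove that no row in the batch
     * can satisfy the predicate; otherwise returns true ("might match").
     */
    static boolean mightMatch(Pred p, Map<String, ColumnStats> stats) {
        if (p instanceof And a) {
            // Keep the batch only if every conjunct might match.
            return a.children().stream().allMatch(c -> mightMatch(c, stats));
        }
        if (p instanceof Or o) {
            // Keep the batch if any disjunct might match.
            return o.children().stream().anyMatch(c -> mightMatch(c, stats));
        }
        if (p instanceof IsNull n) {
            ColumnStats s = stats.get(n.column());
            return s == null || s.nullCount() > 0;
        }
        String column = columnOf(p);
        if (column == null) {
            return true; // unknown predicate kind: cannot prune
        }
        ColumnStats s = stats.get(column);
        if (s == null) {
            return true; // no statistics for this column: cannot prune
        }
        if (s.nullCount() == s.rowCount()) {
            return false; // no non-null values, so a comparison is never true
        }
        if (p instanceof Eq e) {
            return e.value() >= s.min() && e.value() <= s.max();
        }
        if (p instanceof Gt g) {
            return s.max() > g.value();
        }
        if (p instanceof Lt l) {
            return s.min() < l.value();
        }
        return true;
    }

    private static String columnOf(Pred p) {
        if (p instanceof Eq e) return e.column();
        if (p instanceof Gt g) return g.column();
        if (p instanceof Lt l) return l.column();
        return null;
    }
}
```

   Each conjunct is checked independently, so a contradictory filter such as `id > 150 AND id < 120` is not pruned by statistics alone. That is acceptable for a batch-level filter, since it only needs to avoid discarding batches that might match.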
   
   Key points:
   
   - **Batch-level filtering, not row-level**: the server uses batch statistics 
to skip entire batches. The client still performs row-level filtering on the 
returned batches.
   - **ARROW format only**: only the ARROW log format includes batch-level 
statistics (V1+ magic). COMPACTED/INDEXED formats fall back to unfiltered reads.
   - **Schema evolution safe**: a `PredicateSchemaResolver` adapts the 
predicate when the batch schema differs from the predicate schema. If 
adaptation fails for any reason, the server falls back safely and includes 
the batch.
   - **Offset advancement**: when all batches in a fetch are filtered out, the 
server returns a `filteredEndOffset` so the client can advance past the 
filtered range without re-fetching.
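   The server-side decision described in the list above (format check, schema-adaptation fallback, and `filteredEndOffset` when every batch is skipped) can be sketched as follows. This is a hypothetical model, not Fluss code: `BatchHeader`, `SchemaAdapter` (a stand-in for `PredicateSchemaResolver`), `ServerSideFilter`, the numeric magic value, and the use of `-1` for an unset offset are all assumptions. It also assumes `filteredEndOffset` is an exclusive end, i.e. the next offset the client should fetch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

enum LogFormat { ARROW, COMPACTED, INDEXED }

// Hypothetical view of one batch header: offset range, format, magic, schema id,
// and simplified statistics (min/max of a single long column).
record BatchHeader(long baseOffset, long lastOffset, LogFormat format, int magic,
                   int schemaId, long min, long max) {}

// Hypothetical fetch result. filteredEndOffset is -1 when unset; otherwise it is
// assumed to be the exclusive end of the skipped range (the next offset to fetch).
record FilteredFetch(List<BatchHeader> batches, long filteredEndOffset) {}

// Hypothetical stand-in for PredicateSchemaResolver: adapts the client's predicate
// to the schema a batch was written with, and may throw if it cannot.
interface SchemaAdapter {
    Predicate<BatchHeader> adapt(int batchSchemaId);
}

final class ServerSideFilter {
    // Assumption: magic 1 (V1) is the first version whose ARROW batches carry statistics.
    static final int STATS_MAGIC = 1;

    private ServerSideFilter() {}

    static boolean keep(BatchHeader b, SchemaAdapter adapter) {
        // No statistics to consult: return the batch unfiltered.
        if (b.format() != LogFormat.ARROW || b.magic() < STATS_MAGIC) {
            return true;
        }
        try {
            return adapter.adapt(b.schemaId()).test(b);
        } catch (RuntimeException e) {
            // Safe fallback: any failure to adapt or evaluate keeps the batch.
            return true;
        }
    }

    static FilteredFetch fetch(List<BatchHeader> scanned, SchemaAdapter adapter) {
        List<BatchHeader> kept = new ArrayList<>();
        for (BatchHeader b : scanned) {
            if (keep(b, adapter)) {
                kept.add(b);
            }
        }
        if (kept.isEmpty() && !scanned.isEmpty()) {
            // Every scanned batch was skipped: report where the scanned range ends
            // so the client does not re-fetch the same batches.
            long end = scanned.get(scanned.size() - 1).lastOffset() + 1;
            return new FilteredFetch(kept, end);
        }
        return new FilteredFetch(kept, -1L);
    }
}
```

   Keeping a batch whenever its format lacks statistics or the predicate cannot be adapted makes the filter strictly an optimization: it can reduce the data sent, but it never hides a batch that might contain a matching record.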
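   On the client side, the two responsibilities named in the list (row-level filtering of returned batches and advancing past a fully filtered range) might look like the sketch below. `Row`, `FetchResponse`, and `ClientCursor` are hypothetical names for this example only. As a simplified model, each row carries a single `long` value, and `-1` marks an absent `filteredEndOffset`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical row and response shapes for this sketch only.
record Row(long offset, long value) {}

// filteredEndOffset is -1 when the server did not set it.
record FetchResponse(List<Row> rows, long filteredEndOffset) {}

final class ClientCursor {
    private long nextOffset;
    private final LongPredicate rowFilter;

    ClientCursor(long startOffset, LongPredicate rowFilter) {
        this.nextOffset = startOffset;
        this.rowFilter = rowFilter;
    }

    long nextOffset() {
        return nextOffset;
    }

    /** Applies the row-level filter and advances the position for the next fetch. */
    List<Row> onResponse(FetchResponse resp) {
        List<Row> matched = new ArrayList<>();
        for (Row r : resp.rows()) {
            // Batch pruning is coarse: a surviving batch may still hold rows
            // that do not match, so every returned row is re-checked.
            if (rowFilter.test(r.value())) {
                matched.add(r);
            }
        }
        if (!resp.rows().isEmpty()) {
            nextOffset = resp.rows().get(resp.rows().size() - 1).offset() + 1;
        } else if (resp.filteredEndOffset() >= 0) {
            // All batches in this range were skipped server-side: jump past them.
            nextOffset = resp.filteredEndOffset();
        }
        // Otherwise there was no new data; keep the current position.
        return matched;
    }
}
```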
   
   ### Willingness to contribute
   
   - [x] I'm willing to submit a PR!


