RyanJamesStewart opened a new issue, #22548:
URL: https://github.com/apache/datafusion/issues/22548

   ## Summary
   
   Follow-up to #22416. In `split_vec_min_alloc`, the residual vector keeps its 
original capacity across a long `split_off` chain. Under DataFusion's 
capacity-based memory accounting, that residual is charged for the full backing 
allocation until it drops, even after most of the data has been emitted.
   
   ## Background
   
   `split_vec_min_alloc` lives in `datafusion_common::utils` and is used by the 
`multi_group_by` group-value builders (`bytes.rs`, `primitive.rs`) to carve 
fixed-size chunks off a growing vector.
   
   In the `split_off` branch, the emitted prefix keeps the original capacity, 
and the residual likewise keeps its original capacity (unchanged by 
`split_off`).
   
   Concrete pathological case (raised by ariel-miculas in 
https://github.com/apache/datafusion/pull/22416#issuecomment-4520956360): a 
vector of one million elements is drained 1024 at a time. Each emitted slice 
has capacity 1024 except the final one, which inherits capacity one million via 
`split_off` + `mem::replace`. Symmetrically, on every iteration before the 
final one the residual still owns the full backing buffer, so accounting 
charges for it for the entire chain.
   
   ## Why this was not fixed in #22416
   
   #22416 considered calling `shrink_to_fit` on the emitted prefix in the 
`split_off` branch and backed it out. A caller in 
`datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes.rs` 
pushes onto the emitted prefix immediately after the call 
(`first_n_offsets.push(...)`); shrinking the prefix forces a realloc on the 
next push. The test `emitted_prefix_does_not_realloc_on_push` in 
`datafusion/common/src/utils/mod.rs` pins that constraint.
   
   The push-after-emit constraint applies to the prefix, not to the residual. 
The residual is the side that lingers; the prefix is the side that gets pushed 
onto.
   
   ## Proposed shapes
   
   Two candidates, in order of decreasing utility-side change:
   
   1. Have `split_vec_min_alloc` shrink the residual when its length is much 
smaller than its capacity (some ratio threshold). One realloc near the end of a 
long chain replaces carrying the full allocation indefinitely. Adds a heuristic 
to the shared utility.
   
   2. Push the shrink to the caller's finalize path. Callers driving long split 
chains (the `multi_group_by` builders) call `shrink_to_fit` (or `shrink_to`) on 
the residual once they know no more pushes are coming. Keeps the utility 
allocation-policy-free; each caller picks based on its own access pattern.
   
   Whichever shape lands should come with a benchmark on the 1M to 1024 case 
plus a smaller fixture to keep CI cheap, so the residual shrink can be shown 
not to regress the common short-chain path.
   
   ## Related
   
   - PR #22416 (the hoist this builds on; landed the shared utility).
   - Comment chain: 
https://github.com/apache/datafusion/pull/22416#issuecomment-4520956360 
(original observation), 
https://github.com/apache/datafusion/pull/22416#issuecomment-4545459668 (offer 
to open this follow-up).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to