mbutrovich commented on PR #21629: URL: https://github.com/apache/datafusion/pull/21629#issuecomment-4262519681
More brainstorming with Claude: > ### Proposed fix > > Don't coalesce schemas that have variable-length non-view columns (StringArray, BinaryArray, LargeStringArray, LargeBinaryArray, DictionaryArray with string/binary values). For these schemas, set the coalesce target to `batch_size` instead of `sort_coalesce_target_rows`: > > - Full-size batches pass through uncoalesced and sort individually, same as `main` — no regression > - Small batches (from filters, joins, pushdown) coalesce up to `batch_size` only — some fan-in reduction without blowing cache > - StringView and fixed-width schemas keep the full 32K coalesce target and retain all speedups > > Fan-in reduction is the price we pay for StringArray schemas. We can't optimize the merge if the sort itself is 2.5x slower getting there. > > Comet does not support StringView — all string columns use StringArray, so this regression directly affects Comet workloads. > > ### Why check all columns, not just sort keys? > > `take` reorders all columns in the batch, not just sort keys. Wide schemas (e.g. small string key + 70KB string value) are especially vulnerable — coalescing to 32K means `take` scatter-gathers across ~2.2GB of value data. > > ### Alternatives considered > > **Smaller intermediate target (e.g. 16K):** Any coalescing beyond `batch_size` amplifies cache pressure for wide schemas proportionally. No clear sweet spot. > > **Convert StringArray → StringView before take:** The gather (conversion) is cheap and `take` on views is cheap, but the resulting views point to the original unsorted buffer. No memory freed until the entire buffer drops — ~2x memory bloat, wrong tradeoff for a spilling sort. > -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
