2010YOUY01 commented on PR #22729:
URL: https://github.com/apache/datafusion/pull/22729#issuecomment-4621939197
> Thank you @2010YOUY01 -- this looks like a good idea, but I am slightly
confused
>
> 1. The PR says it "splits" logic but I actually see two entirely new
streams (and no reduction / refactoring of the existing streams)
> 2. I am not sure how the two new streams are related to the existing stream
@alamb Thank you for the review!
The idea is incremental migration: we have to keep some duplication during
the migration, but eventually the old `GroupedHashAggregateStream` can be
deleted entirely.
```rust
// Before
AggregateExec::execute() {
// One multiplexed stream handles all paths.
GroupedHashAggregateStream::new(...)
}
// Incremental migration process
AggregateExec::execute() {
match self.plan_stream() {
// Migrated path 1/5
StreamPlan::Partial => PartialHashAggregateStream::new(...),
// Migrated path 2/5
StreamPlan::Final => FinalHashAggregateStream::new(...),
// Paths not migrated yet still use the old implementation.
_ => GroupedHashAggregateStream::new(...),
}
}
// After all paths are migrated
AggregateExec::execute() {
match self.plan_stream() {
StreamPlan::Partial => PartialHashAggregateStream::new(...),
StreamPlan::Final => FinalHashAggregateStream::new(...),
// Migrated path 3/5
// ...
// No fallback remains; GroupedHashAggregateStream can be deleted.
}
}
```
The reasons for this approach are:
- Ideally, we would remove the migrated paths from
`GroupedHashAggregateStream` as we go. However, I have found that code
particularly difficult to modify safely.
- This incremental migration minimizes risk by keeping the existing
implementation as a fallback while new paths are introduced. Once all paths
have been migrated, we can delete the old implementation and validate the
result with the existing test suite. The main challenge is that it is difficult
to test each migrated path in isolation, and I can't think about a good idea to
address that, so I expect the final deletion step to require careful review and
validation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]