2010YOUY01 commented on PR #22729:
URL: https://github.com/apache/datafusion/pull/22729#issuecomment-4621939197

   > Thank you @2010YOUY01 -- this looks like a good idea, but I am slightly 
confused
   > 
   > 1. The PR says it "splits" logic but I actually see two entirely new 
streams (and no reduction / refactoring of the existing streams)
   > 2. I am not sure how the two new streams are related to the existing stream
   
   @alamb Thank you for the review!
   
   The idea is incremental migration: we have to keep some duplication during 
the migration, but eventually the old `GroupedHashAggregateStream` can be 
deleted entirely.
   
   ```rust
   // Before
   AggregateExec::execute() {
       // One multiplexed stream handles all paths.
       GroupedHashAggregateStream::new(...)
   }
   
   // Incremental migration process
   AggregateExec::execute() {
       match self.plan_stream() {
           // Migrated path 1/5
           StreamPlan::Partial => PartialHashAggregateStream::new(...),
   
           // Migrated path 2/5
           StreamPlan::Final => FinalHashAggregateStream::new(...),
   
           // Paths not migrated yet still use the old implementation.
           _ => GroupedHashAggregateStream::new(...),
       }
   }
   
   // After all paths are migrated
   AggregateExec::execute() {
       match self.plan_stream() {
           StreamPlan::Partial => PartialHashAggregateStream::new(...),
           StreamPlan::Final => FinalHashAggregateStream::new(...),
   
           // Migrated path 3/5
           // ...
   
           // No fallback remains; GroupedHashAggregateStream can be deleted.
       }
   }
   ```
   
   The reasons for this approach are:
   - Ideally, we would remove the migrated paths from 
`GroupedHashAggregateStream` as we go. However, I have found that code 
particularly difficult to modify safely.
   - This incremental migration minimizes risk by keeping the existing 
implementation as a fallback while new paths are introduced. Once all paths 
have been migrated, we can delete the old implementation and validate the 
result with the existing test suite. The main challenge is that it is difficult 
to test each migrated path in isolation, and I can't think about a good idea to 
address that, so I expect the final deletion step to require careful review and 
validation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to