rajarshisarkar commented on PR #7194:
URL: https://github.com/apache/iceberg/pull/7194#issuecomment-1490261743

   > One of the problems with the proposed approach is that optimizations are 
being triggered as an immediate result of a commit. The implication is that 
whatever happens in the metric report consumer needs to happen in a way that 
doesn't affect the commit path. For example, failures in the consumer should 
not lead to commit failures.
   Additionally, every single commit triggers additional workload, so I think 
consuming a metrics report and actually performing some workload should be 
completely decoupled from one another.
   
   This is an opt-in feature, and it would help in scenarios where users do 
not want to maintain separate optimisations as scheduled pipelines. It would 
remove the operational overhead of maintaining those extra pipelines. Yes, the 
consumer should not affect the commit path (for the incoming commits), which 
makes this approach suitable for batch workloads. Regarding the additional 
workload, every commit would only run some basic threshold checks on the table 
history, and only when the user opts in to auto optimisation. We can order the 
thresholds so that the quickest checks run first and we exit early where 
possible. Since the approach is aimed at batch workloads, the user shouldn't 
mind this latency after the commit. Thoughts?
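
   As a rough illustration of the threshold ordering, here is a minimal Python 
sketch. All names (`TableStats`, `CHECKS`, `first_exceeded`) and the threshold 
values are hypothetical and are not part of Iceberg or this PR. It also assumes 
that "exit early" means stopping at the first threshold that is exceeded, so 
the more expensive checks are skipped.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TableStats:
    """Hypothetical summary of table history; not an Iceberg class."""
    commits_since_rewrite: int
    small_file_count: int
    delete_file_count: int


# Each check is (name, predicate). The list is ordered cheapest first,
# which is an assumption about the relative cost of each check.
Check = Tuple[str, Callable[[TableStats], bool]]

CHECKS: List[Check] = [
    ("commit_count", lambda s: s.commits_since_rewrite >= 10),
    ("small_files", lambda s: s.small_file_count >= 100),
    ("delete_files", lambda s: s.delete_file_count >= 50),
]


def first_exceeded(
    stats: TableStats,
    enabled: bool,
    checks: List[Check] = CHECKS,
) -> Optional[str]:
    """Return the name of the first exceeded threshold, or None.

    Does nothing unless the user has opted in. Checks run in list order
    and evaluation stops at the first one that is exceeded.
    """
    if not enabled:
        return None
    for name, exceeded in checks:
        if exceeded(stats):
            return name
    return None
```

   A caller could use the returned name to decide which optimisation to 
schedule. Doing that outside the commit path is a separate concern that this 
sketch does not cover.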


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


