aokolnychyi opened a new pull request, #8157:
URL: https://github.com/apache/iceberg/pull/8157

   This PR improves and refactors `DeleteFileIndex`.
   
   - Avoid the cost of repeatedly converting min/max boundaries, which hurt index lookup performance.
   - Use `dataSequenceNumber` from `ContentFile` instead of `ManifestEntry` to support distributed planning in the future.
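The first bullet can be illustrated with a minimal sketch: decode serialized bounds once, when a delete file is added to the index, so that each lookup compares primitives instead of re-converting `ByteBuffer` bounds. The names `BoundsIndexSketch`, `add`, and `candidates` are hypothetical and are not Iceberg's API. The sketch assumes a single `long` column whose bounds use 8-byte little-endian encoding, and it ignores partitions and sequence numbers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Iceberg's DeleteFileIndex: bounds are decoded once at build time.
class BoundsIndexSketch {

  // A delete file whose min/max bounds have already been converted to Java longs.
  record IndexedDelete(String path, long lower, long upper) {}

  private final List<IndexedDelete> deletes = new ArrayList<>();

  // Assumption: bounds of a long column, serialized as 8-byte little-endian values.
  static long decodeLong(ByteBuffer buffer) {
    // duplicate() so the caller's position and byte order are left untouched
    return buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN).getLong();
  }

  // Helper for the example: encodes a long the same way decodeLong expects.
  static ByteBuffer encodeLong(long value) {
    return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
  }

  // Conversion cost is paid once per delete file, when it is added to the index...
  void add(String path, ByteBuffer lower, ByteBuffer upper) {
    deletes.add(new IndexedDelete(path, decodeLong(lower), decodeLong(upper)));
  }

  // ...so each lookup only compares primitives against a data file's [lower, upper] range.
  List<String> candidates(long dataLower, long dataUpper) {
    List<String> result = new ArrayList<>();
    for (IndexedDelete delete : deletes) {
      boolean overlaps = delete.lower() <= dataUpper && dataLower <= delete.upper();
      if (overlaps) {
        result.add(delete.path());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    BoundsIndexSketch index = new BoundsIndexSketch();
    index.add("del-a.parquet", encodeLong(0L), encodeLong(99L));
    index.add("del-b.parquet", encodeLong(200L), encodeLong(299L));
    System.out.println(index.candidates(50L, 150L)); // prints [del-a.parquet]
  }
}
```

The trade-off is a little extra memory per indexed delete file in exchange for moving the conversion out of the per-lookup hot path.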
   
   This change relies on existing tests and adds a new benchmark.
   
   Results prior to this change:
   ```
   Benchmark                                                    Mode  Cnt  Score   Error  Units
   PlanningBenchmark.localPlanningWithMinMaxFilter                ss    5  9.740 ± 2.333   s/op
   PlanningBenchmark.localPlanningWithPartitionAndMinMaxFilter    ss    5  3.008 ± 0.044   s/op
   PlanningBenchmark.localPlanningWithoutFilter                   ss    5  9.569 ± 1.309   s/op
   ```
   
   Results after this change:
   ```
   Benchmark                                                    Mode  Cnt  Score   Error  Units
   PlanningBenchmark.localPlanningWithMinMaxFilter                ss    5  5.618 ± 1.297   s/op
   PlanningBenchmark.localPlanningWithPartitionAndMinMaxFilter    ss    5  2.668 ± 0.210   s/op
   PlanningBenchmark.localPlanningWithoutFilter                   ss    5  5.699 ± 0.595   s/op
   ```
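The relative improvement implied by the mean scores in the two tables can be computed directly. This sketch uses only the reported means and ignores the error columns, so it overstates precision for runs with wide error bars (for example, ± 2.333 s/op before the change). `SpeedupSketch` is a hypothetical helper, not part of the benchmark suite.

```java
import java.util.Locale;

// Speedup implied by the reported mean scores (s/op, lower is better); error columns are ignored.
class SpeedupSketch {

  static double speedup(double before, double after) {
    return before / after;
  }

  public static void main(String[] args) {
    String[] names = {
      "localPlanningWithMinMaxFilter",
      "localPlanningWithPartitionAndMinMaxFilter",
      "localPlanningWithoutFilter"
    };
    double[] before = {9.740, 3.008, 9.569};
    double[] after = {5.618, 2.668, 5.699};
    for (int i = 0; i < names.length; i++) {
      double reductionPercent = 100.0 * (1.0 - after[i] / before[i]);
      // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale
      System.out.printf(Locale.ROOT, "%s: %.2fx faster, %.0f%% less time%n",
          names[i], speedup(before[i], after[i]), reductionPercent);
    }
  }
}
```

Run as written, it reports roughly 1.73x, 1.13x, and 1.68x speedups (about 42%, 11%, and 40% less planning time).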
   
   This change should matter even more for equality deletes.
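One plausible reason the gain is larger for equality deletes: an index may need to check each candidate equality delete against a data file's column bounds, so any per-check conversion cost is paid once per (data file, delete file) pair. The sketch below shows such a check with pre-decoded bounds. It follows the Iceberg spec rule that an equality delete applies only to data files with a strictly smaller data sequence number (position deletes use less-than-or-equal). The bounds-overlap pruning and the names `EqualityDeleteCheckSketch`, `Range`, `FileStats`, and `mayApply` are illustrative, not Iceberg's implementation. The sketch skips partition matching and assumes the equality columns are required `long` columns with no nulls, because min/max bounds exclude nulls.

```java
import java.util.Map;

// Hypothetical sketch, not Iceberg's implementation: decide whether an equality delete
// file may apply to a data file, using bounds that were already decoded to longs.
class EqualityDeleteCheckSketch {

  // Inclusive [lower, upper] value range for one column.
  record Range(long lower, long upper) {
    boolean overlaps(Range other) {
      return lower <= other.upper && other.lower <= upper;
    }
  }

  // Data sequence number plus per-field-ID bounds for one file.
  record FileStats(long dataSequenceNumber, Map<Integer, Range> bounds) {}

  // Assumes equality columns are required (no nulls), because min/max bounds exclude nulls.
  static boolean mayApply(FileStats deleteFile, FileStats dataFile, int[] equalityFieldIds) {
    // Equality deletes apply only to data files with a strictly smaller data sequence number.
    if (dataFile.dataSequenceNumber() >= deleteFile.dataSequenceNumber()) {
      return false;
    }
    for (int fieldId : equalityFieldIds) {
      Range deleteRange = deleteFile.bounds().get(fieldId);
      Range dataRange = dataFile.bounds().get(fieldId);
      // Disjoint ranges on any equality column mean no delete row can match any data row.
      if (deleteRange != null && dataRange != null && !deleteRange.overlaps(dataRange)) {
        return false;
      }
    }
    // Missing stats are treated conservatively: the delete may apply.
    return true;
  }

  public static void main(String[] args) {
    int[] equalityFieldIds = {1};
    FileStats data = new FileStats(3L, Map.of(1, new Range(0L, 100L)));
    FileStats overlapping = new FileStats(4L, Map.of(1, new Range(50L, 60L)));
    FileStats disjoint = new FileStats(4L, Map.of(1, new Range(500L, 600L)));
    FileStats sameSequence = new FileStats(3L, Map.of(1, new Range(50L, 60L)));
    System.out.println(mayApply(overlapping, data, equalityFieldIds));  // true
    System.out.println(mayApply(disjoint, data, equalityFieldIds));     // false
    System.out.println(mayApply(sameSequence, data, equalityFieldIds)); // false
  }
}
```

Because the sequence-number comparison is cheap, it runs before the per-column bounds loop.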

