aokolnychyi opened a new pull request, #8157: URL: https://github.com/apache/iceberg/pull/8157
This PR improves and refactors `DeleteFileIndex`:

- Avoid repeatedly converting min/max boundaries, which degraded index lookup performance.
- Use `dataSequenceNumber` from `ContentFile` instead of `ManifestEntry` to support distributed planning in the future.

This change relies on existing tests and adds a new benchmark.

Results before this change:

```
Benchmark                                                    Mode  Cnt  Score   Error  Units
PlanningBenchmark.localPlanningWithMinMaxFilter                ss    5  9.740 ± 2.333  s/op
PlanningBenchmark.localPlanningWithPartitionAndMinMaxFilter    ss    5  3.008 ± 0.044  s/op
PlanningBenchmark.localPlanningWithoutFilter                   ss    5  9.569 ± 1.309  s/op
```

Results after this change:

```
Benchmark                                                    Mode  Cnt  Score   Error  Units
PlanningBenchmark.localPlanningWithMinMaxFilter                ss    5  5.618 ± 1.297  s/op
PlanningBenchmark.localPlanningWithPartitionAndMinMaxFilter    ss    5  2.668 ± 0.210  s/op
PlanningBenchmark.localPlanningWithoutFilter                   ss    5  5.699 ± 0.595  s/op
```

This improvement should matter even more for equality deletes.
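The first change (avoiding repeated conversion of min/max boundaries) can be illustrated with a minimal sketch. It assumes, as in Iceberg's `ContentFile`, that bounds are kept as serialized `ByteBuffer`s keyed by field ID, and decodes them at most once per delete file instead of on every lookup. `CachedBoundsDemo`, `IndexedDeleteFile`, `mayOverlap`, `encode`, and `decodeOrNull` are hypothetical names, not the actual `DeleteFileIndex` code, and the long encoding uses `ByteBuffer`'s default byte order for simplicity rather than Iceberg's single-value serialization.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these are illustrative types, not Iceberg's DeleteFileIndex classes.
public class CachedBoundsDemo {

  // Counts how many serialized bounds have been decoded, to make the saving visible.
  static int conversions = 0;

  // Encodes a long bound using ByteBuffer's default (big-endian) order; Iceberg's real encoding differs.
  static ByteBuffer encode(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(0, value);
  }

  // Decodes a serialized bound; returns null when the column has no stats.
  static Long decodeOrNull(ByteBuffer buf) {
    if (buf == null) {
      return null;
    }
    conversions++;
    return buf.getLong(0);
  }

  // Wraps a delete file's serialized bounds and caches the decoded values per field ID.
  static final class IndexedDeleteFile {
    private final Map<Integer, ByteBuffer> lowerBounds;
    private final Map<Integer, ByteBuffer> upperBounds;
    private final Map<Integer, Long> lowerCache = new HashMap<>();
    private final Map<Integer, Long> upperCache = new HashMap<>();

    IndexedDeleteFile(Map<Integer, ByteBuffer> lowerBounds, Map<Integer, ByteBuffer> upperBounds) {
      this.lowerBounds = lowerBounds;
      this.upperBounds = upperBounds;
    }

    // Decodes the lower bound at most once; missing bounds (null) are not cached.
    Long lower(int fieldId) {
      return lowerCache.computeIfAbsent(fieldId, id -> decodeOrNull(lowerBounds.get(id)));
    }

    // Same caching strategy for the upper bound.
    Long upper(int fieldId) {
      return upperCache.computeIfAbsent(fieldId, id -> decodeOrNull(upperBounds.get(id)));
    }

    // True if the delete file's [lower, upper] range may intersect the data file's range.
    boolean mayOverlap(int fieldId, long dataLower, long dataUpper) {
      Long deleteLower = lower(fieldId);
      Long deleteUpper = upper(fieldId);
      if (deleteLower == null || deleteUpper == null) {
        return true; // no stats: cannot prune, so keep the delete file
      }
      return deleteLower <= dataUpper && deleteUpper >= dataLower;
    }
  }

  public static void main(String[] args) {
    IndexedDeleteFile delete =
        new IndexedDeleteFile(Map.of(1, encode(10L)), Map.of(1, encode(20L)));
    int matches = 0;
    // 200 simulated data files, each covering [start, start + 4] for field 1.
    for (long start = 0; start < 1000; start += 5) {
      if (delete.mayOverlap(1, start, start + 4)) {
        matches++;
      }
    }
    // prints "matches=3, conversions=2": each bound was decoded once, not once per lookup
    System.out.println("matches=" + matches + ", conversions=" + conversions);
  }
}
```

In an uncached variant of this sketch, the same loop would decode both bounds on every call, i.e. 400 conversions for 200 data files; caching moves that cost out of the per-lookup path.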
