aokolnychyi opened a new pull request, #9126: URL: https://github.com/apache/iceberg/pull/9126
I am using `CharSequenceMap` in #8755 to build a map of position delete indexes for a delete file. While profiling that change, I noticed that we spend noticeably more time computing hash codes for file paths than `String` would. This is because our logic in `CharSequenceWrapper` is generic, while `String` can compute a hash code for Latin-only chars faster by iterating over bytes directly.

This PR optimizes the hash code computation for file paths by only taking into account file names. This speeds up the computation without increasing the chances of collisions.

This PR comes with tests and a benchmark.

```
Benchmark                                         Mode  Cnt  Score   Error  Units
CharSequenceMapBenchmark.defaultCharSequenceMap     ss   10  2.742 ± 0.499   s/op
CharSequenceMapBenchmark.filePathCharSequenceMap    ss   10  1.420 ± 0.234   s/op
```

I am planning to use `CharSequenceMap` in `DeleteFileIndex`, so this will be a common pattern.
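To illustrate the idea of hashing only the file name, here is a minimal sketch. It is not the code in this PR: the class `FileNameHash`, the method `hash`, and the sample path are hypothetical, and it assumes `/` is the path separator and that file names are mostly unique across directories. Equality checks would still have to compare the full path.

```java
// Hypothetical helper for illustration only; not the implementation in this PR.
final class FileNameHash {
  private FileNameHash() {}

  // Hashes only the characters after the last '/' in the path.
  // Uses the same 31-based polynomial as String.hashCode, so the result
  // equals the String hash code of the file name.
  public static int hash(CharSequence path) {
    int start = 0;
    // Scan backwards to find where the file name begins.
    for (int i = path.length() - 1; i >= 0; i--) {
      if (path.charAt(i) == '/') {
        start = i + 1;
        break;
      }
    }

    int result = 0;
    for (int i = start; i < path.length(); i++) {
      result = 31 * result + path.charAt(i);
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up path, used only to show the call.
    String path = "s3://bucket/db/table/data/00000-0-a1b2.parquet";
    // prints "true"
    System.out.println(hash(path) == "00000-0-a1b2.parquet".hashCode());
  }
}
```

The loop touches only the trailing file name, so the cost no longer grows with the length of the directory prefix that many paths in a table share.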