ben-manes commented on PR #13382:
URL: https://github.com/apache/iceberg/pull/13382#issuecomment-3006896885
Is `TableMetadataCache` single-threaded like your benchmark? If so, then this is exactly what `LinkedHashMap` is optimal for, since the LRU updates are simple pointer swaps. If not, then you will have to synchronize access because `LinkedHashMap` is not thread-safe, and concurrent usage will cause corruption and instability. This is required for every read as well (per the javadoc), since in access-order mode every access mutates the LRU order.

You also have to be thoughtful about the cache hit rate, because a cache miss will be far more expensive than the in-memory operations, e.g. it will have to perform I/O. If the workload is recency-biased then LRU is perfect, but if it is frequency-biased or contains scans then LRU can perform quite poorly.

Caffeine is a multi-threaded cache with an adaptive eviction policy that maximizes the hit rate based on the observed workload. This does incur additional overhead, but it can greatly improve overall system performance.

I adjusted Caffeine's [benchmark](https://github.com/ben-manes/caffeine/blob/master/caffeine/src/jmh/java/com/github/benmanes/caffeine/cache/GetPutBenchmark.java) to run single-threaded and with 16 threads on my 14-core M3 Max laptop using OpenJDK 24. It uses a Zipfian distribution to simulate hot/cold items with a 100% hit rate. This was not a clean system, as I was on a conference call while writing code, which only hurt Caffeine since it utilizes all of the cores. In a cloud environment you will likely observe worse throughput due to virtualization, NUMA effects, noisy neighbors, older hardware, etc.

In general, you want to drive application performance decisions by profiling to resolve hotspots, as small optimizations can backfire when your benchmark does not fit your real workload.

<img width="1420" alt="Screenshot 2025-06-25 at 8 07 06 PM" src="https://github.com/user-attachments/assets/b75dc8bb-c280-4c86-b8da-69f31f80e2c2" />
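The `LinkedHashMap` trade-offs in this comment can be illustrated with a minimal sketch, assuming a simple bounded LRU is what `TableMetadataCache` needs. The names `LruSketch`, `newLruMap`, `newSynchronizedLruMap`, `trace`, and `hitRate` are hypothetical and are not Iceberg or Caffeine APIs. The sketch shows that in access-order mode a `get` reorders entries (so reads behave like writes and need synchronization when shared), and it replays a made-up trace to show how a scan of one-time keys flushes a small hot set out of an LRU. The trace is synthetic and is not the Zipfian workload used in the benchmark.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers for illustration only; not Iceberg's TableMetadataCache or a Caffeine API.
public class LruSketch {

  // Bounded LRU: accessOrder=true moves each accessed entry to the tail, and
  // removeEldestEntry evicts the head (least recently used) once over capacity.
  public static <K, V> Map<K, V> newLruMap(int maximumSize) {
    return new LinkedHashMap<K, V>(16, 0.75f, /* accessOrder= */ true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maximumSize;
      }
    };
  }

  // In access-order mode even get() is a structural modification, so a map shared
  // across threads must synchronize every call, reads included. A single lock
  // keeps it consistent but serializes all callers.
  public static <K, V> Map<K, V> newSynchronizedLruMap(int maximumSize) {
    return Collections.synchronizedMap(newLruMap(maximumSize));
  }

  // Builds a synthetic trace: `rounds` passes over hot keys 0..hotKeys-1, each pass
  // followed by a scan of `scanLength` cold keys that are never requested again.
  public static int[] trace(int rounds, int hotKeys, int scanLength) {
    int[] keys = new int[rounds * (hotKeys + scanLength)];
    int i = 0;
    int nextColdKey = 1_000_000; // cold keys never collide with the hot keys
    for (int r = 0; r < rounds; r++) {
      for (int k = 0; k < hotKeys; k++) {
        keys[i++] = k;
      }
      for (int s = 0; s < scanLength; s++) {
        keys[i++] = nextColdKey++;
      }
    }
    return keys;
  }

  // Replays the trace against an LRU of the given size and returns the fraction of hits.
  // On a miss the key is inserted; in a real metadata cache a miss would mean I/O.
  public static double hitRate(int maximumSize, int[] trace) {
    Map<Integer, Boolean> cache = newLruMap(maximumSize);
    int hits = 0;
    for (int key : trace) {
      if (cache.get(key) != null) {
        hits++;
      } else {
        cache.put(key, Boolean.TRUE);
      }
    }
    return (double) hits / trace.length;
  }

  public static void main(String[] args) {
    Map<String, Integer> lru = newLruMap(2);
    lru.put("a", 1);
    lru.put("b", 2);
    lru.get("a"); // a read reorders entries: "b" is now least recently used
    lru.put("c", 3); // exceeds the bound, so "b" is evicted
    System.out.println(lru.keySet()); // prints [a, c]

    // Hot set fits in the cache: only the first pass misses.
    System.out.println(hitRate(100, trace(10, 50, 0))); // prints 0.9
    // Each 200-key scan pushes the 50 hot keys out of a 100-entry LRU.
    System.out.println(hitRate(100, trace(10, 50, 200))); // prints 0.0
  }
}
```

With the same cache size and hot set, adding scans drops the hit rate in this toy trace from 0.9 to 0.0, which is the kind of workload where a recency-only policy struggles. The `synchronizedMap` wrapper addresses correctness for shared use, but every reader and writer then contends on one lock.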