ben-manes commented on PR #13382:
URL: https://github.com/apache/iceberg/pull/13382#issuecomment-3006896885

   Is `TableMetadataCache` single-threaded like your benchmark? If so, then
this is what `LinkedHashMap` is optimal for, since the LRU updates are simple
pointer swaps. If not, then you will have to synchronize access because it is
not thread-safe, and concurrent usage will cause corruption and instability.
This is required for every read (per the javadoc), since in access-order mode
every access mutates the LRU order. You also have to be thoughtful about the
cache hit rate, because a cache miss will be far more expensive than the
in-memory operations, e.g. it will have to perform I/O. If the workload is
recency-biased then LRU is perfect, but if it is frequency-biased or contains
scans then LRU can perform quite poorly.
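   To make both concerns concrete, here is a minimal sketch of a bounded LRU on
`LinkedHashMap`. It is an illustration, not `TableMetadataCache` itself: the
class `LruSketch` and its helpers `newLru` and `countHits` are hypothetical. It
uses access order plus `removeEldestEntry`, wraps the map with
`Collections.synchronizedMap` so reads take the same lock as writes, and
replays a trace in which a one-off scan flushes the hot keys.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of Iceberg.
class LruSketch {
    // Bounded LRU map: accessOrder = true makes get() and put() move the
    // entry to the tail, so the head is always the least recently used entry.
    static <K, V> Map<K, V> newLru(int maxSize) {
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Invoked by put() after inserting a new entry; evicts the head.
                return size() > maxSize;
            }
        };
        // In access order even get() is a structural modification, so reads
        // need the same lock as writes when the map is shared across threads.
        return Collections.synchronizedMap(lru);
    }

    // Replays a key trace and counts hits. A miss stands in for an expensive
    // load (e.g. I/O) followed by inserting the loaded value. This demo is
    // single-threaded; the get-then-put pair is not atomic under contention.
    static int countHits(Map<Integer, Integer> cache, int[] trace) {
        int hits = 0;
        for (int key : trace) {
            if (cache.get(key) != null) {
                hits++;
            } else {
                cache.put(key, key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = newLru(3);
        int[] hot = {1, 2, 3};
        for (int i = 0; i < 10; i++) {
            countHits(cache, hot); // warm-up: 27 hits out of 30 lookups
        }
        countHits(cache, new int[] {100, 101, 102, 103, 104}); // one-off scan
        System.out.println("hot hits after scan: " + countHits(cache, hot)); // 0
    }
}
```

The scan touches more distinct keys than the cache holds, so under pure
recency every hot key is evicted even though each scan key is never used again.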
   
   Caffeine is a multi-threaded cache with an adaptive eviction policy that
maximizes the hit rate based on the observed workload. This does incur
additional overhead, but it can greatly improve overall system performance.
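   For contrast, a toy frequency-aware admission filter shows why a frequency
signal helps with scans. This is a simplified illustration in the spirit of
TinyLFU, which Caffeine's W-TinyLFU policy builds on; it is not Caffeine's code
or API. The class `FrequencyAdmissionSketch` is hypothetical, keeps exact counts
in a `HashMap` instead of a compact sketch, never ages them, and is
single-threaded.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy policy: LRU ordering plus a frequency-based admission check.
class FrequencyAdmissionSketch {
    private final int maxSize;
    // Resident keys in access order; the first entry is the LRU victim.
    private final LinkedHashMap<Integer, Integer> data = new LinkedHashMap<>(16, 0.75f, true);
    // Exact access counts for every key seen so far (TinyLFU approximates
    // this with a small count-min sketch whose counters are periodically halved).
    private final Map<Integer, Integer> freq = new HashMap<>();

    FrequencyAdmissionSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns true on a hit; on a miss, decides whether to admit the key.
    boolean access(int key) {
        freq.merge(key, 1, Integer::sum);
        if (data.get(key) != null) {
            return true;
        }
        if (data.size() < maxSize) {
            data.put(key, key);
            return false;
        }
        int victim = data.keySet().iterator().next();
        // Admit the newcomer only if it has been seen more often than the
        // victim, so a key touched once by a scan cannot displace a hot entry.
        if (freq.get(key) > freq.get(victim)) {
            data.remove(victim);
            data.put(key, key);
        }
        return false;
    }

    int hits(int[] trace) {
        int hits = 0;
        for (int key : trace) {
            if (access(key)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        FrequencyAdmissionSketch cache = new FrequencyAdmissionSketch(3);
        int[] hot = {1, 2, 3};
        for (int i = 0; i < 10; i++) {
            cache.hits(hot);
        }
        cache.hits(new int[] {100, 101, 102, 103, 104}); // one-off scan
        System.out.println("hot hits after scan: " + cache.hits(hot)); // 3
    }
}
```

On the warm-up, scan, and hot-key trace, each scan key has been seen once while
the LRU victim has been seen ten times, so the scan is rejected and the hot keys
still hit afterwards. Caffeine additionally ages its frequency counts, puts a
small LRU admission window in front of the main region, and adapts the window
size to the workload, all of which this sketch omits.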
   
   I adjusted Caffeine's
[benchmark](https://github.com/ben-manes/caffeine/blob/master/caffeine/src/jmh/java/com/github/benmanes/caffeine/cache/GetPutBenchmark.java)
to run single-threaded and with 16 threads on my 14-core M3 Max laptop using
OpenJDK 24. It uses a Zipfian distribution to simulate hot/cold items with a
100% hit rate. This was not a clean system, as I was on a conference call while
writing code, which could only have hurt Caffeine's numbers, since it utilizes
all of the cores. In a cloud environment you will likely observe worse
throughput due to virtualization, NUMA effects, noisy neighbors, older
hardware, etc. In general, you want to drive application performance decisions
by profiling to resolve hotspots, as small optimizations can backfire when your
benchmark does not reflect your real workload.
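   The hot/cold key pattern can be approximated with a Zipfian sampler. This is
a minimal sketch, assuming inverse-transform sampling over a precomputed CDF,
where rank k (0-based) is drawn with probability proportional to
1 / (k + 1)^skew. The class `ZipfianKeys` and its parameters are hypothetical
and not necessarily the generator used by Caffeine's benchmark.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical Zipfian key generator for a cache benchmark harness.
class ZipfianKeys {
    private final double[] cdf;
    private final Random random;

    ZipfianKeys(int items, double skew, long seed) {
        cdf = new double[items];
        double sum = 0;
        for (int k = 0; k < items; k++) {
            sum += 1.0 / Math.pow(k + 1, skew); // unnormalized weight of rank k
            cdf[k] = sum;
        }
        for (int k = 0; k < items; k++) {
            cdf[k] /= sum; // normalize so the last entry is exactly 1.0
        }
        random = new Random(seed);
    }

    // Returns the smallest rank whose cumulative probability is >= u.
    int next() {
        double u = random.nextDouble(); // uniform in [0, 1)
        int idx = Arrays.binarySearch(cdf, u);
        return idx >= 0 ? idx : -idx - 1; // decode the insertion point on a miss
    }

    public static void main(String[] args) {
        ZipfianKeys keys = new ZipfianKeys(1_000, 1.0, 42L);
        int samples = 100_000;
        int top10 = 0;
        for (int i = 0; i < samples; i++) {
            if (keys.next() < 10) {
                top10++;
            }
        }
        System.out.printf("top 10 of 1000 keys: %.1f%% of lookups%n", 100.0 * top10 / samples);
    }
}
```

With `skew = 1.0` over 1,000 items, the first ten ranks receive roughly 39% of
draws (the 10th harmonic number divided by the 1,000th), which is the kind of
skew that rewards keeping hot keys resident. To keep a 100% hit rate, a harness
could populate the cache with every key before timing; for a multi-threaded
run, give each thread its own sampler, since a shared `java.util.Random` is
thread-safe but contended.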
   
   <img width="1420" alt="Screenshot 2025-06-25 at 8 07 06 PM" src="https://github.com/user-attachments/assets/b75dc8bb-c280-4c86-b8da-69f31f80e2c2" />
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

