ConeyLiu opened a new pull request, #7791:
URL: https://github.com/apache/iceberg/pull/7791

   Closes #5652
   
   This memory leak occurs mainly in Flink sink jobs, where many schemas and 
decoders accumulate in the in-memory cache and eventually cause an OOM, as 
shown in the following heap dump:
   
   <img width="1505" alt="image" 
src="https://github.com/apache/iceberg/assets/12733256/5c13e5a3-1bce-41bc-9967-b13bb1ec90ff">
   
   
   There are two problems here:
   1.  The Guava map compares keys by identity when weak keys are used. 
See the details here: 
https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/MapMaker.java#LL58C39-L58C39
 
   2. Copied from #5652 
   
   > The DecoderResolver holds a ThreadLocal variable of a two-layer map. The 
outer map has a weak key while the inner map has a strong one. As the inner map 
holds a reference to a Schema object, the outer map holding the same weak 
reference to the Schema object will not release the weak key. That leads to the 
OOM.
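
   The first problem can be illustrated with a minimal sketch. It does not 
use Guava or Iceberg code: `java.util.IdentityHashMap` stands in for a map 
that compares keys by identity, and `FakeSchema` is a hypothetical class 
whose equality is based on content. Under those assumptions, every new but 
equal key instance misses the identity-based cache and adds another entry, 
while a map that compares keys with `equals` keeps a single entry.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

final class KeyComparisonSketch {

  // Hypothetical stand-in for a schema: equality is based on content.
  static final class FakeSchema {
    private final String definition;

    FakeSchema(String definition) {
      this.definition = definition;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof FakeSchema
          && definition.equals(((FakeSchema) other).definition);
    }

    @Override
    public int hashCode() {
      return definition.hashCode();
    }
  }

  // Builds n distinct instances that are all equal to each other.
  static List<FakeSchema> equalKeys(int n) {
    List<FakeSchema> keys = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      keys.add(new FakeSchema("struct<id: long>"));
    }
    return keys;
  }

  // Caches one value per key and reports how many entries the map holds.
  static int entriesAfterCaching(Map<FakeSchema, String> cache, List<FakeSchema> keys) {
    for (FakeSchema key : keys) {
      cache.putIfAbsent(key, "decoder");
    }
    return cache.size();
  }

  public static void main(String[] args) {
    List<FakeSchema> keys = equalKeys(3);
    // Identity comparison: every instance becomes its own entry.
    System.out.println(entriesAfterCaching(new IdentityHashMap<>(), keys)); // 3
    // equals()/hashCode() comparison: equal instances share one entry.
    System.out.println(entriesAfterCaching(new WeakHashMap<>(), keys)); // 1
  }
}
```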
   
   Here, we replace the weak-key map (based on ConcurrentHashMap) with 
java.util.WeakHashMap for the following reasons:
   1. The map is already held in a ThreadLocal, so there is no need for a 
ConcurrentHashMap.
   2. java.util.WeakHashMap compares keys by equality (`equals`) in addition 
to identity. 
https://docs.oracle.com/javase/8/docs/api/java/util/WeakHashMap.html
   3. Most of the cached entries we noticed are schemas of `ManifestEntry` or 
`ManifestFile`, both of which change infrequently.
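
   As a rough sketch of the pattern described above (not the actual code in 
this PR), a per-thread `java.util.WeakHashMap` could be used as follows. The 
class name, the `resolve` method and the `String` values are hypothetical. 
The values here deliberately hold no strong reference to their keys, since 
`WeakHashMap` holds values strongly and an entry whose value refers back to 
its key would never be discarded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

final class ThreadLocalWeakCacheSketch {

  // One WeakHashMap per thread: the map is never shared between threads,
  // so a concurrent map is not required.
  private static final ThreadLocal<Map<Object, String>> CACHE =
      ThreadLocal.withInitial(WeakHashMap::new);

  // Returns the cached value for an equal key, building it on a miss.
  static String resolve(Object schemaKey, Function<Object, String> builder) {
    return CACHE.get().computeIfAbsent(schemaKey, builder);
  }

  // Number of entries cached for the calling thread.
  static int cachedCount() {
    return CACHE.get().size();
  }

  public static void main(String[] args) {
    // Two distinct but equal keys, standing in for equal schema instances.
    List<String> first = Arrays.asList("id", "data");
    List<String> second = Arrays.asList("id", "data");

    String a = resolve(first, key -> "decoder for " + key);
    String b = resolve(second, key -> "decoder for " + key);

    // The second lookup reuses the entry created by the first one.
    System.out.println(a == b);
    System.out.println(cachedCount());
    // Keep the first key reachable until the lookups are done.
    System.out.println(first.size());
  }
}
```

   Because lookups go through `equals`, a schema instance that is equal to 
an already cached one reuses the existing entry instead of adding a new one.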
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

