ConeyLiu opened a new pull request, #7791: URL: https://github.com/apache/iceberg/pull/7791
Closes #5652

This memory leak shows up mainly in Flink sink jobs, where many schemas and decoders accumulate in the cache and eventually cause an OOM. For example, see the following heap dump:

<img width="1505" alt="image" src="https://github.com/apache/iceberg/assets/12733256/5c13e5a3-1bce-41bc-9967-b13bb1ec90ff">

There are two problems here:

1. The Guava map compares keys by identity when weak keys are used. Details: https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/MapMaker.java#LL58C39-L58C39.
2. Copied from #5652:
   > The DecoderResolver holds a ThreadLocal variable of a two-layer map. The outer map has a weak key while the inner map has a strong one. As the inner map holds a reference to a Schema object, the outer map holding the same weak reference to the Schema object will not release the weak key. That leads to the OOM.

Here, we replace the weak-key map (based on ConcurrentHashMap) with `java.util.WeakHashMap` for the following reasons:

1. The map is already held in a ThreadLocal, so there is no need for a ConcurrentHashMap.
2. `java.util.WeakHashMap` compares keys with `equals()` (after an identity check) rather than by identity alone. https://docs.oracle.com/javase/8/docs/api/java/util/WeakHashMap.html
3. Most of the cached entries we noticed are schemas of `ManifestEntry` or `ManifestFile`, both of which change infrequently.
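To illustrate the key-comparison difference described above, here is a minimal JDK-only sketch, not the code from this PR. `FakeSchema` and every method name are hypothetical. `IdentityHashMap` is used only as a stand-in for identity-based key comparison (it holds strong references, unlike Guava's weak-key map). The per-thread cache at the end shows why a plain `WeakHashMap` is enough when the map lives in a ThreadLocal.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyCacheSketch {

  // Hypothetical stand-in for a schema: equality is value-based, not identity-based.
  static final class FakeSchema {
    private final String definition;

    FakeSchema(String definition) {
      this.definition = definition;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof FakeSchema
          && definition.equals(((FakeSchema) other).definition);
    }

    @Override
    public int hashCode() {
      return definition.hashCode();
    }
  }

  // Stores an entry under `stored`, then looks it up with `probe`.
  static boolean findsEqualKey(
      Map<FakeSchema, String> map, FakeSchema stored, FakeSchema probe) {
    map.put(stored, "decoder");
    boolean found = map.containsKey(probe);
    // Use `stored` after the lookup so it stays strongly reachable
    // and a weak entry cannot be cleared before the lookup happens.
    return found && stored.equals(probe);
  }

  // Per-thread cache: each thread gets its own map, so no concurrent map is needed.
  private static final ThreadLocal<Map<FakeSchema, String>> CACHE =
      ThreadLocal.withInitial(WeakHashMap::new);

  static String decoderFor(FakeSchema schema) {
    // The cached value must not reference the key, or the weak key is never released.
    return CACHE.get().computeIfAbsent(schema, s -> "decoder-for-" + s.definition);
  }

  public static void main(String[] args) {
    FakeSchema first = new FakeSchema("struct<id:int>");
    FakeSchema second = new FakeSchema("struct<id:int>"); // equal, but a different object

    // Identity comparison: the equal-but-distinct key misses.
    System.out.println(findsEqualKey(new IdentityHashMap<>(), first, second)); // false
    // WeakHashMap compares with equals(): the same lookup hits.
    System.out.println(findsEqualKey(new WeakHashMap<>(), first, second)); // true

    // With the equals-based cache, both instances map to one cached entry.
    String a = decoderFor(first);
    String b = decoderFor(second);
    System.out.println(a.equals(b) && first.equals(second));
  }
}
```

With identity comparison, every newly parsed but equal schema would create a fresh cache entry, which matches the unbounded growth seen in the heap dump.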
