[jira] [Updated] (LUCENE-10519) Improvement for CloseableThreadLocal

Lucifer Boice (Jira) Tue, 26 Apr 2022 19:05:04 -0700


     [ https://issues.apache.org/jira/browse/LUCENE-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Lucifer Boice updated LUCENE-10519:
-----------------------------------
    Summary: Improvement for CloseableThreadLocal  (was: ThreadLocal.remove under G1GC takes 100% CPU)

> Improvement for CloseableThreadLocal
> ------------------------------------
>
>                 Key: LUCENE-10519
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10519
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 8.9, 8.10.1, 8.11.1
>         Environment: Elasticsearch v7.16.0
> OpenJDK v11
>            Reporter: Lucifer Boice
>            Priority: Critical
>              Labels: CloseableThreadLocal
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> h2. Problem
> ----
> {*}org.apache.lucene.util.CloseableThreadLocal{*} (which uses
> {*}ThreadLocal<WeakReference<T>>{*} internally) can still misbehave under
> G1GC. Each thread has a single ThreadLocalMap that all ThreadLocals share,
> and that map purges stale entries only periodically. When a
> CloseableThreadLocal is closed, only the current thread's entry is removed
> right away; entries in other threads are left to be cleaned up through the
> WeakReferences. Under G1GC, those WeakReferences in other threads may not be
> reclaimed even after several rounds of mixed GC. The ThreadLocalMap can then
> grow very large, and iterating over its entries (for example, while
> expunging stale ones) can take an arbitrarily long amount of CPU time.
> Hot threads output from Elasticsearch:
> {code:java}
> ::: {xxxxxxxxx}{lCj7LcVnT328KHcJRd57yg}{WPiNCbk0R0SIKxg4-w3wew}{xxxxxxxx}{xxxxxxxx}
>    Hot threads at 2020-04-12T05:25:10.224Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
>
>    105.3% (526.5ms out of 500ms) cpu usage by thread 'elasticsearch[xxxxxxxx][bulk][T#31]'
>      10/10 snapshots sharing following 34 elements
>        java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:627)
>        java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:509)
>        java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:308)
>        java.lang.ThreadLocal.remove(ThreadLocal.java:224)
>        java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426)
>        java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1349)
>        java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
>        org.elasticsearch.common.util.concurrent.ReleasableLock.close(ReleasableLock.java:49)
>        org.elasticsearch.index.engine.InternalEngine.$closeResource(InternalEngine.java:356)
>        org.elasticsearch.index.engine.InternalEngine.delete(InternalEngine.java:1272)
>        org.elasticsearch.index.shard.IndexShard.delete(IndexShard.java:812)
>        org.elasticsearch.index.shard.IndexShard.applyDeleteOperation(IndexShard.java:779)
>        org.elasticsearch.index.shard.IndexShard.applyDeleteOperationOnReplica(IndexShard.java:750)
>        org.elasticsearch.action.bulk.TransportShardBulkAction.performOpOnReplica(TransportShardBulkAction.java:623)
>        org.elasticsearch.action.bulk.TransportShardBulkAction.performOnReplica(TransportShardBulkAction.java:577)
> {code}
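A simplified, hypothetical model of the design described above may make the failure mode easier to see. It is not Lucene's actual CloseableThreadLocal source: the class name WeakRefThreadLocalSketch is invented, and purging, initial values and other details are omitted. Each thread's ThreadLocalMap holds only a WeakReference to the value, the strong reference lives in a shared hardRefs map keyed by Thread, and close() eagerly removes the entry for the calling thread only.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical, simplified model of the ThreadLocal<WeakReference<T>> + hardRefs
// design described in this issue; NOT Lucene's actual CloseableThreadLocal code.
final class WeakRefThreadLocalSketch<T> implements AutoCloseable {
  // Each thread's ThreadLocalMap holds only a WeakReference to its value...
  private ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
  // ...while the strong references live here, with Threads as weak keys.
  private Map<Thread, T> hardRefs = new WeakHashMap<>();

  public T get() {
    WeakReference<T> ref = local.get();
    return ref == null ? null : ref.get();
  }

  public void set(T value) {
    local.set(new WeakReference<>(value));
    synchronized (this) {
      hardRefs.put(Thread.currentThread(), value);
    }
  }

  @Override
  public void close() {
    // Dropping hardRefs leaves every thread's value only weakly reachable...
    synchronized (this) {
      hardRefs = null;
    }
    // ...but only the calling thread's ThreadLocalMap entry is removed here.
    // Other threads' maps keep their entries until GC clears the weakly
    // referenced ThreadLocal key and that thread's map later expunges the slot.
    local.remove();
    local = null;
  }
}
```

Entries created by other threads stay in their ThreadLocalMaps after close(). They become stale only once GC clears the ThreadLocal key, and are then expunged lazily by each thread's own map; that is the cleanup the report says can lag under G1GC.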
> h2. Solution
> ----
> This bug does not reproduce under CMS, but it reproduces consistently under
> G1GC. In fact, *CloseableThreadLocal* does not need to store each entry
> twice, once in hardRefs and once in the ThreadLocal. Remove the ThreadLocal
> from CloseableThreadLocal so that it is no longer affected by this serious
> flaw in Java's built-in ThreadLocal.
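A hedged sketch of what the proposed direction could look like: drop java.lang.ThreadLocal entirely and keep only a Thread-keyed WeakHashMap guarded by a lock, so that close() can release every thread's value in one step. The class name MapBackedThreadLocalSketch, the Supplier-based initial value, and the use of a single intrinsic lock are illustrative assumptions, not the patch attached to this issue.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of a CloseableThreadLocal-like holder that uses no
// java.lang.ThreadLocal at all; names and design are assumptions.
final class MapBackedThreadLocalSketch<T> implements AutoCloseable {
  // Values keyed weakly by Thread: once a thread is collected, WeakHashMap
  // expunges its entry on a later access to the map.
  private final Map<Thread, T> values = new WeakHashMap<>();
  private final Supplier<? extends T> initial;
  private boolean closed;

  MapBackedThreadLocalSketch(Supplier<? extends T> initial) {
    this.initial = initial;
  }

  public synchronized T get() {
    ensureOpen();
    Thread current = Thread.currentThread();
    T value = values.get(current);
    if (value == null) {
      // Lazily create this thread's value; called while holding the lock.
      value = initial.get();
      if (value != null) {
        values.put(current, value);
      }
    }
    return value;
  }

  public synchronized void set(T value) {
    ensureOpen();
    values.put(Thread.currentThread(), value);
  }

  @Override
  public synchronized void close() {
    // A single call drops the entries of *all* threads; nothing is left in
    // any per-thread ThreadLocalMap because none is used.
    values.clear();
    closed = true;
  }

  // Number of live entries (WeakHashMap.size() expunges stale ones first).
  synchronized int size() {
    return values.size();
  }

  private void ensureOpen() {
    if (closed) {
      throw new IllegalStateException("already closed");
    }
  }
}
```

The trade-off is that every get() now takes a shared lock, so contention on hot paths would have to be measured against the cost of oversized ThreadLocalMaps. The sketch also calls the Supplier while holding the lock, which keeps it simple but would serialize expensive initializers.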
> h2. See also
> ----
> [https://github.com/elastic/elasticsearch/issues/56766]
> [https://bugs.openjdk.java.net/browse/JDK-8182982]
> [https://discuss.elastic.co/t/indexing-performance-degrading-over-time/40229/44]
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
