[ https://issues.apache.org/jira/browse/GEODE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385711#comment-17385711 ]
Darrel Schneider commented on GEODE-9365:
-----------------------------------------

Consider finding a way to optimize checkQueueSizeConstraint for the normal case of the queue not being full. I'm a bit concerned with the normal case now always making a system call to figure out the current time.

What if we made "putPermits" an AtomicInteger on which checkQueueSizeConstraint first calls putPermits.decrementAndGet (without any sync)? If it returns a value < 0 then it drops into the permitMon code and can adjust the wait time depending on elapsed time. But if it is >= 0 then it is done and has not done any syncs or time calls.

Keep in mind that all reconcilePutPermits does is make the putPermits atomic bigger. It will never make it smaller, and it just returns the current value of putPermits. Currently checkQueueSizeConstraint decrements putPermits at the end for the item it will be adding, but I don't see anything wrong with it doing it at the start, since it will be atomic and all other changes are done in checkQueueSizeConstraint while synced on permitMon. The only code that ever increases putPermits is reconcilePutPermits, and this is done lazily once putPermits is < 0.

Having all the threads concurrently calling checkQueueSizeConstraint do an atomic dec seems much better than them doing a sync on putGuard.

> HARegionQueue over throttles when multiple threads attempt concurrent adds
> --------------------------------------------------------------------------
>
>                 Key: GEODE-9365
>                 URL: https://issues.apache.org/jira/browse/GEODE-9365
>             Project: Geode
>          Issue Type: Bug
>          Components: client queues
>            Reporter: Darrel Schneider
>            Assignee: Mark Hanson
>            Priority: Major
>              Labels: GeodeOperationAPI
>
> HARegionQueue.checkQueueSizeConstraint has some code that implements a
> "throttle" on adds to a queue that is full. It is supposed to wait
> "eventEnqueueWaitTime" before doing an add.
> But because this code does two syncs (putGuard and permitMon) and only
> waits on one of them, it holds the other sync for the duration of this
> thread's throttle. Any other concurrent thread trying to add to the queue
> gets stuck on the putGuard sync that is held by the first thread that is
> doing the timed wait. So it ends up waiting "eventEnqueueWaitTime" to
> acquire the first sync and then ends up waiting "eventEnqueueWaitTime"
> again when it does its own timed wait. If you have 10 concurrent threads
> trying to add, one of them will end up waiting 10 * "eventEnqueueWaitTime".
> A couple of ideas for how to fix this:
> Get rid of the putGuard and just use permitMon. Then as soon as the first
> thread goes into its timed wait, another thread is allowed to sync on
> permitMon. But if this is done then we need to think carefully about the
> code inside this sync block, since it can now be executed while one or more
> other threads are waiting in permitMon.
> The other solution would be to compute the elapsed time it took to get into
> the first sync and subtract that from the time we wait on permitMon. This
> seems like a simple solution but does introduce at least one call of get
> time (the second call is only needed if the queue is full).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
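
For illustration only, the AtomicInteger fast path proposed in the comment could be sketched roughly as below. This is not the Geode implementation. The names putPermits, permitMon, checkQueueSizeConstraint and reconcilePutPermits come from the comment; the class PermitThrottleSketch, its constructor, the freedPermits counter, releasePermit(), availablePermits() and the boolean return value are hypothetical and exist only to keep the example self-contained. The sketch also assumes that a throttled add proceeds once its timed wait ends, and it restores the interrupt flag instead of propagating InterruptedException.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the throttle in HARegionQueue; not the real class.
final class PermitThrottleSketch {
  private final AtomicInteger putPermits;   // may go negative while the queue is over capacity
  private final Object permitMon = new Object();
  private int freedPermits;                 // hypothetical counter, guarded by permitMon
  private final long enqueueWaitMillis;     // plays the role of "eventEnqueueWaitTime"

  PermitThrottleSketch(int capacity, long enqueueWaitMillis) {
    this.putPermits = new AtomicInteger(capacity);
    this.enqueueWaitMillis = enqueueWaitMillis;
  }

  /** Returns true if the add took the fast path (hypothetical return value, for testing). */
  boolean checkQueueSizeConstraint() {
    // Fast path: take the permit up front; no sync and no clock read.
    if (putPermits.decrementAndGet() >= 0) {
      return true;
    }
    // Slow path: only now pay for a time call and the permitMon sync.
    long start = System.nanoTime();
    synchronized (permitMon) {
      if (reconcilePutPermits() < 0) {
        // Subtract the time already spent acquiring permitMon from the throttle.
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        long remaining = enqueueWaitMillis - elapsedMillis;
        if (remaining > 0) { // wait(0) would block indefinitely, so skip it
          try {
            permitMon.wait(remaining);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // sketch: keep the flag, do not rethrow
          }
        }
      }
    }
    return false;
  }

  // Caller must hold permitMon. Only ever makes putPermits bigger.
  private int reconcilePutPermits() {
    int freed = freedPermits;
    freedPermits = 0;
    return putPermits.addAndGet(freed);
  }

  /** Hypothetical take-side hook: records a freed slot and wakes one throttled adder. */
  void releasePermit() {
    synchronized (permitMon) {
      freedPermits++;
      permitMon.notify();
    }
  }

  int availablePermits() {
    return putPermits.get();
  }
}
```

With a capacity of 2, the first two calls return without touching permitMon or the clock; the third drops into the slow path and waits at most enqueueWaitMillis. Freed slots are folded back into putPermits lazily, the next time an adder finds the counter below zero.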