[ 
https://issues.apache.org/jira/browse/GEODE-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233977#comment-17233977
 ] 

ASF subversion and git services commented on GEODE-8553:
--------------------------------------------------------

Commit 72455cd0811bf793a8680b839bbc3ed957ad1830 in geode-native's branch 
refs/heads/develop from Mario Salazar de Torres
[ https://gitbox.apache.org/repos/asf?p=geode-native.git;h=72455cd ]

GEODE-8553: Fix inter-locks in ThinClientLocator (#660)

 - ThinClientLocatorHelper uses a mutex called m_locatorLock, which is
   used in the entire scope of all the public member functions for this
   class.
 - Problem with that is all of those functions perform some network
   communication and in the event of networking issues, this will result
   in having all the application threads inter-locked.
 - Also, the only purpose of this mutex is to avoid that the locators
   list is read while it's being updated.
 - Given all of the above, the class has been refactored, so every time
   the locators list has to be accessed, a copy of it is created, being
   that copy created only while owning the mutex.
 - And for the case of the update, a new locators list is composed and
   its only at the end of the update function where the mutex is locked
   and the locators list swapped by the new one.
 - This whole change ensures that the time the mutex is in use is
   minimal while the functionality remains intact.
 - Refactored ConnectionWrapper so is move constructible.
 - Locators list is now initialized using iterator range constructor.
 - Added getConnRetries utility function.
 - Changed mutex for a shared mutex.
 - getLocators function now returns a shuffled list each time.
 - Cleaned-up repeated code. As a result of this createConnection has been
   refactored and a new function called sendRequest has been created.
 - Implemented proposal for locator connections retries, treating the locators
   list as a circular buffer.
 - Added new integration test to verify that threads don't inter-lock anymore.
 - The test works as follow:
   * Set 5s connect-timeout.
   * Deploy 1 locator.
   * Add 1 non-existent locator to the pool.
   * Send 64 concurrent requests to get the server list.
   * By probability if there are no interlocks ~50%+-E% of the requests should
     have completed in a brief period of time.
   * Note that E% is the error allowance percentage, which is set to 10%.
   * Therefore a 10 seconds timeout is set in order to reach the request
     completed treshold.
   * Test passes if the number of completed requests before the timeout was
     greater or equal to the thresholds, otherwise it fails.

> Reduce ThinClientLocatorHelper lock time
> ----------------------------------------
>
>                 Key: GEODE-8553
>                 URL: https://issues.apache.org/jira/browse/GEODE-8553
>             Project: Geode
>          Issue Type: Improvement
>          Components: native client
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: Mario Salazar de Torres
>            Assignee: Mario Salazar de Torres
>            Priority: Major
>              Labels: pull-request-available
>
> During my troublshootings, I've seen that locking m_locatorLock for the whole 
> scope of the class function members might cause some inter-locks.
> Problem here and in many other places of the NC is that networking operations 
> are performed while a mutex is locked. Therefore if *thread A* takes longer 
> than expected in performing its network operation, it might block another one 
> which does not requires any resource of the first *thread A*. Hence, the 
> inter-lock.
> This improvement is the first one of a series regarding to lock scope 
> reduction when it comes with code regarding networking in NC.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to