[ 
https://issues.apache.org/jira/browse/GEODE-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205056#comment-17205056
 ] 

ASF GitHub Bot commented on GEODE-8553:
---------------------------------------

mreddington commented on a change in pull request #660:
URL: https://github.com/apache/geode-native/pull/660#discussion_r497826366



##########
File path: cppcache/src/ThinClientLocatorHelper.cpp
##########
@@ -174,30 +176,25 @@ GfErrType 
ThinClientLocatorHelper::getEndpointForNewCallBackConn(
     ClientProxyMembershipID& memId, std::list<ServerLocation>& outEndpoint,
     std::string&, int redundancy, const std::set<ServerLocation>& exclEndPts,
     const std::string& serverGrp) {
-  std::lock_guard<decltype(m_locatorLock)> guard(m_locatorLock);
   auto& sysProps = m_poolDM->getConnectionManager()
                        .getCacheImpl()
                        ->getDistributedSystem()
                        .getSystemProperties();
-  int locatorsRetry = 3;
-  if (m_poolDM) {
-    int poolRetry = m_poolDM->getRetryAttempts();
-    locatorsRetry = poolRetry <= 0 ? locatorsRetry : poolRetry;
-  }
+
+  auto poolRetry = m_poolDM->getRetryAttempts();
+  auto locatorsRetry = poolRetry <= 0 ? 3 : poolRetry;
+
   LOGFINER(
       "ThinClientLocatorHelper::getEndpointForNewCallBackConn locatorsRetry = "
       "%d ",
       locatorsRetry);
-  for (unsigned attempts = 0;
-       attempts <
-       (m_locHostPort.size() == 1 ? locatorsRetry : m_locHostPort.size());
-       attempts++) {
-    ServerLocation loc;
-    if (m_locHostPort.size() == 1) {
-      loc = m_locHostPort[0];
-    } else {
-      loc = m_locHostPort[attempts];
-    }
+
+  auto locators = getLocators();
+  auto locatorsSize = locators.size();
+  auto maxAttempts = locatorsSize == 1 ? locatorsRetry : locatorsSize;
+
+  for (auto attempts = 0ULL; attempts < maxAttempts; ++attempts) {
+    const auto& loc = locatorsSize == 1 ? locators[0] : locators[attempts];

Review comment:
       This logic is flawed. What if maxAttempts is 3 and locatorsSize is 2? 
What this code says is if there is 1 locator, all attempts are on that locator, 
otherwise, make one attempt per each locator. I suggest - unless you have a 
better idea, the logic should be per each attempt, try all locators until one 
connection is successful. By moving the `random_shuffle` into `getLocators`, 
the order will be randomized every time, which the NC doesn't make any 
guarantee about locator order, so this seems ideal.
   
   I recommend a helper function like `try_n(functor, count)`. The functor 
should return some GfErrType value. The utility function will loop until 
success is indicated by the functor or it runs out of attempts, and either 
throws an exception upon failure or returns that GfErrType - however you want 
to handle the fail case...
   
   Write a function `connect_to_a_locator` contains the loop logic over the 
locators and the connect attempts and returns GfErrType, which you can bind, or 
it can return a lambda that does the same thing. Suggestion #1:
   
       GfErrType ThinClientLocatorHelper::connect_to_a_locator() {
         /* business logic */
       }
   
   Suggestion #2:
   
       auto ThinClientLocatorHelper::connect_to_a_locator() {
         return [&]() -> GfErrType {
           /* business logic */
         };
       }
   
   The code would then look like:
   
       try_n(::std::bind(ThinClientLocatorHelper::connect_to_a_locator, this), 
maxAttempts);
   
   Or:
   
       try_n(connect_to_a_locator(), maxAttempts);
   
   Maybe checking a return type, maybe in a try/catch block.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce ThinClientLocatorHelper lock time
> ----------------------------------------
>
>                 Key: GEODE-8553
>                 URL: https://issues.apache.org/jira/browse/GEODE-8553
>             Project: Geode
>          Issue Type: Improvement
>          Components: native client
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: Mario Salazar de Torres
>            Assignee: Mario Salazar de Torres
>            Priority: Major
>              Labels: pull-request-available
>
> During my troublshootings, I've seen that locking m_locatorLock for the whole 
> scope of the class function members might cause some inter-locks.
> Problem here and in many other places of the NC is that networking operations 
> are performed while a mutex is locked. Therefore if *thread A* takes longer 
> than expected in performing its network operation, it might block another one 
> which does not requires any resource of the first *thread A*. Hence, the 
> inter-lock.
> This improvement is the first one of a series regarding to lock scope 
> reduction when it comes with code regarding networking in NC.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to