[ https://issues.apache.org/jira/browse/GEODE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046223#comment-17046223 ]
ASF subversion and git services commented on GEODE-6536: -------------------------------------------------------- Commit 9da2cd49e2e04564b446eaad579b51e986bc2179 in geode's branch refs/heads/develop from Mario Ivanac [ https://gitbox.apache.org/repos/asf?p=geode.git;h=9da2cd4 ] GEODE-6536: Added retry in borrowConnection/single hop (#4719) * GEODE-6536: Added retry in borrowConnection/single hop * GEODE-6536: bug fix * GEODE-6536: update after comments > client pr single hop will do multiple hops if things get busy > ------------------------------------------------------------- > > Key: GEODE-6536 > URL: https://issues.apache.org/jira/browse/GEODE-6536 > Project: Geode > Issue Type: Improvement > Components: client/server > Reporter: Darrel Schneider > Assignee: Mario Ivanac > Priority: Major > Labels: needs-review, performance, pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > I was wondering why ConnectionManagerImpl.borrowConnection that targets a > particular server will not wait for a connection to be available. This is the > borrow used by pr single hop. It seems like if all the cnxs to that server > are busy and we are at max that it should wait. But instead it does a single > scan of the pool and if it does not find an idle connection it either throws > AllConnectionsInUse or it creates one. The caller sets the onlyUseExistingCnx > parameter based on if the pool is at max cnxs or not (seems like this should > have been done in borrowConnection instead of the caller). If it throws > AllConnectionsInUse the caller catches it and ignores it and then falls into > doing a non-targeted borrow that will wait for a connection but to ANY > server. So our single hop optimization goes away on a busy client. Since it > can’t immediately get a connection to server A, it says just give me a > connection to any server sends the op to that other server and that server > ends up having to forward it to server A. You would only see this problem if > you set a max on your client connection pool. The default of -1 just says to > always create another connection. The only workaround I know of is to not > limit how many connections your client can create but that could cause other > problem (i.e. too many file descriptors). > The questionable code is in the following classes: > org.apache.geode.cache.client.internal.GetOp > org.apache.geode.cache.client.internal.PutOp > org.apache.geode.cache.client.internal.DestroyOp > The all have something like this: > > {code:java} > boolean onlyUseExistingCnx = ((poolImpl.getMaxConnections() != -1 > && poolImpl.getConnectionCount() >= poolImpl.getMaxConnections()) ? true : > false); > op.setAllowDuplicateMetadataRefresh(!onlyUseExistingCnx); > return pool.executeOn(new ServerLocation(server.getHostName(), > server.getPort()), op, > true, onlyUseExistingCnx); > {code} > executeOn eventually calls ConnectionManagerImpl.borrowConnection > It is unclear to me why they need allow duplicate metadata refresh if they > permit borrowConnection to create a new connection to the specified server. > They all catch and ignore > AllConnectionsInUseException falling through and doing non-targetted > pool.execute. > The concern might have been that the pool would be full of connections to > other servers preventing us from connecting to the server we need to > communicate with. If we have existing connections to the server we could wait > for one of those. If we don't and we are at max then we should consider > closing a connection to some other server and create a connection to the > server we know we need for this operation. As the server count goes up and > the max connection count on the pool goes down I'm sure we are probably > better off not even trying to do single hop since we might not even be able > to have a connection to each server in the pool. But if we have a reasonable > max connection count then it seems like we could make more of an effort to > make sure a per server connection count is balanced and then wait for one of > those connection to be available. -- This message was sent by Atlassian Jira (v8.3.4#803005)