Svnserve scalability problems in multi-threaded mode (-T)

2020-04-18 Thread Krotil, Radek
Hi Everyone.

Let me revisit an old topic that was discussed with the Subversion team in 
2016. Now that our tool finally supports SVN 1.10, we have had a breakthrough 
on this issue after all these years, and I feel it is something the 
Subversion team should be aware of.

There are two topics I’d like to bring up here, as they are inter-connected:

1) Svnserve causing mutex lock contention in threaded mode

  *   Discussed in 
http://subversion.1072662.n5.nabble.com/Better-choice-for-Linux-semaphore-than-spinlock-td204915.html#a204989
  *   I believe this report originates from our enterprise customer, who has 
seen this behavior under high concurrent usage. When svnserve on Linux (RHEL 
7.6) is configured to run in threaded mode, we start seeing the following 
pattern: all the CPU time is consumed by the concurrent usage. In the worst 
case we have ever seen, almost all the CPU is consumed by system time, 
presumably related to the spinlock contention discussed in the thread above.
  *   According to our developer's analysis, svnserve behaves very differently 
depending on whether there are enough threads in the svnserve pool. 
Quoting our dev: “Svnserve waits on socket read if there is enough threads in 
pool. But it behaves a bit differently if more than half of threads from pool 
are occupied by work. In that case, it immediately returns thread after each 
command/operation back to the pool which is again trying to get out of pool as 
there is a lot of work to do – and this is point of locking. It does also 
processing using round-robin, which will intentionally prolong connection 
operations I think, trying to reduce load back to normal state. Otherwise, if 
there are enough threads in pool (lets say by default under 128 threads), these 
active threads are not returned back to pool after each command and just 
processing next commands in command queue for given connection and it is fully 
concurrent without lock.”
  *   Now that we have finally added support for SVN 1.10, which added new 
configuration options for tuning the number of threads, we were able to run 
more experiments based on Stefan Fuhrmann’s recommendation. When we apply the 
recommended tuning options --min-threads 64 --max-threads 1024, the situation 
improves significantly. See the figures below.

Svnserve in threaded mode – no threads tuning
[cid:image002.jpg@01D61561.8913B730]

Svnserve in threaded mode – no threads tuning, worst case
[cid:image008.jpg@01D61561.8913B730]

Svnserve in threaded mode – tuned --min-threads 64 --max-threads 1024
[cid:image009.jpg@01D61561.8913B730]


  *   The conclusion here is that svnserve from version 1.10 can be configured 
to support the necessary concurrency, but there is a lack of guidance, and 
possibly of logging, that would lead admins to the proper configuration. I am 
bringing it up here for your consideration, in case you want to act on this 
feedback.
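
For reference, the tuning from topic 1 can be written out as a full command 
line. This is only a minimal sketch: the repository root /srv/svn and the 
helper name svnserve_command are hypothetical, daemon mode (-d) is an 
assumption about the deployment, and only the two thread values come from the 
report above.

```python
import shlex


def svnserve_command(root, min_threads=64, max_threads=1024):
    """Build an svnserve argv for threaded mode with explicit pool bounds.

    `root` is a hypothetical repository root; the thread defaults are the
    values quoted in the report, not general recommendations.
    """
    if max_threads < 1 or min_threads > max_threads:
        raise ValueError("need min_threads <= max_threads and max_threads >= 1")
    return [
        "svnserve",
        "-d",                               # daemon mode (assumed deployment)
        "-T",                               # threaded mode, the subject of this thread
        "--min-threads", str(min_threads),  # threads kept alive even when idle
        "--max-threads", str(max_threads),  # upper bound on server threads
        "-r", root,                         # root of the served repositories
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(svnserve_command("/srv/svn")))
```

Building the argument list in one place makes it easy to keep the two thread 
bounds consistent when experimenting with different values.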

2) Deadlock-like behaviour of svnserve in multi-threaded mode (-T)

  *   The second problem is also related to threaded mode. We have now attacked 
it for the third time, as it was a significant robustness problem that stalled 
our application with hundreds of concurrent users and was therefore escalated 
by our enterprise customers.
  *   Discussed in 
http://subversion.1072662.n5.nabble.com/Deadlock-like-behaviour-of-svnserve-in-multi-threaded-mode-T-tt196421.html#a196500
 and also tracked as https://issues.apache.org/jira/browse/SVN-4626

  *   Our scenario that leads to this problem is the following: our tool, 
Polarion ALM, at times performs a re-indexing operation in which it pulls a lot 
of data from SVN over parallel connections. It is also used by hundreds of 
concurrent users, and at times that concurrent usage and the resulting 
connection creation lead to the same problem. The newly created connections to 
svnserve stall completely and, due to internal locking in Polarion, all 
communication with our backend stops until a timeout on the connection occurs 
minutes later.
  *   This stalling occurs when multiple SVN connections are opened at the same 
time, and only when svnserve is running in threaded mode. Threaded mode is the 
default on Windows and can be enabled by configuration on Linux.
  *   We involved the SVNKit team in the latest analysis, and Alex Kitaev 
provided very good help. Let me again quote Alex: “I started to reduce number 
of parallel threads and when issue was reproducible with even two threads I've 
realized that the problem might be related to socket.connect call rate. 
Somehow, frequent connection establishment led to failures - connection state 
was displayed as Established, but no data was read from it. So, the workaround 
I've found so far, is to make sure SVNRepository instances are created 
subsequently along with "testConnection" call on the instance, with minimal 
delay between "testConnection
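
The workaround Alex describes (create connections one at a time and probe each 
one before the next is opened) can be sketched independently of SVNKit. This is 
only an illustration under stated assumptions: SerializedConnector, 
open_connection, probe and min_gap are hypothetical names, and because the 
quoted sentence is cut off, the pause between connects is an assumption rather 
than the actual fix.

```python
import threading
import time


class SerializedConnector:
    """Open connections one at a time, probing each before the next starts."""

    def __init__(self, open_connection, probe, min_gap=0.05):
        self._open = open_connection  # hypothetical factory returning a connection
        self._probe = probe           # hypothetical check, e.g. a testConnection wrapper
        self._min_gap = min_gap       # assumed pause between connects, in seconds
        self._lock = threading.Lock()
        self._last = float("-inf")    # time the previous connect attempt finished

    def connect(self):
        # The lock guarantees that only one thread is connecting at any moment.
        with self._lock:
            wait = self._last + self._min_gap - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            try:
                conn = self._open()
                self._probe(conn)  # fail early if the new connection is dead
                return conn
            finally:
                self._last = time.monotonic()
```

Worker threads would call connect() instead of opening connections directly; 
once a connection has been established and probed, it can be used in parallel 
with the others as before.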

Re: Svnserve scalability problems in multi-threaded mode (-T)

2020-04-18 Thread Kenneth Porter
--On Saturday, April 18, 2020 8:12 AM + "Krotil, Radek" wrote:



In conclusion – currently we are able to overcome both of these
long-standing problems after adding support for SVN 1.10. We just wanted to
share our findings, so that the SVN team is aware of them. We understand
our usage of SVN is a bit special, but I feel our findings may help make
SVN a bit better and prevent problems for other users.


I don't have anything to contribute to the issue but I wanted to say that I 
really enjoyed this tale of sleuthing. A really good debugging story!