On Sat, 2011-07-09 at 15:02 +0200, Stefan Fuhrmann wrote:
> On 08.07.2011 01:56, Daniel Shahaf wrote:
> > FYI from users@
> >
> > ----- Forwarded message from Tony Butt<tony.b...@cea.com.au> -----
> > Date: Wed, 6 Jul 2011 15:20:27 +1000
> >> We are running subversion 1.6.17 on a vmware hosted server. We recently
> >> reconfigured the server to give 4 virtual CPUs (up from 1), and a
> >> significant amount of memory.
> >>
> >> In order to spruce up our performance a little, I looked into the use of
> >> memcached with subversion again, found the correct config parameter, and
> >> set it up. Our server is running Ubuntu 10.04, Apache 2.2. Access
> >> mechanism is http (of course). The client used is running Ubuntu 11.04,
> >> and svn commandline (1.6.17 also)
> >>
> >> The results were interesting, to say the least.
> >>
> And the sad thing is that these results are in line with
> what can be expected in 1.6. That's why the whole
> caching code has been reworked in 1.7.
> >> Checkout of a tree, about 250M in size:
> >>
> >> Without memcached, 1 1/2 to 2 minutes, varies with server load
> >> With memcached, 12 minutes (!)
> >>
> >> Update of the same tree,
> >> Without memcached, 9 seconds
> >> With memcached, 14 seconds - repeated several times, similar results.
> You can expect all similarly structured repositories
> to show similar performance patterns. Only for very
> different content and usage patterns might there be
> a performance improvement (see below).
> >> I am not sure what anyone else's experience is, but we will not be
> >> enabling memcached for subversion any time soon.
> I will try to answer that with some indication of
> what you may try and when that might aid performance
> in 1.6 and 1.7. But first, let me give you some technical
> background because there is obviously no simple
> recipe for making things fast in 1.6.
>
> The key factors here are latency and trade-offs.
> To read a userfile@rev from the repository, the back-end has to
> follow a short chain of objects (roughly: rev -> offset in repo
> file, userfile -> last change, last change -> offset in repo file,
> chain of deltas to combine). All that data will ultimately
> come from disk.
>
> Most servers boast large amounts of file system cache
> that can be accessed in < 0.1ms. SVN itself will cache
> the index information used at the beginning of the lookup
> chain in its application memory. By default, only the
> user file deltas need to be read (from file system cache),
> decompressed and combined into the original content.
>
> For typical files < 100k, only 5 or fewer of these delta
> blocks need to be read, while the index / admin info
> at the beginning of the lookup chain also involves about
> 5 steps, which can often be satisfied directly from internal
> caches, i.e. are "for free". So, the default in 1.6 is < 1ms
> for reading the data plus some CPU load from unzipping it.
>
> With memcached, the picture changes, and parts of that
> can be considered a design flaw in 1.6. All index objects
> will be stored in memcached, i.e. accessing them is
> no longer for free. OTOH, reconstructed user-file content
> will now be cached, i.e. no need to reconstruct it from
> deltas over and over again. So, we traded 3 or 4 file
> cache reads plus unzip CPU load for ~5 memcached reads.
>
> But the latter involves TCP/IP communication between
> processes with latencies 2+ times that of the file system
> cache. To make things worse, memcached seems to
> shut off for a few seconds when being hammered with
> a large number of requests in a short period of time
> (I observed that behavior under Ubuntu 9.04). To mitigate
> that, i.e. have *some* process to answer your requests,
> start 3 or 4 of them. They all will end up with redundant
> information.
>

Without understanding the details, that is similar to what I thought must be happening.
Since it is fairly simple to switch in/out memcached, I will try again when we go to 1.7.
Thanks for the thoughtful response - Tony Butt

> So, when can memcached be useful in 1.6?
>
> * when the file system cache is ineffective (repositories
>   on NFS-like shares)
> * disk (NAS) latency is higher than TCP/IP latency
> * large external memcached servers are available
>   compared to usable file system cache on the SVN
>   server machine
> * huge repositories where the combined amount of
>   frequently requested information is larger than what
>   the file system cache can buffer.
>
> For 1.7, things are quite different. Memcached will only
> be used for user file content - not the many admin objects
> needed to access it. Hence, the trade-off should always
> be 1 TCP/IP lookup vs. multiple file cache accesses.
>
> Moreover, the svn server itself can cache those full texts
> - effectively eliminating all latencies. Combined with
> many improvements to the caching logic, all c/o
> operations should be strictly limited by client I/O.
>
> Hope that lengthy explanation helps!
>
> -- Stefan^2.

--
Tony Butt <t...@cea.com.au>
CEA Technologies
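
For readers who want to see the arithmetic behind Stefan's explanation, here is a rough back-of-the-envelope sketch. It only models per-file lookup latency and ignores CPU time for unzipping and combining deltas. The constants come from the figures quoted in the thread (file system cache reads under 0.1 ms, TCP/IP round trips "2+ times" slower, about 5 lookups per file); the function names and the choice of exactly 2x for the TCP/IP factor are assumptions made for illustration, not measurements or Subversion internals.

```python
# Rough latency model of the trade-off described in the thread.
# All numbers are illustrative upper bounds taken from the mail,
# not measurements of a real server.

FS_CACHE_MS = 0.1                        # file system cache read, "< 0.1ms"
TCP_FACTOR = 2.0                         # "2+ times"; assumed to be exactly 2 here
MEMCACHED_MS = FS_CACHE_MS * TCP_FACTOR  # one memcached round trip


def default_16_ms(delta_reads=5):
    """1.6 without memcached: index lookups are served from in-process
    caches ("for free"), so only the delta blocks cost a cache read."""
    return delta_reads * FS_CACHE_MS


def memcached_16_ms(memcached_reads=5):
    """1.6 with memcached: the index objects live in memcached, so
    roughly 5 lookups each pay a TCP/IP round trip."""
    return memcached_reads * MEMCACHED_MS


def memcached_17_ms():
    """1.7 with memcached: only the full text is fetched, 1 lookup."""
    return 1 * MEMCACHED_MS


if __name__ == "__main__":
    print(f"1.6 default:   {default_16_ms():.2f} ms per file")
    print(f"1.6 memcached: {memcached_16_ms():.2f} ms per file")
    print(f"1.7 memcached: {memcached_17_ms():.2f} ms per file")
```

Under these assumptions the 1.6 memcached path is slower per file than the default, while the 1.7 design is faster than both, which matches the direction of the explanation above. It does not account for the temporary memcached stalls Stefan mentions, which would widen the gap further.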