Hi SVN Community. I’d like to consult a problem with occasional high CPU utilization by mod_authz_svn. Please, do CC me and my colleague in the response, because we are subscribed to the list.
Our Java application uses Subversion as a primary data storage and the SVN data are being accessed via Apache HTTPD mod_dav_svn and also by svnserve server on top of the same repository. We are facing a problem in one of our productive environments hosted in AWS. The server is running RHEL 8.8, Apache 2.4.37 with SVN 1.14.1 binaries. Occasionally, the Apache HTTPD is acting strange, utilizing 100% CPU for periods of minutes to tens of minutes. This situation happens several times a week and causes delayed responses from the SVN server up to several minutes. Similar behavior is visible on several httpd server processes (mpm_event module is used), where each of them has tens of active requests being processed. When this CPU spike occurs, there several Apache HTTPD server processes with tens of threads busy in the following code related to the Subversion ACL. The access file is quite big. It contains about 20000 group definitions with roughly 10000 users assigned to them and about 35000 authorization rules for 15000 paths, and it’s size is over 5 MB. The access file is getting changed 100 times a day, but the CPU spikes do not correlate with the CPU spikes. Initially we suspected filesystem issues, but we were not able to confirm them. When the spike occurs, CPU I/O wait goes to zero and all the CPU is consumed in user space. We are looking for suggestions how to further troubleshoot the problem, or possible confirmation that this is a Subversion bug or inefficiency. Thank you, Radek Krotil Siemens Digital Industries Software Pstack output example Thread 127 (Thread 0x7f0003fcf700 (LWP 2934729)): #0 0x00007f0049b79b60 in svn_authz.get_acl_access () from target:/lib64/libsvn_repos-1.so.0 #1 0x00007f0049b7a864 in update_user_rights () from target:/lib64/libsvn_repos-1.so.0 #2 0x00007f0048e35d06 in hash_do_callback () from target:/lib64/libsvn_subr-1.so.0 #3 0x00007f0058bc8ebe in apr_hash_do () from target:/lib64/libapr-1.so.0 #4 0x00007f0048e35d6a in svn_iter_apr_hash () from target:/lib64/libsvn_subr-1.so.0 #5 0x00007f0049b7a7b4 in expand_acl_callback () from target:/lib64/libsvn_repos-1.so.0 #6 0x00007f0048e35e41 in svn_iter_apr_array () from target:/lib64/libsvn_subr-1.so.0 #7 0x00007f0049b7c109 in svn_authz.parse () from target:/lib64/libsvn_repos-1.so.0 #8 0x00007f0049b78f74 in svn_repos_authz_read4 () from target:/lib64/libsvn_repos-1.so.0 #9 0x00007f004845e4a3 in get_access_conf () from target:/etc/httpd/modules/mod_authz_svn.so #10 0x00007f004845e705 in req_check_access () from target:/etc/httpd/modules/mod_authz_svn.so #11 0x00007f004845efd6 in auth_checker () from target:/etc/httpd/modules/mod_authz_svn.so #12 0x0000561c8b0724c8 in ap_run_auth_checker () #13 0x0000561c8b074a9c in ap_process_request_internal () #14 0x0000561c8b093a70 in ap_process_async_request () #15 0x0000561c8b08fed0 in ap_process_http_connection () #16 0x0000561c8b0862b8 in ap_run_process_connection () #17 0x00007f004d6b9a47 in process_socket () from target:/etc/httpd/modules/mod_mpm_event.so #18 0x00007f004d6ba3ea in worker_thread () from target:/etc/httpd/modules/mod_mpm_event.so #19 0x00007f005899a1ca in start_thread () from target:/lib64/libpthread.so.0 #20 0x00007f0058402e73 in clone () from target:/lib64/libc.so.6 TOP output looks as follows top - 13:00:01 up 15 days, 7:09, 0 users, load average: 129.31, 100.24, 53.38 Tasks: 241 total, 7 running, 233 sleeping, 0 stopped, 1 zombie %Cpu(s): 95.6 us, 3.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.7 hi, 0.0 si, 0.0 st MiB Mem : 126914.0 total, 55179.2 free, 20765.8 used, 50969.0 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 104540.6 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 233319 apache 20 0 5818164 2.9g 13844 S 456.2 2.3 183:35.14 httpd 233318 apache 20 0 5751168 2.7g 13844 S 443.8 2.2 127:53.12 httpd 232848 apache 20 0 5226028 2.0g 13844 S 218.8 1.6 46:08.83 httpd 232849 apache 20 0 5291384 2.1g 13804 S 168.8 1.7 68:34.92 httpd 232847 apache 20 0 5291628 1.7g 13908 S 137.5 1.4 39:14.61 httpd And Apache server status clearly shows the number of “stuck” requests Apache Server Status for localhost (via ::1) Server Version: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1k SVN/1.14.1 Server MPM: event Server Built: Aug 17 2023 10:52:25 ________________________________ Current Time: Monday, 18-Dec-2023 13:00:01 UTC Restart Time: Saturday, 16-Dec-2023 05:23:02 UTC Parent Server Config. Generation: 2 Parent Server MPM Generation: 1 Server uptime: 2 days 7 hours 36 minutes 58 seconds Server load: 129.31 100.24 53.38 Total accesses: 212201 - Total Traffic: 1.9 GB - Total Duration: 67348142 CPU Usage: u27647.8 s220.14 cu338.36 cs31.99 - 14.1% CPU load 1.06 requests/sec - 10.0 kB/second - 9.4 kB/request - 317.379 ms/request 112 requests currently being processed, 0 workers gracefully restarting, 138 idle workers Slot PID Stopping Connections Threads Async connections total accepting busy graceful idle writing keep-alive closing 0 232847 no 12 yes 11 0 39 0 0 1 1 232848 no 11 yes 11 0 39 0 0 0 3 232849 no 15 yes 15 0 35 0 0 0 5 233318 no 40 yes 37 0 13 0 0 2 6 233319 no 37 yes 38 0 12 0 0 0 Sum 5 0 115 112 0 138 0 0 3 W___W____W_____________WWW__W_______W__W__W_W_____W_W_____WW_W__ W__W____W____________W______WW______............................ ......................_W__W____W_W____WWW__WW_W____W___W_____WW_ _W______..................................................WWWW__ WWWWW_WWWW_WWW_WWWWWW_W_WWW__W_W_WWW__WWWWWWWWWWW__WWWW_WW_W_W_W WW__WW_WW_W_WWWWW_WWWWWWWWWWWW.................................. ................................................................ ................................................................ ................................................................ ................................................................ ................................................................ ................................................................ ................................ Scoreboard Key: "_" Waiting for Connection, "S" Starting up, "R" Reading Request, "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup, "C" Closing connection, "L" Logging, "G" Gracefully finishing, "I" Idle cleanup of worker, "." Open slot with no current process Srv PID Acc M CPU SS Req Dur Conn Child Slot Client Protocol VHost Request 0-1 232847 0/20/27 W 1796.47 183 0 77128 0.0 0.25 0.28 10.X.X.239 http/1.1 xyz:443 PROPFIND /repo/.polarion/user-management/users/… 0-1 232847 0/23/32 W 1761.15 201 0 190003 0.0 0.18 0.23 10.X.X.239 http/1.1 xyz:443 GET /repo/!svn/bc/1731720/External/… 0-1 232847 0/24/35 W 1752.23 182 0 81788 0.0 0.31 0.36 10.X.X.239 http/1.1 xyz:443 GET /repo/!svn/bc/1743998/External/…