Hi SVN Community.

I’d like to consult a problem with occasional high CPU utilization by 
mod_authz_svn. Please, do CC me and my colleague in the response, because we 
are subscribed to the list.

Our Java application uses Subversion as a primary data storage and the SVN data 
are being accessed via Apache HTTPD mod_dav_svn and also by svnserve server on 
top of the same repository.

We are facing a problem in one of our productive environments hosted in AWS. 
The server is running RHEL 8.8, Apache 2.4.37 with SVN 1.14.1 binaries. 
Occasionally, the Apache HTTPD is acting strange, utilizing 100% CPU for 
periods of minutes to tens of minutes. This situation happens several times a 
week and causes delayed responses from the SVN server up to several minutes. 
Similar behavior is visible on several httpd server processes (mpm_event module 
is used), where each of them has tens of active requests being processed. When 
this CPU spike occurs, there several Apache HTTPD server processes with tens of 
threads busy in the following code related to the Subversion ACL.

The access file is quite big. It contains about 20000 group definitions with 
roughly 10000 users assigned to them and about 35000 authorization rules for 
15000 paths, and it’s size is over 5 MB. The access file is getting changed 100 
times a day, but the CPU spikes do not correlate with the CPU spikes.

Initially we suspected filesystem issues, but we were not able to confirm them. 
When the spike occurs, CPU I/O wait goes to zero and all the CPU is consumed in 
user space.

We are looking for suggestions how to further troubleshoot the problem, or 
possible confirmation that this is a Subversion bug or inefficiency.

Thank you,
Radek Krotil
Siemens Digital Industries Software


Pstack output example
Thread 127 (Thread 0x7f0003fcf700 (LWP 2934729)):

#0  0x00007f0049b79b60 in svn_authz.get_acl_access () from 
target:/lib64/libsvn_repos-1.so.0
#1  0x00007f0049b7a864 in update_user_rights () from 
target:/lib64/libsvn_repos-1.so.0
#2  0x00007f0048e35d06 in hash_do_callback () from 
target:/lib64/libsvn_subr-1.so.0
#3  0x00007f0058bc8ebe in apr_hash_do () from target:/lib64/libapr-1.so.0
#4  0x00007f0048e35d6a in svn_iter_apr_hash () from 
target:/lib64/libsvn_subr-1.so.0
#5  0x00007f0049b7a7b4 in expand_acl_callback () from 
target:/lib64/libsvn_repos-1.so.0
#6  0x00007f0048e35e41 in svn_iter_apr_array () from 
target:/lib64/libsvn_subr-1.so.0
#7  0x00007f0049b7c109 in svn_authz.parse () from 
target:/lib64/libsvn_repos-1.so.0
#8  0x00007f0049b78f74 in svn_repos_authz_read4 () from 
target:/lib64/libsvn_repos-1.so.0
#9  0x00007f004845e4a3 in get_access_conf () from 
target:/etc/httpd/modules/mod_authz_svn.so
#10 0x00007f004845e705 in req_check_access () from 
target:/etc/httpd/modules/mod_authz_svn.so
#11 0x00007f004845efd6 in auth_checker () from 
target:/etc/httpd/modules/mod_authz_svn.so
#12 0x0000561c8b0724c8 in ap_run_auth_checker ()
#13 0x0000561c8b074a9c in ap_process_request_internal ()
#14 0x0000561c8b093a70 in ap_process_async_request ()
#15 0x0000561c8b08fed0 in ap_process_http_connection ()
#16 0x0000561c8b0862b8 in ap_run_process_connection ()
#17 0x00007f004d6b9a47 in process_socket () from 
target:/etc/httpd/modules/mod_mpm_event.so
#18 0x00007f004d6ba3ea in worker_thread () from 
target:/etc/httpd/modules/mod_mpm_event.so
#19 0x00007f005899a1ca in start_thread () from target:/lib64/libpthread.so.0
#20 0x00007f0058402e73 in clone () from target:/lib64/libc.so.6


TOP output looks as follows

top - 13:00:01 up 15 days,  7:09,  0 users,  load average: 129.31, 100.24, 53.38
Tasks: 241 total,   7 running, 233 sleeping,   0 stopped,   1 zombie
%Cpu(s): 95.6 us,  3.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.0 si,  0.0 st
MiB Mem : 126914.0 total,  55179.2 free,  20765.8 used,  50969.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 104540.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 233319 apache    20   0 5818164   2.9g  13844 S 456.2   2.3 183:35.14 httpd
 233318 apache    20   0 5751168   2.7g  13844 S 443.8   2.2 127:53.12 httpd
 232848 apache    20   0 5226028   2.0g  13844 S 218.8   1.6  46:08.83 httpd
 232849 apache    20   0 5291384   2.1g  13804 S 168.8   1.7  68:34.92 httpd
 232847 apache    20   0 5291628   1.7g  13908 S 137.5   1.4  39:14.61 httpd

And Apache server status clearly shows the number of “stuck” requests
Apache Server Status for localhost (via ::1)
Server Version: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1k 
SVN/1.14.1
Server MPM: event
Server Built: Aug 17 2023 10:52:25
________________________________
Current Time: Monday, 18-Dec-2023 13:00:01 UTC
Restart Time: Saturday, 16-Dec-2023 05:23:02 UTC
Parent Server Config. Generation: 2
Parent Server MPM Generation: 1
Server uptime: 2 days 7 hours 36 minutes 58 seconds
Server load: 129.31 100.24 53.38
Total accesses: 212201 - Total Traffic: 1.9 GB - Total Duration: 67348142
CPU Usage: u27647.8 s220.14 cu338.36 cs31.99 - 14.1% CPU load
1.06 requests/sec - 10.0 kB/second - 9.4 kB/request - 317.379 ms/request
112 requests currently being processed, 0 workers gracefully restarting, 138 
idle workers
Slot
PID
Stopping
Connections
Threads
Async connections
total
accepting
busy
graceful
idle
writing
keep-alive
closing
0
232847
no
12
yes
11
0
39
0
0
1
1
232848
no
11
yes
11
0
39
0
0
0
3
232849
no
15
yes
15
0
35
0
0
0
5
233318
no
40
yes
37
0
13
0
0
2
6
233319
no
37
yes
38
0
12
0
0
0
Sum
5
0
115

112
0
138
0
0
3
W___W____W_____________WWW__W_______W__W__W_W_____W_W_____WW_W__
W__W____W____________W______WW______............................
......................_W__W____W_W____WWW__WW_W____W___W_____WW_
_W______..................................................WWWW__
WWWWW_WWWW_WWW_WWWWWW_W_WWW__W_W_WWW__WWWWWWWWWWW__WWWW_WW_W_W_W
WW__WW_WW_W_WWWWW_WWWWWWWWWWWW..................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Srv
PID
Acc
M
CPU
SS
Req
Dur
Conn
Child
Slot
Client
Protocol
VHost
Request
0-1
232847
0/20/27
W
1796.47
183
0
77128
0.0
0.25
0.28
10.X.X.239
http/1.1
xyz:443
PROPFIND /repo/.polarion/user-management/users/…
0-1
232847
0/23/32
W
1761.15
201
0
190003
0.0
0.18
0.23
10.X.X.239
http/1.1
xyz:443
GET /repo/!svn/bc/1731720/External/…
0-1
232847
0/24/35
W
1752.23
182
0
81788
0.0
0.31
0.36
10.X.X.239
http/1.1
xyz:443
GET /repo/!svn/bc/1743998/External/…


Reply via email to