[ 
https://issues.apache.org/jira/browse/GUACAMOLE-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018319#comment-18018319
 ] 

Niko commented on GUACAMOLE-2118:
---------------------------------

[~vnick] 

Hi Nick,

we were able to capture two guacd processes in a hung state, one CPU-bound and 
one idle, so you can see both scenarios side by side.

*PID 178388*
This process went into a hung state after the client disconnected.
Timeline excerpt:
 * 10:55:22: RDP session started

 * 11:01:48: User is not responding

 * 11:26:47: RDP server closed/refused connection: Manually logged off

After that, the process stayed alive with one thread spinning at ~100% CPU.
{{lsof}} shows only local 127.0.0.1:4822 … CLOSE_WAIT sockets, no remote 
ESTABLISHED connections.
{{top -H}} confirms the single worker thread consuming CPU.
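To confirm from a script which thread {{top -H}} is flagging, here is a minimal sketch that ranks the threads of a PID by accumulated CPU time. It assumes a Linux host with {{/proc}} mounted. The helper names {{parse_task_stat}} and {{hottest_threads}} are hypothetical. The field positions (utime = field 14, stime = field 15 of {{/proc/<pid>/task/<tid>/stat}}) follow the proc(5) man page.

```python
import os
import sys


def parse_task_stat(text):
    """Return (tid, comm, utime + stime in clock ticks) from one stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the *last* closing parenthesis.
    head, _, tail = text.rpartition(")")
    tid_str, _, comm = head.partition(" (")
    fields = tail.split()
    # fields[0] is field 3 (state), so utime (14) and stime (15) sit at 11/12.
    utime, stime = int(fields[11]), int(fields[12])
    return int(tid_str), comm, utime + stime


def hottest_threads(pid, limit=5):
    """List the busiest threads of a process, most CPU time first."""
    task_dir = f"/proc/{pid}/task"
    rows = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                rows.append(parse_task_stat(fh.read()))
        except FileNotFoundError:
            # The thread exited between listdir() and open().
            continue
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


if __name__ == "__main__" and len(sys.argv) > 1:
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    for tid, comm, cpu in hottest_threads(int(sys.argv[1])):
        print(f"LWP {tid:>8}  {comm:<16} {cpu / ticks:10.2f}s CPU")
```

Run it with the guacd PID as the only argument (for example {{python3 hottest_threads.py 178388}}; the script name is just an example). The first {{LWP}} value listed is the thread ID to look for in the GDB backtrace.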

Syslog snippet:
{code:java}
2025-09-04T10:55:22.821428+02:00 rgw guacd[178388]: Security mode: Negotiate (ANY)
2025-09-04T10:55:22.821531+02:00 rgw guacd[178388]: Resize method: none
2025-09-04T10:55:22.821569+02:00 rgw guacd[178388]: No clipboard line-ending normalization specified. Defaulting to preserving the format of all line endings.
2025-09-04T10:55:22.821593+02:00 rgw guacd[178388]: User "@e946d45c-d722-42fc-9dda-68ad454d6ba8" joined connection "$89e2447a-8412-4249-b2f4-6cc94c1d1f35" (1 users now present)
2025-09-04T10:55:22.821626+02:00 rgw guacd[178388]: Local system reports 8 processor(s) are available.
2025-09-04T10:55:22.821659+02:00 rgw guacd[178388]: Graphical updates will be encoded using 8 worker thread(s).
2025-09-04T10:55:22.824533+02:00 rgw guacd[178388]: Loading keymap "base"
2025-09-04T10:55:22.824614+02:00 rgw guacd[178388]: Loading keymap "base_altgr"
2025-09-04T10:55:22.824641+02:00 rgw guacd[178388]: Loading keymap "de-de-qwertz"
2025-09-04T10:55:22.824658+02:00 rgw guacd[178388]: Ignoring requested color depth of 24 bpp, as the RDP Graphics Pipeline requires 32 bpp.
2025-09-04T11:01:48.915030+02:00 rgw guacd[178388]: User is not responding.
2025-09-04T11:26:47.353867+02:00 rgw guacd[178388]: RDP server closed/refused connection: Manually logged off.
{code}
*PID 240326*
This process was also reported by the watchdog as hung. Unlike PID 178388, it 
did not show high CPU usage, but it did accumulate a large number of 
{{CLOSE_WAIT}} sockets.
From the logs it appears to have cleanly disconnected after inactivity, yet the process itself did not exit.
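Since the watchdog flagged PID 240326 mainly by its growing number of {{CLOSE_WAIT}} sockets, a similar check can be sketched without {{lsof}}. It matches the socket inodes under {{/proc/<pid>/fd}} against {{/proc/<pid>/net/tcp}}. This is a hypothetical helper, not part of guacd. It has a few limits:
 * It only covers IPv4 ({{tcp6}} is ignored).
 * The address decoding assumes a little-endian host.
 * Reading another user's {{fd}} directory needs root.

```python
import os
import socket

# Socket states as encoded in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(hex_addr):
    """Turn '0100007F:12D6' into '127.0.0.1:4822' (IPv4, little-endian host)."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(bytes.fromhex(ip_hex)[::-1])
    return f"{ip}:{int(port_hex, 16)}"


def parse_proc_net_tcp(text):
    """Map socket inode -> (local, remote, state) for each row of net/tcp."""
    sockets = {}
    for line in text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 10:
            continue
        local, remote, state, inode = cols[1], cols[2], cols[3], cols[9]
        sockets[inode] = (decode_addr(local), decode_addr(remote),
                          TCP_STATES.get(state, state))
    return sockets


def socket_inodes(pid):
    """Collect the inodes of all sockets held open by a process."""
    inodes = set()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed while we were iterating
        if target.startswith("socket:["):
            inodes.add(target[len("socket:["):-1])
    return inodes


def close_wait_sockets(pid):
    """Return (local, remote) pairs of the PID's IPv4 sockets in CLOSE_WAIT."""
    # Use the process's own view so its network namespace is respected.
    with open(f"/proc/{pid}/net/tcp") as fh:
        table = parse_proc_net_tcp(fh.read())
    owned = socket_inodes(pid)
    return [(loc, rem) for inode, (loc, rem, st) in table.items()
            if inode in owned and st == "CLOSE_WAIT"]
```

For example, {{len(close_wait_sockets(240326))}} should roughly track the {{CLOSE_WAIT}} count in the attached {{lsof}} output. IPv6 sockets are not counted.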

Syslog snippet:
{code:java}
Sep 05 09:36:34 rgw guacd[240326]: User "@7a139d82-8fbc-4701-b71b-e3b46dc1f908" joined connection "$be56469c-4627-4c67-9e7e-3ee081a2c062" (1 users now present)
Sep 05 09:36:34 rgw guacd[240326]: No known host keys provided, host identity will not be verified.
Sep 05 09:36:38 rgw guacd[240326]: Unable to set the timezone: SSH server refused to set "TZ" variable.
Sep 05 09:36:38 rgw guacd[240326]: No known host keys provided, host identity will not be verified.
Sep 05 09:36:38 rgw guacd[240326]: SSH connection successful.
Sep 05 10:07:11 rgw guacd[240326]: User is not responding.
Sep 05 10:07:11 rgw guacd[240326]: User "@7a139d82-8fbc-4701-b71b-e3b46dc1f908" disconnected (0 users remain)
Sep 05 10:07:11 rgw guacd[240326]: Last user of connection "$be56469c-4627-4c67-9e7e-3ee081a2c062" disconnected
Sep 05 10:07:11 rgw guacd[240326]: SSH connection ended.
{code}
*Artifacts attached for both PIDs:*
 * {{guacd-178388-lsof.txt [^guacd-178388-lsof.txt]}}

 * {{guacd-178388-gdb.txt [^guacd-178388-gdb.txt]}}

 * {{guacd-240326-lsof.txt [^guacd-240326-lsof.txt]}}

 * {{guacd-240326-gdb.txt [^guacd-240326-gdb.txt]}}

These include full {{lsof}} output, GDB backtraces ({{thread apply all bt full}}), and context so you can compare the CPU-bound vs. idle hung state.

Let us know if you’d like a focused backtrace of the hottest thread only, or if 
we should capture anything else next time.
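If a focused backtrace of the hottest thread would help, one low-effort approach keeps the full {{thread apply all bt full}} capture. It then cuts out the section whose header names the LWP reported by {{top -H}}. This is a sketch only, and the helper names are hypothetical:
 * {{-p}}, {{-batch}} and {{-ex}} are standard GDB options.
 * Attaching needs ptrace permission (root, or a permissive {{kernel.yama.ptrace_scope}}).
 * The header pattern assumes the {{Thread N (Thread 0x... (LWP TID) ...):}} format printed by recent GDB versions. It is worth checking against the attached files.

```python
import re
import subprocess

# Matches thread headers such as
#   Thread 3 (Thread 0x7f2a1bfff640 (LWP 178391) "guacd"):
THREAD_HEADER = re.compile(r"^Thread\s+\d+\s+\(.*\(LWP\s+(\d+)\)")


def gdb_all_threads(pid):
    """Attach GDB non-interactively and dump full backtraces of every thread."""
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def backtrace_for_lwp(gdb_output, lwp):
    """Return only the backtrace block belonging to the given LWP (thread ID)."""
    block, capturing = [], False
    for line in gdb_output.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # A new thread section starts: stop if we already have ours,
            # otherwise start capturing only when the LWP matches.
            if capturing:
                break
            capturing = int(header.group(1)) == lwp
        if capturing:
            block.append(line)
    return "\n".join(block)
```

Combined with the thread ranking from {{/proc}}, {{backtrace_for_lwp(gdb_all_threads(178388), <hottest tid>)}} would yield just the spinning thread's stack.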

Best regards,
Niko


> unable to upgrade 1.5.5 to 1.6.0 due to sporadic hanging issue
> --------------------------------------------------------------
>
>                 Key: GUACAMOLE-2118
>                 URL: https://issues.apache.org/jira/browse/GUACAMOLE-2118
>             Project: Guacamole
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Jason Keltz
>            Priority: Major
>         Attachments: guacamole-logs.txt, guacd-178388-gdb.txt, 
> guacd-178388-lsof.txt, guacd-240326-gdb.txt, guacd-240326-lsof.txt
>
>
> I've been running Guacamole since around 2020, upgrading reasonably quickly 
> each and every time there's been an update.  I update my Tomcat to the latest 
> 9.X release from time to time (currently 9.0.102), and my JDK to the latest 
> 8.X release from time to time (currently jdk8u452-b09).  
> Recently, after attempting an upgrade from Guacamole 1.5.5 to 1.6.0, I ran 
> into a problem.  Initially, everything seemed to work just fine.  I could 
> connect to any of the systems I have available.  However, at some point 
> later, I noticed in the Tomcat logs a lot of "connects" and "disconnects" to 
> hosts.  Users started complaining that "Guacamole isn't working".  What I 
> noticed at this point was that when they would try to return to a connection, 
> it would connect, and their existing connection would start to redraw, but 
> it would hang in the middle.  If I restart guacd at this point, it starts to 
> work again, but the problem comes back. Some users would see it.  Other users 
> were fine.  
> I feel like there's a bug hiding, and it may require a lot of user activity 
> to get to it.  I ended up creating a devel system for testing, and I'm 
> running guac 1.6.0 there, and I've enabled full debugging, but I can't seem 
> to make it happen here yet.  Is there any easy way I can force a bunch of 
> connections?  The devel system is running the latest Rocky 8.10 (RHEL 8.10) with 
> the latest kernel and patches, and this matches the production system.  They are 
> both installed with the same kickstart configuration.    
> I'm opening this "bug" even though I don't have concrete information yet.  If 
> I really have to do it, I may have to re-install on the production system to 
> get the debugging information that I need, but I'd rather not do it if not 
> necessary since it causes user inconvenience, and Guacamole is an important 
> part of our educational environment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
