[ https://issues.apache.org/jira/browse/GUACAMOLE-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018319#comment-18018319 ]
Niko commented on GUACAMOLE-2118:
---------------------------------
[~vnick]
Hi Nick,
we were able to capture two guacd processes in a hung state, one CPU-bound and
one idle, so you can see both scenarios side by side.
*PID 178388*
This process went into a hung state after the client disconnected.
Timeline excerpt:
* 10:55:22: RDP session started
* 11:01:48: User is not responding
* 11:26:47: RDP server closed/refused connection: Manually logged off
After that, the process stayed alive with one thread spinning at ~100% CPU.
{{lsof}} shows only local 127.0.0.1:4822 … {{CLOSE_WAIT}} sockets and no remote
{{ESTABLISHED}} connections.
{{top -H}} confirms the single worker thread consuming CPU.
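To summarize the socket states in an {{lsof}} capture without reading it line by line, a small script along these lines could be used. This is only a sketch: the function name {{count_tcp_states}} is made up for illustration, and the sample lines are invented to show the expected shape of {{lsof -nP}} output; they are not copied from the attached files.

```python
import re
from collections import Counter

# Matches the tail of an lsof NAME column for TCP sockets, e.g.
# "TCP 127.0.0.1:4822->127.0.0.1:50001 (CLOSE_WAIT)".
_TCP_STATE = re.compile(r"\bTCP\s+\S+\s+\((\w+)\)\s*$")


def count_tcp_states(lsof_text: str) -> Counter:
    """Count TCP sockets per state; non-TCP lines are ignored."""
    states = Counter()
    for line in lsof_text.splitlines():
        match = _TCP_STATE.search(line)
        if match:
            states[match.group(1)] += 1
    return states


# Invented sample input (NOT taken from the attachments).
SAMPLE = """\
COMMAND    PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
guacd   178388 guacd  cwd    DIR  253,0     4096    2 /
guacd   178388 guacd    4u  IPv4 100001      0t0  TCP 127.0.0.1:4822->127.0.0.1:50001 (CLOSE_WAIT)
guacd   178388 guacd    7u  IPv4 100002      0t0  TCP 127.0.0.1:4822->127.0.0.1:50002 (CLOSE_WAIT)
"""

print(count_tcp_states(SAMPLE))
```

Running it against a real capture would mean reading the file contents and passing them to {{count_tcp_states}}; a result containing only {{CLOSE_WAIT}} would match the observation above.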
Syslog snippet:
{code:java}
2025-09-04T10:55:22.821428+02:00 rgw guacd[178388]: Security mode: Negotiate (ANY)
2025-09-04T10:55:22.821531+02:00 rgw guacd[178388]: Resize method: none
2025-09-04T10:55:22.821569+02:00 rgw guacd[178388]: No clipboard line-ending normalization specified. Defaulting to preserving the format of all line endings.
2025-09-04T10:55:22.821593+02:00 rgw guacd[178388]: User "@e946d45c-d722-42fc-9dda-68ad454d6ba8" joined connection "$89e2447a-8412-4249-b2f4-6cc94c1d1f35" (1 users now present)
2025-09-04T10:55:22.821626+02:00 rgw guacd[178388]: Local system reports 8 processor(s) are available.
2025-09-04T10:55:22.821659+02:00 rgw guacd[178388]: Graphical updates will be encoded using 8 worker thread(s).
2025-09-04T10:55:22.824533+02:00 rgw guacd[178388]: Loading keymap "base"
2025-09-04T10:55:22.824614+02:00 rgw guacd[178388]: Loading keymap "base_altgr"
2025-09-04T10:55:22.824641+02:00 rgw guacd[178388]: Loading keymap "de-de-qwertz"
2025-09-04T10:55:22.824658+02:00 rgw guacd[178388]: Ignoring requested color depth of 24 bpp, as the RDP Graphics Pipeline requires 32 bpp.
2025-09-04T11:01:48.915030+02:00 rgw guacd[178388]: User is not responding.
2025-09-04T11:26:47.353867+02:00 rgw guacd[178388]: RDP server closed/refused connection: Manually logged off.
{code}
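For reference, the gaps between the three timeline events can be computed directly from the syslog timestamps. The sketch below uses the timestamps from the snippet above (the first log line stands in for "RDP session started"); the names {{EVENTS}} and {{gaps}} are only illustrative. It gives roughly six and a half minutes until "User is not responding" and roughly 25 more minutes until the server closed the connection.

```python
from datetime import datetime

# Timestamps copied from the syslog snippet for PID 178388.
EVENTS = [
    ("RDP session started", "2025-09-04T10:55:22.821428+02:00"),
    ("User is not responding", "2025-09-04T11:01:48.915030+02:00"),
    ("RDP server closed/refused connection", "2025-09-04T11:26:47.353867+02:00"),
]


def gaps(events):
    """Return (start_label, end_label, timedelta) for each consecutive pair."""
    parsed = [(label, datetime.fromisoformat(ts)) for label, ts in events]
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(parsed, parsed[1:])]


for start, end, delta in gaps(EVENTS):
    print(f"{start} -> {end}: {delta}")
```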
*PID 240326*
This process was also reported by the watchdog as hung. Unlike PID 178388, it
did not show high CPU usage, but it did accumulate a large number of
{{CLOSE_WAIT}} sockets.
From the logs it appears to have cleanly disconnected after inactivity, yet
the process itself did not exit.
Syslog snippet:
{code:java}
Sep 05 09:36:34 rgw guacd[240326]: User "@7a139d82-8fbc-4701-b71b-e3b46dc1f908" joined connection "$be56469c-4627-4c67-9e7e-3ee081a2c062" (1 users now present)
Sep 05 09:36:34 rgw guacd[240326]: No known host keys provided, host identity will not be verified.
Sep 05 09:36:38 rgw guacd[240326]: Unable to set the timezone: SSH server refused to set "TZ" variable.
Sep 05 09:36:38 rgw guacd[240326]: No known host keys provided, host identity will not be verified.
Sep 05 09:36:38 rgw guacd[240326]: SSH connection successful.
Sep 05 10:07:11 rgw guacd[240326]: User is not responding.
Sep 05 10:07:11 rgw guacd[240326]: User "@7a139d82-8fbc-4701-b71b-e3b46dc1f908" disconnected (0 users remain)
Sep 05 10:07:11 rgw guacd[240326]: Last user of connection "$be56469c-4627-4c67-9e7e-3ee081a2c062" disconnected
Sep 05 10:07:11 rgw guacd[240326]: SSH connection ended.
{code}
*Artifacts attached for both PIDs:*
* [^guacd-178388-lsof.txt]
* [^guacd-178388-gdb.txt]
* [^guacd-240326-lsof.txt]
* [^guacd-240326-gdb.txt]
These include full {{lsof}} output, GDB backtraces ({{thread apply all bt
full}}), and context so you can compare the CPU-bound vs. idle hung state.
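In case it helps anyone collect the same data from another hung process, here is a minimal sketch of how the {{lsof}}, {{top -H}}, and GDB views could be captured for a given PID. It is not necessarily how the attached files were produced: the helper names ({{capture_commands}}, {{run_capture}}) are made up for illustration, and the output file naming merely mirrors the attachment names. Attaching GDB to a running process typically needs root or equivalent ptrace permission.

```python
import subprocess


def capture_commands(pid: int) -> dict:
    """Return the argv list for each diagnostic, keyed by a short name."""
    p = str(pid)
    return {
        # Open files and sockets, without DNS (-n) or port-name (-P) lookups.
        "lsof": ["lsof", "-n", "-P", "-p", p],
        # One batch-mode snapshot of per-thread CPU usage.
        "top": ["top", "-H", "-b", "-n", "1", "-p", p],
        # Attach, dump full backtraces of all threads, then detach.
        "gdb": ["gdb", "-p", p, "-batch", "-ex", "thread apply all bt full"],
    }


def run_capture(pid: int, prefix: str = "guacd") -> list:
    """Run each command and write its output to <prefix>-<pid>-<name>.txt.

    Raises FileNotFoundError if one of the tools is not installed.
    """
    written = []
    for name, argv in capture_commands(pid).items():
        path = f"{prefix}-{pid}-{name}.txt"
        with open(path, "w") as out:
            subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT, check=False)
        written.append(path)
    return written
```

Calling {{run_capture(178388)}} would write {{guacd-178388-lsof.txt}}, {{guacd-178388-top.txt}}, and {{guacd-178388-gdb.txt}} to the current directory.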
Let us know if you’d like a focused backtrace of the hottest thread only, or if
we should capture anything else next time.
Best regards,
Niko
> unable to upgrade 1.5.5 to 1.6.0 due to sporadic hanging issue
> --------------------------------------------------------------
>
> Key: GUACAMOLE-2118
> URL: https://issues.apache.org/jira/browse/GUACAMOLE-2118
> Project: Guacamole
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Jason Keltz
> Priority: Major
> Attachments: guacamole-logs.txt, guacd-178388-gdb.txt,
> guacd-178388-lsof.txt, guacd-240326-gdb.txt, guacd-240326-lsof.txt
>
>
> I've been running Guacamole since around 2020, upgrading reasonably quickly
> each and every time there's been an update. I update my Tomcat to the latest
> 9.X release from time to time (currently 9.0.102), and my JDK to the latest
> 8.X release from time to time (currently jdk8u452-b09).
> Recently, after attempting an upgrade from Guacamole 1.5.5 to 1.6.0, I ran
> into a problem. Initially, everything seemed to work just fine. I can
> connect to any of the systems I have available. However, at some point
> later, I noticed in the Tomcat logs a lot of "connects" and "disconnects" to
> hosts. Users started complaining that "Guacamole isn't working". What I
> noticed at this point was that when they would try to return to a connection,
> it would connect, and their existing connection would start to redraw, but
> it would hang in the middle. If I restart guacd at this point, it starts to
> work again, but the problem comes back. Some users would see it. Other users
> were fine.
> I feel like there's a bug hiding, and it may require a lot of user activity
> to get to it. I ended up creating a devel system for testing, and I'm
> running guac 1.6.0 there, and I've enabled full debugging, but I can't seem
> to make it happen here yet. Is there any easy way I can force a bunch of
> connections? The devel system is running labtest Rocky 8.10 (RHEL8.10) with
> latest kernel and patches and this matches the production system. They are
> both installed with the same kickstart configuration.
> I'm opening this "bug" even though I don't have concrete information yet. If
> I really have to do it, I may have to re-install on the production system to
> get the debugging information that I need, but I'd rather not do it if not
> necessary since it causes user inconvenience, and Guacamole is an important
> part of our educational environment.