[Bug 2049315] Re: cups-browsed running non-stop on two cores

Till Kamppeter Thu, 06 Feb 2025 10:36:11 -0800

Sorry for posting a test case that focuses only on the busy-loop bug
going away. It is also important that cups-browsed still correctly
performs its original function of automatically creating print queues
for network printers (IPP printers, remote CUPS queues, Printer
Applications).

At least nobody commented here that their printing ceased to work after
the update.

I checked the functionality on my Oracular system with my printers, but
it is also verified, for both Oracular and Noble, by the package passing
its autopkgtest (otherwise the package would not have made it into
-proposed).

My autopkgtest is the script test/run-tests.sh in the upstream source of
cups-browsed.

While cups-browsed and cupsd are running, this script

- creates two software emulations of driverless IPP printers (Printer
Applications) with unique names, which print jobs into a file
- waits until cups-browsed has auto-created CUPS queues for them,
checking the presence of the queues with "lpstat -v"
- sends a print job to one of them
- waits until the job appears in the CUPS queue and disappears from it
- checks the presence of the print job output file
- kills the Printer Applications
- checks whether cups-browsed removes the queues
- shuts down cups-browsed
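
The "wait until the queue exists" step can be sketched roughly as
follows. This is only a minimal illustration, not the code of
run-tests.sh: wait_for_queue and LPSTAT_CMD are hypothetical names, and
the match assumes the English "device for NAME: URI" line format of
"lpstat -v".

```sh
#!/bin/sh
# Sketch: poll "lpstat -v" until a queue with the given name shows up.
# wait_for_queue and LPSTAT_CMD are hypothetical names for this example.
LPSTAT_CMD=${LPSTAT_CMD:-lpstat}

wait_for_queue() {
    queue=$1
    timeout=${2:-60}    # seconds to wait before giving up
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Assumes English output: "device for NAME: URI"
        if $LPSTAT_CMD -v 2>/dev/null | grep -q "^device for $queue:"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

Calling, for example, "wait_for_queue my_printer 120" returns 0 as soon
as the queue is listed and 1 if it has not appeared after 120 seconds.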

Any of you can easily run the run-tests.sh test, no printer required.
Install it via

sudo apt install cups-browsed-tests

Then run (no root or sudo required):

run-tests.sh

Answer the questions with "3" (use system's cups-browsed) and "N" (do
not use Valgrind).

The test sequence described above is then run, with verbose output on
the screen telling what is happening. If everything completed correctly,
the exit status is 0.

Run

echo $?

to check the exit status.

** Description changed:

[ Impact ]

Over the past months, users have often observed cups-browsed taking 100%
or 200% (2 cores) of CPU and ceasing to do its actual work. This even
happens for users who do not print anything: printers shared by other
computers in the local network trigger cups-browsed to make them
available on the local machine, and this alone can make cups-browsed get
stuck hogging 1 or 2 CPU cores. To free the CPU cores and get
cups-browsed working again, one needs to kill and restart it.

The bug is not easily reproducible. It only occurs sporadically.
Restarting a stuck cups-browsed does not make it get stuck again
immediately.

It is annoying for the user that a significant part of their CPU power
suddenly gets hogged and the machine's fan starts producing noise,
especially as most users do not know the root cause or how to stop it.

The problem is caused by concurrent use of one global HTTP connection to
CUPS by several sub-threads. This corrupts the connection's data
structure, which makes the httpGets() function of CUPS fall into an
infinite loop. The proposed packages (for Noble and Oracular) contain a
backport of the fix from upstream version 2.1.1 (already uploaded to
Plucky). This fix lets each function create its own HTTP connection to
CUPS instead of using one single global one. None of these connections
is used by multiple threads and therefore the problem should go away.

I developed the fix solely by taking a few backtraces (from reporters of
the bug and one where I observed the bug myself) to locate it (it always
gets stuck in httpGets() of libcups), and by reviewing the HTTP-related
code of libcups and of cups-browsed, which led me to the problem
described above and to the remedy. I did not do any before/after
testing. I relied only on my observations and code reviews.

What happens in httpGets() is described in my comments here (especially
near the end):

https://github.com/OpenPrinting/cups/issues/879

[ Test Plan ]

UPDATE

Everybody, please run this test, especially users of Oracular (24.10),
as we still need verification there.

Paste the following script into a file

```
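#!/bin/sh
# Restart cups-browsed every 15 seconds until stopped with Ctrl+C.
# Restarting the service needs root privileges, e.g. run with sudo.

while true; do
    service cups-browsed restart
    printf .    # progress marker, one dot per restart
    sleep 15
done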
```

and make the file executable. Then execute the file and leave it running
for some hours.

Without the update applied, cups-browsed will sooner or later get stuck
at 100% CPU. Once this happens, it requires a SIGKILL signal ("kill -9")
to be stopped. As the "service cups-browsed restart" command only sends
SIGTERM, the stuck cups-browsed will keep running at 100% CPU while the
script keeps looping and trying to restart cups-browsed every 15
seconds. This way you will see the failure whenever you come back and
check; there is no need to be present through the whole process.

Now stop the script with Ctrl+C and do

```
killall -9 cups-browsed
```

After that, update to the package with the fix proposed here.

Now run the script again and leave it running for some hours. cups-
browsed should not get stuck.

Thanks a lot, Jeffrey Knockel (jeff250), for providing this testing
method (comment #40).

UPDATE 2

Unfortunately, a failure of cups-browsed does not stop the script: a
stuck cups-browsed gets killed after a timeout of 90 seconds.

So to capture failures I ran the following command from another
terminal:

while true; do sleep 2; ps aux | grep /usr/sbin/cups-browsed | grep -v grep; done | tee log.txt

This produces a line like

cups-br+ 949545 0.2 0.0 817024 20968 ? Ssl 08:58 0:00 /usr/sbin/cups-browsed

every 2 seconds.

The "0:00" right before "/usr/sbin/cups-browsed" is the accumulated CPU
time of the process. When it does not get stuck, cups-browsed never
accumulates visible CPU time while running for only 15 seconds, but when
it hangs in a busy loop for up to 90 seconds the CPU time becomes
visible.

grep -v ' 0:00 /usr/sbin/cups-browsed' log.txt

easily reveals the fact.
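
A slightly more robust variant of this check, as a minimal sketch: it
compares the TIME column (the 10th field of "ps aux" output) directly
instead of relying on the exact spacing. busy_lines is a hypothetical
helper name; the command path is taken from the log line shown above.

```sh
#!/bin/sh
# Sketch: print log entries whose accumulated CPU time is not "0:00".
# "ps aux" columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# busy_lines is a hypothetical helper name for this example.
busy_lines() {
    # With no file argument, awk reads the log from standard input.
    awk '$11 == "/usr/sbin/cups-browsed" && $10 != "0:00"' "$@"
}
```

Running "busy_lines log.txt" prints nothing if cups-browsed never
accumulated visible CPU time during the test.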
+
+ UPDATE 3
+
+ To test whether cups-browsed's original functionality got broken by the
+ update, see comment #45. In particular, the fact that the update has
+ passed its autopkgtest on both Oracular and Noble is evidence that
+ cups-browsed is still doing what it was designed for.

ORIGINAL TEXT

Due to the problem only occurring sporadically, it is not easy to make a
test, install the proposed, fixed version, do the same test again and
see that the problem has disappeared.

But if any of you observes the bug with a certain frequency, for example
in 1 of 10 attempts, you could try until you hit the bug with the old
version, update, and then try again. If you reach a reasonably high
number of tests without the bug occurring again, you could consider it
fixed.

The bug requires cups-browsed to create or remove local CUPS queues for
remote printers, so that it interacts with the local CUPS, which it does
via IPP, using libcups' HTTP API. This requires network printers,
emulations of them with tools like ippeveprinter, or shared remote CUPS
queues to appear and disappear. Disruptions in the network connection
between a remote server (printer, CUPS) and the client, like shutting
down the network connection or suspending the machine, could also cause
the problem.

One situation where it possibly happened, though we have no proof, was
at a Canonical Sprint (where all of Canonical's engineers meet
physically). I have some print queues on my laptop which are shared (so
other people could see them in their print dialogs), and during the
event I often had to move from one room to another. For that I closed my
laptop, it suspended, and I opened it again in the other room. Other
people at the event observed the bug. I already tried to cause it
myself, suspending a laptop which shares printers and observing
cups-browsed on another laptop, but I was not able to reproduce it.
Probably the Sprint, with a big network and many people, is a different
situation.

So unfortunately I am not able to force the occurrence of this bug.

A possible way could be brute-forcing with many printers, writing a
script starting 100s of instances of ippeveprinter or so.
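
Such a brute-force script could look roughly like the sketch below. It
is untested against the actual bug and only illustrates the idea:
start_printers, stop_printers, IPPEVE_CMD, the base port 8100 and the
"stress-N" names are all hypothetical, while "-p PORT" is
ippeveprinter's option for the port to listen on.

```sh
#!/bin/sh
# Sketch: start COUNT emulated IPP printers in the background, each on
# its own port. All names here are hypothetical, chosen for the example.
IPPEVE_CMD=${IPPEVE_CMD:-ippeveprinter}
PRINTER_PIDS=""

start_printers() {
    count=$1
    base_port=${2:-8100}
    i=0
    while [ "$i" -lt "$count" ]; do
        $IPPEVE_CMD -p $((base_port + i)) "stress-$i" >/dev/null 2>&1 &
        PRINTER_PIDS="$PRINTER_PIDS $!"
        i=$((i + 1))
    done
}

stop_printers() {
    if [ -n "$PRINTER_PIDS" ]; then
        # Unquoted on purpose so each PID becomes its own argument.
        kill $PRINTER_PIDS 2>/dev/null || true
    fi
    PRINTER_PIDS=""
}
```

For example, "start_printers 100" followed some minutes later by
"stop_printers" makes 100 emulated printers appear and disappear, which
should force cups-browsed to create and remove the matching queues.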

To test cups-browsed without having a printer one can use cups-browsed's
own test script, test/run-tests.sh in the source code of cups-browsed.

The test script, applied to the installed cups-browsed, can be run most
easily as follows:

$ sudo apt install cups-browsed-tests
$ mkdir test
$ cd test
$ cp /usr/share/cups-browsed-tests/* .
$ /usr/bin/run-tests.sh 3 no

The script creates 2 printers with ippeveprinter, checks whether
cups-browsed creates queues for them, prints on one of them, checks
completion of the job, then stops the ippeveprinter instances one by one
and checks whether cups-browsed removes the queues. This is the
autopkgtest of cups-browsed. Running it manually as described, I was not
able to trigger this bug, but modifying it to run 100 instances of
ippeveprinter or letting it do "kill -9" on ippeveprinter instances
could perhaps cause the bug.

The script also serves as a regression test, and on cups-browsed 2.1.1
(which contains the fix) it works as described here. Also, "make check"
(which uses this script, too) passes on the Noble and Oracular packages
proposed here and so does not reveal any regressions.

So I am also asking all reporters of these bugs who observe the bug with
enough frequency, or know how to force it to occur (please tell how,
then), to check whether the proposed packages make the bug go away and
to report their results here.

[ Where problems could occur ]

The patch is rather long and I have done only basic tests to check
whether cups-browsed is still working as designed, so there is
regression potential. I therefore also ask everybody reading this,
including those who did not observe or are not able to reproduce the bug
reported here, to test whether the fixed cups-browsed still works as
they expect, or whether there is some regression.

[ Original description ]

After waking up from standby cups-browsed runs incessantly on two cores:

18243 cups-br+ 20 0 432256 26348 17848 R 99.7 0.2 66:54.73 cups-br+
85147 cups-br+ 20 0 432256 26348 17848 R 99.7 0.2 66:52.08 cups-br+

cups-br+ 18243 18.9 0.1 432256 26348 ? Rsl 08:30 135:06 /usr/sbin/cups-browsed

Best regards

Heinrich

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: cups-browsed 2.0.0-0ubuntu2
ProcVersionSignature: Ubuntu 6.6.0-14.14-generic 6.6.3
Uname: Linux 6.6.0-14-generic x86_64
ApportVersion: 2.27.0-0ubuntu6
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: KDE
Date: Sun Jan 14 20:19:22 2024
InstallationDate: Installed on 2021-07-01 (927 days ago)
InstallationMedia: Kubuntu 21.04 "Hirsute Hippo" - Release amd64 (20210420)
MachineType: {report['dmi.sys.vendor']} {report['dmi.product.name']}
Papersize: a4
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.6.0-14-generic
root=/dev/mapper/vgubuntu-root ro
SourcePackage: cups-browsed
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/08/2023
dmi.bios.release: 1.63
dmi.bios.vendor: LENOVO
dmi.bios.version: R0UET83W (1.63 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20KV0008GE
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.63
dmi.modalias:
dmi:bvnLENOVO:bvrR0UET83W(1.63):bd02/08/2023:br1.63:efr1.63:svnLENOVO:pn20KV0008GE:pvrThinkPadE585:rvnLENOVO:rn20KV0008GE:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_20KV_BU_Think_FM_ThinkPadE585:
dmi.product.family: ThinkPad E585
dmi.product.name: 20KV0008GE
dmi.product.sku: LENOVO_MT_20KV_BU_Think_FM_ThinkPad E585
dmi.product.version: ThinkPad E585
dmi.sys.vendor: LENOVO

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2049315

Title:
cups-browsed running non-stop on two cores

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cups-browsed/+bug/2049315/+subscriptions

