Mark Hindley:
> Thanks for this.
>
> The select->can_read() timeout of 0.00001 was the result of lots of
> testing in bug #533830. Reducing the timeout may reduce your throughput
> (which may not bother you ;)!) I am not sure why you see such high CPU
> usage with 0.00001. What hardware is it running on? What is your
> upstream bandwidth/connection?
>
> There are conflicting issues of CPU usage and throughput here, so I
> think I might just make the value configurable as there may not be a
> right value for everybody.

The CPU is a dual core, 2 threads per core, Intel Core i5 2.40GHz.

I have done some tests to have a better global picture.

- TEST 1: curl binary from shell command line, no proxy.

$ curl -o xulrunner-10.0-dbg_10.0.4esr-2_amd64.deb 'ftp://ftp.de.debian.org/debian/pool/main/i/iceweasel/xulrunner-10.0-dbg_10.0.4esr-2_amd64.deb'
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
100  156M  100  156M    0     0   536k      0  0:04:57  0:04:57 --:--:--  621k

Three minutes of CPU usage:

09:29:30 AM       PID    %usr %system  %guest    %CPU   CPU  Command
09:29:31 AM      4204    0.00    2.00    0.00    2.00     0  curl
...
Average:         4204    0.38    1.21    0.00    1.59     -  curl

- TEST 2: aptitude download with original apt-cacher proxy.

# aptitude download xulrunner-10.0-dbg
Get: 1 http://ftp.de.debian.org/debian/ testing/main xulrunner-10.0-dbg amd64 10.0.4esr-2 [164 MB]
Fetched 164 MB in 4min 30s (606 kB/s)

It is the same file from the same mirror, it seems aptitude uses
decimal MB. As before, three minutes:

09:39:51 AM       PID    %usr %system  %guest    %CPU   CPU  Command
09:39:52 AM      4489   84.00    5.00    0.00   89.00     2  /usr/sbin/apt-c
...
Average:         4489   92.44    2.84    0.00   95.29     -  /usr/sbin/apt-c

- Test 3: same as test 2 but using patched apt-cacher with
  select->can_read() timeout of 0.01.

aptitude download xulrunner-10.0-dbg
Get: 1 http://ftp.de.debian.org/debian/ testing/main xulrunner-10.0-dbg amd64 10.0.4esr-2 [164 MB]
Fetched 164 MB in 4min 22s (624 kB/s)

Usual three minutes average:

09:53:00 AM       PID    %usr %system  %guest    %CPU   CPU  Command
09:53:01 AM      4728    1.00    0.00    0.00    1.00     0  /usr/sbin/apt-c
...
Average:         4728    1.91    0.37    0.00    2.28     -  /usr/sbin/apt-c

- Wonderful emacs org-mode table :-) for a summary of average values:

| Program               | time (s) | speed (Mbit/s) | cpu % |
|-----------------------+----------+----------------+-------|
| Curl binary, no proxy |      297 |          4.407 |  1.59 |
| apt-cacher, standard  |      270 |          4.848 | 95.29 |
| apt-cacher, 0.01      |      262 |          4.996 |  2.28 |
|-----------------------+----------+----------------+-------|

> I wonder if the timeout of 0.00001 is just too small for your
> system. What version of perl and IO::Select do you have?

perl       5.14.2-9
perl-base  5.14.2-9

> strace on the libcurl process on my system shows:
>
> select(8, [0], NULL, NULL, {0, 10}) = 0 (Timeout)
> poll([{fd=2, events=POLLIN|POLLPRI}], 1, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 68106232}) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 68133052}) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 68160152}) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 68186692}) = 0
> select(8, [0], NULL, NULL, {0, 10}) = 0 (Timeout)
> poll([{fd=2, events=POLLIN|POLLPRI}], 1, 0) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 69133499}) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 69160599}) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 69189654}) = 0
> clock_gettime(CLOCK_MONOTONIC, {567815, 69216754}) = 0
> select(8, [0], NULL, NULL, {0, 10}) = 0 (Timeout)
>
> The obvious significant difference from your strace are the calls to
> clock_gettime. It looks to me as if select() on your system is not
> observing the 10us timeout, but I don't know why. Can you test what is
> the smallest timeout value that is observed?

I have now a 35MB complete trace for the full running time and I can
affirm that it contains not even one call to clock_gettime. The first
200 lines are in the attached file, the rest is very similar except at
the end, after the download is finished.

The developer manpages for system calls also contain a tutorial for
select(), the page is called 'select_tut'.
Not wishing to criticize anyone I quote from it:

    Many people who try to use select() come across behavior that is
    difficult to understand and produces nonportable or borderline
    results. It is easy to introduce subtle errors that will remove
    the advantage of using select(), so here is a list of essentials
    to watch for when using select().

    1. You should always try to use select() without a timeout. Your
       program should have nothing to do if there is no data
       available. Code that depends on timeouts is not usually
       portable and is difficult to debug.
    [...]

At least we know that it is difficult to debug :-).

Best regards,
alfredo

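
On the question of the smallest timeout that select() actually observes: one possible way to measure it is sketched below. This is only an illustration, written in Python rather than in apt-cacher's Perl, and it was not part of the tests reported above. The function name observed_timeout and the list of timeout values tried are assumptions made for the example. It waits on a socket that never becomes readable, so every select() call can only return by timing out, and it reports the median time each call really took.

```python
import select
import socket
import time


def observed_timeout(requested, repeats=200):
    """Median wall-clock duration (seconds) of select() calls that time out.

    `requested` is the timeout passed to select(); `repeats` is the number
    of samples taken. Both names are hypothetical, chosen for this sketch.
    """
    # A connected socket pair with nothing written to it: the read end is
    # never ready, so select() returns only when the timeout expires.
    reader, writer = socket.socketpair()
    try:
        samples = []
        for _i in range(repeats):
            start = time.perf_counter()
            readable, _w, _x = select.select([reader], [], [], requested)
            samples.append(time.perf_counter() - start)
            if readable:
                raise RuntimeError("socket unexpectedly readable")
        samples.sort()
        return samples[len(samples) // 2]
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    # 0.00001 is the stock apt-cacher value, 0.01 the patched one.
    for requested in (0.00001, 0.0001, 0.001, 0.01):
        measured = observed_timeout(requested)
        print(f"requested {requested:.5f} s   observed {measured:.6f} s")
```

If the observed value stops shrinking while the requested one keeps getting smaller, that floor would be the smallest timeout the system honours. Running the same loop under pidstat should also show whether very small timeouts turn into the kind of busy loop seen in test 2.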
trace1.head.gz
Description: application/gunzip