Package: apt-cacher
Version: 1.6.8

This bug report is a spinoff of Bug#517761.
In the course of investigating that bug, I decided to set up a "simulated Debian repository" so that I could stress-test apt-cacher's handling of "apt-get update" without real Debian repository mirrors seeing the load. When I ran "apt-get update" through apt-cacher against this simulated repository, however, I saw this:

----8<----
# apt-get update
Get:1 http://ossia.in.teragram.com lenny Release.gpg
Err http://ossia.in.teragram.com lenny Release.gpg
  Connection timed out
Ign http://ossia.in.teragram.com lenny/main Translation-en_US
Err http://ossia.in.teragram.com lenny/non-free Translation-en_US
  Connection timed out
Ign http://ossia.in.teragram.com lenny/contrib Translation-en_US
Err http://ossia.in.teragram.com lenny/updates Release.gpg
  Connection timed out
[...]
---->8----

(Each "Connection timed out" error message was preceded by a progress line such as "50% [1 Release.gpg 6653]", which would sit there unmoving for two minutes.)

After some investigation using strace(1) on apt-get, I found out what was going on. I'm using thttpd as the Web server for the "simulated repository". Apparently it doesn't support HTTP Keep-Alive (let alone pipelining); it serves only one file per connection, always with "Connection: close". And here's the thing: it returns a Content-Length header for successful requests, but *not* for 404 error pages.

apt-cacher does support Keep-Alive and pipelining, and returns the requested files that way regardless of the upstream server's limitations. But when it returns the 404 for Translation-en_US.bz2, it too leaves out the Content-Length header. And that breaks the pipelining: apt-get has no way to tell where the 404 body ends and the headers for the next requested file begin. In the strace log, I see apt-get schlurp in 404-body1 + 404-header2 + 404-body2 + 200-header3 + 200-body3 as though the whole thing were 404-body1. That is why the status messages show 6 kB+ downloaded for Release.gpg, even though that file is barely more than 1 kB. (apt-get stops reading after 200-body3 presumably because it parsed the Content-Length header in 200-header3.) A sketch of what this looks like on the wire is appended at the end of this report.

The fix, then, would be to ensure that a Content-Length header is always present in apt-cacher's responses, regardless of what the upstream server returns. (I've confirmed that this can affect normally retrieved files, too: zap all the Content-Length headers in CACHEDIR/headers/*, and watch apt-get become hopelessly confused. A sketch of that experiment is appended as well.) The way the code is written, however, doesn't make the fix very straightforward:

 * You could add the header [if not already present] when the file is
   served up in return_file(), except that $n/$curlen is calculated
   well after the header has already been sent. You could stat
   $cached_file before that point (sketched at the end of this
   report), but... eww, gratuitous redundancy.

 * You could add the header at the time the file is downloaded, in
   libcurl(). Except that, again, the header is written out before
   the size of the file is known.

Note: Dismayingly enough, setting Acquire::http::Pipeline-Depth=0 doesn't work around the problem. apt-get requests, and apt-cacher returns, a single file at a time, but apt-get then select()s on the socket (why??) until timing out two minutes later. This happens after the very first file it tries to download (Release.gpg). (I think this may be because apt-cacher returned the file with "Connection: Keep-Alive", but I don't see why apt-get doesn't just close the connection after receiving the file and open a new one.)
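To make the failure mode concrete, here is a sketch of what the pipelined response stream looks like on the wire. The header lines are illustrative, assuming Keep-Alive responses as described above; the exact headers apt-cacher emits may differ.

----8<----
HTTP/1.1 404 Not Found        <- 404-header1: no Content-Length, so
Connection: Keep-Alive           apt-get cannot tell where the body ends

...404-body1...

HTTP/1.1 404 Not Found        <- 404-header2 and everything after it
Connection: Keep-Alive           get schlurped in as part of 404-body1

...404-body2...

HTTP/1.1 200 OK               <- 200-header3: apt-get finally stops
Content-Length: <size>           reading after this response's
Connection: Keep-Alive           Content-Length bytes

...200-body3 (Release.gpg)...
---->8----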
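The Content-Length-zapping experiment mentioned above amounts to something like the following (a sketch, not the exact command I ran; CACHEDIR stands for apt-cacher's cache directory, and this will of course mangle the cache):

----8<----
# Delete every Content-Length line from the cached header files, then
# run "apt-get update" through apt-cacher and watch it flounder.
sed -i '/^Content-Length:/d' CACHEDIR/headers/*
---->8----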
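As for the first fix option, here is a minimal sketch of the stat-before-send approach. The names return_file() and $cached_file come from apt-cacher as described above, but the surrounding scaffolding is hypothetical and invented for illustration; it is not apt-cacher's actual code.

----8<----
# Sketch only: compute the size from the cached file on disk and add
# a Content-Length header *before* any headers reach the client.
# Everything except return_file() and $cached_file is hypothetical.
sub return_file {
    my ($client, $cached_file, $headers) = @_;   # hypothetical args

    unless ($headers =~ /^Content-Length:/im) {
        my $size = (stat $cached_file)[7];       # the gratuitous stat
        $headers .= "Content-Length: $size\r\n" if defined $size;
    }

    print $client "HTTP/1.1 200 OK\r\n$headers\r\n";
    # ...then stream the file body to $client as before...
}
---->8----

The redundant stat() is the cost; the alternative of fixing things up at download time in libcurl() runs into the same ordering problem noted in the second bullet above.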
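And for anyone trying to reproduce the non-workaround: pipelining can be turned off with apt's stock Acquire::http::Pipeline-Depth option, either one-off on the command line or persistently in apt.conf (nothing apt-cacher-specific here):

----8<----
# one-off:
apt-get -o Acquire::http::Pipeline-Depth=0 update

# or persistently, e.g. in /etc/apt/apt.conf:
Acquire::http::Pipeline-Depth "0";
---->8----

Even with that set, apt-get still stalls after the very first file, as described in the note above.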