#include <hallo.h>

* Daniel Richard G. [Tue, Apr 04 2006, 05:51:52PM]:
> > The problem is, the migration cannot be completely painless. Because
> > without tracking the origin of the packages apt-cacher will keep
> > delivering the wrong files, so it must "learn" which mirrors or
> > download locations the existing files can be assigned to.
>
> But wouldn't existing (and presumably non-broken) repositories contain
> only packages with a single, common origin? That is, all Debian or all

Nope. Apt-cacher has been very permissive for as long as I can remember,
and it used to store all files with the same basename in the same cache
file. The premise for this to work is the absence of namespace
collisions - which is no longer given with the rise of Debian's forks.

> Ubuntu? It's only when a repository attempts to serve up both that
> multiple origins come into play, and by then one could argue that the
> package-name intersections render the repository broken.
>
> What I was thinking was that existing users with large, single-origin
> repositories would get a lazy/painless upgrade to the new
> package-storing convention. Users who want to serve up multi-origin
> repositories would have to start from scratch, using the forthcoming
> (yes? :) version of apt-cacher that supports it.

Hehe, no. As said, a simple upgrade script can discover where the
packages have been downloaded from - just do
"cat /var/cache/apt-cacher/private/*.complete". While working on the
last major version, I thought about why the space for the empty
.complete files should be wasted for no reason, so I added some code to
store the complete URL of a successfully downloaded file ;-)

The current idea is simple: the next version will ship an upgrade
script. This script looks at the cached DEBs and gets their URLs from
the mentioned .complete files. If some URLs are still missing (pre-sarge
downloads), the files are read and checksummed, and the URLs are
reconstructed from the Packages files by matching filenames, sizes and
checksums. I think that should be enough to recover the needed data.
Then the files are moved to their new locations.
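To give a rough idea, this is what the first and last steps could look
like in Perl. The naming of the .complete files, the cache paths and the
host-based target layout used here are only assumptions, nothing is
final:

    #!/usr/bin/perl
    # Rough sketch of the planned upgrade step. Paths, the .complete
    # naming and the target layout are assumptions, not the final design.
    use strict;
    use warnings;
    use File::Basename;

    my $cache = "/var/cache/apt-cacher";
    my %url_for;    # basename of a cached file => full download URL

    # Step 1: recover the URLs recorded in the .complete files.
    for my $cfile (glob "$cache/private/*.complete") {
        open my $fh, '<', $cfile or next;
        my $url = <$fh>;
        close $fh;
        next unless defined $url;
        chomp $url;
        next unless $url =~ m{^http://};
        (my $name = basename($cfile)) =~ s/\.complete$//;
        $url_for{$name} = $url;
    }

    # Step 2 (omitted here): for files without a recorded URL, checksum
    # the .deb and look up filename, size and checksum in the cached
    # Packages files to reconstruct the URL.

    # Step 3: move every file to a host-based location derived from its
    # URL, e.g. packages/<host>/<path> (hypothetical layout).
    for my $name (sort keys %url_for) {
        (my $hostpath = $url_for{$name}) =~ s{^http://}{};
        print "would move $cache/packages/$name -> $cache/packages/$hostpath\n";
        # the real script would mkdir -p the target directory and rename()
    }

The checksum/Packages lookup for the old files would slot in between
those two loops.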
> Did you also have in mind the group of users who are currently running
> multi-origin repositories, and are ignoring/overlooking/unaware of the
> ill effects?

Yes. That is why I try to hurry. And I believe that the described
transition plan should work very smoothly; I would even put it into the
postinst script of the package. And I cannot afford to make mistakes
there. And while making the new design, I faced some fundamental
problems.

<some developer comments following, for those who care>

First, I wanted to get rid of flocks. Forks cost too much overhead IMO,
and I finally wanted to see real working HTTP pipelining, which is hard
to manage otherwise. I also needed to get rid of helper modules like LWP
and external tools (curl, wget) because they can hardly be controlled in
the way needed here. Unfortunately I discovered some limitations of
Perl: the thread implementation is very "special" and has been buggy
(segfaults!), and passing data around was not as comfortable as
expected. And I needed a database engine, which would mean a lot of
memory overhead, while I still try to keep apt-cacher usable on low-end
proxy systems. So getting rid of forks is not a bad idea anyway.

I decided to make one daemon which can be accessed via TCP or Unix
domain sockets, plus wrappers to provide the legacy functionality (CGI
and stdin/stdout mode). The wrapper solution should be even more
efficient, after all. And I wanted to get rid of flag files as far as
possible; abusing the filesystem for IPC is not a fine way. Switching to
internal thread communication solved that problem too.

So, finally, I decided to take the bitter pill and rewrite it in C++. I
think the implementation is about 60 percent done now, including partial
component tests. From the current POV this will be a "double-threaded
daemon" rather than single-threaded, because I create an extra thread
dealing with the database connections. The HTTP connections and the
client connections are driven by internal state machines.

I think I will use qdbm to store the mapping between physical paths and
the checksum+filesize+shortname that they provide, and, per available
host (to reduce memory usage), a smaller qdbm database caching the
contents of all known Packages/Sources/... files.

Eduard.
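PS: to make the mapping idea a bit more concrete, a toy sketch with
DB_File standing in for qdbm; the value layout and the example paths are
made up, nothing is final:

    #!/usr/bin/perl
    # Toy illustration of the path -> checksum+filesize+shortname
    # mapping. DB_File stands in for qdbm, the value layout is made up.
    use strict;
    use warnings;
    use DB_File;
    use Fcntl qw(O_CREAT O_RDWR);
    use Digest::MD5;
    use File::Basename;

    tie my %map, 'DB_File', '/tmp/apt-cacher-map.db', O_CREAT|O_RDWR,
        0644, $DB_HASH or die "cannot open mapping database: $!";

    sub record_file {
        my ($storepath, $realpath) = @_;
        open my $fh, '<', $realpath or die "$realpath: $!";
        binmode $fh;
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        # key: storage path; value: checksum, size, short name
        $map{$storepath} = join ':', $md5, -s $realpath,
            basename($storepath);
    }

    record_file('ftp.de.debian.org/debian/pool/main/b/bash/bash_3.1-4_i386.deb',
                '/var/cache/apt-cacher/packages/bash_3.1-4_i386.deb');
    print "$_ => $map{$_}\n" for sort keys %map;
    untie %map;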