#include <hallo.h>
* Daniel Richard G. [Tue, Apr 04 2006, 05:51:52PM]:

> > The problem is, the migration cannot be completely painless. Because
> > without tracking the origin of the packages, apt-cacher will keep
> > delivering the wrong files, so it must "learn" which mirrors or
> > download locations the existing files can be assigned to.
> 
> But wouldn't existing (and presumably non-broken) repositories contain 
> only packages with a single, common origin? That is, all Debian or all 

Nope. Apt-cacher has been very permissive for as long as I can remember,
and it used to store all files with the same basename under a single
cached file. The premise for this to work is that there are no namespace
collisions - a premise that no longer holds with the rise of Debian's
forks.

> Ubuntu? It's only when a repository attempts to serve up both that 
> multiple origins come into play, and by then one could argue that the 
> package-name intersections render the repository broken.
> 
> What I was thinking was that existing users with large, single-origin
> repositories would get a lazy/painless upgrade to the new
> package-storing convention. Users who want to serve up multi-origin
> repositories would have to start from scratch, using the forthcoming
> (yes? :) version of apt-cacher that supports it.

Hehe, no. As said, a simple upgrade script can discover where the
packages have been downloaded from - just do "cat
/var/cache/apt-cacher/private/*.complete". While working on the last
major version, I asked myself why the space for the empty .complete
files should be wasted for no reason, so I added some code to store the
complete URL of a successfully downloaded file ;-)
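
Roughly, the first pass over those files would look like this (just a
sketch; I assume here that every .complete file is named after the
cached package and contains nothing but the URL on its first line):

  // Collect the origin URLs recorded in the .complete files.
  // Assumption: each file is named <package basename>.complete and holds
  // the full download URL as a single plain-text line.
  #include <filesystem>
  #include <fstream>
  #include <iostream>
  #include <map>
  #include <string>

  std::map<std::string, std::string> read_complete_files(const std::string &dir)
  {
      std::map<std::string, std::string> urls;   // basename -> origin URL
      for (const auto &entry : std::filesystem::directory_iterator(dir)) {
          const auto path = entry.path();
          if (path.extension() != ".complete")
              continue;
          std::ifstream in(path);
          std::string url;
          std::getline(in, url);                 // first line: the stored URL
          if (!url.empty())
              urls[path.stem().string()] = url;  // strip the .complete suffix
      }
      return urls;
  }

  int main()
  {
      for (const auto &[name, url] :
           read_complete_files("/var/cache/apt-cacher/private"))
          std::cout << name << " -> " << url << "\n";
  }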

The current idea is simple: the next version will have an upgrade
script. This script looks at the cached DEBs and gets their URLs from
the mentioned .complete files. If some URLs are still missing (pre-sarge
downloads), the files are read and checksummed, and the URLs are
reconstructed from the Packages files using the filenames, sizes and
checksums. I think that should be enough to recover the needed data.
Then the files are moved to their new locations.
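
The fallback lookup table can be built from the Packages files along
these lines (again only a sketch: hashing the cached .debs themselves is
not shown, and the index path and mirror base URL are made-up examples):

  // Map (md5sum, size, basename) tuples from a Packages file to full URLs.
  #include <fstream>
  #include <iostream>
  #include <map>
  #include <string>
  #include <tuple>

  using Key = std::tuple<std::string, long, std::string>;  // md5, size, basename

  // Parse one Packages file and record its Filename/Size/MD5sum stanzas.
  // base_url is the mirror prefix the Packages file was fetched from.
  void index_packages(const std::string &packages_path,
                      const std::string &base_url,
                      std::map<Key, std::string> &urls)
  {
      std::ifstream in(packages_path);
      std::string line, filename, md5;
      long size = 0;
      auto flush = [&]() {
          if (!filename.empty() && !md5.empty()) {
              std::string base = filename.substr(filename.rfind('/') + 1);
              urls[{md5, size, base}] = base_url + "/" + filename;
          }
          filename.clear(); md5.clear(); size = 0;
      };
      while (std::getline(in, line)) {
          if (line.empty()) { flush(); continue; }   // stanza separator
          if (line.rfind("Filename: ", 0) == 0) filename = line.substr(10);
          else if (line.rfind("Size: ", 0) == 0) size = std::stol(line.substr(6));
          else if (line.rfind("MD5sum: ", 0) == 0) md5 = line.substr(8);
      }
      flush();                                       // last stanza
  }

  int main()
  {
      std::map<Key, std::string> urls;
      // example call; the real script would walk over all cached index files
      index_packages("Packages", "http://ftp.debian.org/debian", urls);
      std::cout << urls.size() << " entries indexed\n";
  }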

> Did you also have in mind the group of users who are currently running
> multi-origin repositories, and are ignoring/overlooking/unaware of the
> ill effects?

Yes. That is why I am trying to hurry. And I believe that the described
transition plan should work smoothly enough that I would even put it
into the postinst script of the package.

And I cannot afford to make mistakes there. While working on the new
design, I ran into some fundamental problems.

<some developer comments follow, for those who care>

First, I wanted to get rid of flocks. Forks cost IMO too much overhead,
and I finally wanted to see real working HTTP pipelining, which is hard
to manage. I also needed to get rid of helper modules like LWP or
external tools (curl, wget), because they can hardly be controlled in
the way needed here. Unfortunately, I discovered some limitations of
Perl - the thread implementation is very "special" and has been buggy
(segfaults!), and data passing was not as comfortable as expected. I
also needed a database engine, which would mean a lot of memory
overhead, while I still try to keep apt-cacher compatible with low-end
proxy systems. So getting rid of forks is not a bad idea anyway. I
decided to make one daemon which can be accessed via TCP or Unix domain
sockets, plus wrappers to provide the legacy functionality (CGI,
stdin/stdout mode). The wrapper solution should be even more efficient,
after all. And I wanted to get rid of flag files as far as possible -
abusing the filesystem for IPC is not a clean approach. Switching to
internal thread communication solved that problem too.
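
The listener side of that daemon is nothing fancy, roughly like this
(sketch only, error checking omitted; the socket path and port are just
examples):

  // One AF_UNIX socket for the local wrappers (CGI, stdin/stdout mode)
  // and one AF_INET socket for remote clients.
  #include <arpa/inet.h>
  #include <cstdio>
  #include <cstring>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <sys/un.h>
  #include <unistd.h>

  int listen_unix(const char *path)
  {
      int fd = socket(AF_UNIX, SOCK_STREAM, 0);
      sockaddr_un addr{};
      addr.sun_family = AF_UNIX;
      std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
      unlink(path);                                  // remove a stale socket
      bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
      listen(fd, SOMAXCONN);
      return fd;
  }

  int listen_tcp(unsigned short port)
  {
      int fd = socket(AF_INET, SOCK_STREAM, 0);
      sockaddr_in addr{};
      addr.sin_family = AF_INET;
      addr.sin_addr.s_addr = htonl(INADDR_ANY);
      addr.sin_port = htons(port);
      bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
      listen(fd, SOMAXCONN);
      return fd;
  }

  int main()
  {
      int ufd = listen_unix("/var/run/apt-cacher/socket");  // example path
      int tfd = listen_tcp(3142);                           // example port
      std::printf("listening on fds %d and %d\n", ufd, tfd);
      // a select()/accept() loop over both descriptors would follow here
      close(ufd);
      close(tfd);
  }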

So, finally, I decided to take the bitter pill and rewrite it in C++. I
think the implementation is about 60 percent done now, including partial
component tests. From the current POV this will be a "double-threaded"
daemon rather than a single-threaded one, because I create an extra
thread dealing with the database connections. The HTTP connections to
mirrors and clients are handled by internal state machines instead of
separate threads.
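
In other words, something like this (a rough sketch with made-up names,
just to show the queue sitting between the connection state machines and
the single database thread):

  #include <condition_variable>
  #include <functional>
  #include <mutex>
  #include <queue>
  #include <thread>

  // States of one client/mirror connection, driven from the main thread.
  enum class ConnState { ReadRequest, QueryDb, SendHeader, SendBody, Done };

  // The single extra thread; all database work is funnelled through it.
  class DbThread {
      std::queue<std::function<void()>> jobs;
      std::mutex m;
      std::condition_variable cv;
      bool stop = false;
      std::thread worker{[this] { run(); }};   // started last, members ready

      void run() {
          std::unique_lock<std::mutex> lk(m);
          while (!stop || !jobs.empty()) {
              cv.wait(lk, [this] { return stop || !jobs.empty(); });
              while (!jobs.empty()) {
                  auto job = std::move(jobs.front());
                  jobs.pop();
                  lk.unlock();
                  job();            // the only code that touches the database
                  lk.lock();
              }
          }
      }

  public:
      void post(std::function<void()> job) {
          { std::lock_guard<std::mutex> lk(m); jobs.push(std::move(job)); }
          cv.notify_one();
      }
      ~DbThread() {
          { std::lock_guard<std::mutex> lk(m); stop = true; }
          cv.notify_one();
          worker.join();
      }
  };

  int main()
  {
      ConnState state = ConnState::ReadRequest;  // one connection's state
      DbThread db;                               // joined before state dies
      // a connection that needs a lookup hands the work to the db thread
      db.post([&state] { state = ConnState::SendHeader; });
  }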

I think I will use qdbm to store the mapping between physical paths and
the checksum+filesize+shortname that they provide, and, per available
host (to reduce memory usage), a smaller qdbm database caching the
contents of all known Packages/Sources/... files.
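
For example (a sketch using qdbm's Depot functions; the key layout and
the paths are only examples, not a final format):

  // Key: "md5sum:size:shortname", value: the physical path in the cache.
  #include <depot.h>      // qdbm Depot API; link with -lqdbm
  #include <cstdlib>
  #include <iostream>
  #include <string>

  int main()
  {
      DEPOT *db = dpopen("/var/cache/apt-cacher/index.qdbm",
                         DP_OWRITER | DP_OCREAT, -1);
      if (!db) {
          std::cerr << "dpopen: " << dperrmsg(dpecode) << "\n";
          return 1;
      }

      // store one mapping: checksum+size+shortname -> physical path
      std::string key = "0123456789abcdef0123456789abcdef:20124:"
                        "somepkg_1.0-1_i386.deb";                  // example
      std::string val = "/var/cache/apt-cacher/debian/somepkg_1.0-1_i386.deb";
      dpput(db, key.c_str(), -1, val.c_str(), -1, DP_DOVER);

      // look it up again
      char *found = dpget(db, key.c_str(), -1, 0, -1, nullptr);
      if (found) {
          std::cout << key << " -> " << found << "\n";
          std::free(found);
      }
      dpclose(db);
  }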

Eduard.

-- 
<asuffield> we should have a button on every computer marked "?", and
  connected to twenty pounds of semtex, and then let evolution take its
  course                                    // quote from #debian-devel

