On Mon, May 29, 2006 at 11:19:00AM +1000, Tim Connors <[EMAIL PROTECTED]> was 
heard to say:
> /var/lib/dpkg/info is of course irrelevant for this bug -- I meant 
> to pick /var/lib/apt/lists, whose disk usage is 182MB.
> 
> Now I recall that all of this is mapped in -- does aptitude truly need all 
> of this data in RAM at once, and is this what is causing the 
> large memory allocation and hence thrashing -- seems about the right size?
> 
> Why would we map it in all at once -- much of it is rather duplicated 
> information.  Who cares if this text appears identically in 
> ftp.debian.org_dists_testing_main_binary-i386_Packages or 
> ftp.debian.org_dists_unstable_main_binary-i386_Packages or 
> mirror.aarnet.edu.au_pub_debian_dists_unstable_main_binary-i386_Packages

  Well, aptitude doesn't know that they're the same; as far as it knows,
there are three different versions of each package.

  There are things that could be done to adjust the storage of the
descriptions list, of course.  For instance, I wonder if a suffix tree
or some similar data structure would be helpful.  I don't really want
to rewrite the whole apt caching layer, though.
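
  To make the idea concrete without going as far as a suffix tree:
the crudest version is to key descriptions by their text, so that a
description appearing identically in three Packages files is stored
once.  This is only a sketch; DescriptionStore and its members are
made-up names, nothing here exists in apt or aptitude, and it only
catches byte-for-byte identical text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical store, not part of apt's cache: each distinct description
// text is kept once and referred to by a small integer handle.
class DescriptionStore {
public:
    using Handle = std::size_t;

    DescriptionStore() = default;
    // texts_ holds pointers into index_, so copying would leave them dangling.
    DescriptionStore(const DescriptionStore&) = delete;
    DescriptionStore& operator=(const DescriptionStore&) = delete;

    // Returns the handle for `text`, storing it only if it is new.
    Handle add(const std::string& text) {
        auto result = index_.emplace(text, texts_.size());
        if (result.second) {
            // Pointers to unordered_map keys stay valid across rehashing.
            texts_.push_back(&result.first->first);
        }
        return result.first->second;
    }

    const std::string& get(Handle h) const {
        assert(h < texts_.size());
        return *texts_[h];
    }

    std::size_t unique_count() const { return texts_.size(); }

private:
    std::unordered_map<std::string, Handle> index_;  // owns the strings
    std::vector<const std::string*> texts_;          // handle -> stored string
};
```

  Each package version would then carry a Handle rather than its own
copy of the text.  Whether that can be retrofitted without touching
the apt cache format is a separate question.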

  Probably the reason aptitude pulls everything into RAM is that it
walks over the whole cache to build some of its structures on startup.
I suppose we could get some speedup by using a binary cache of those
too.  For instance, the debtags stuff doesn't change unless the lists
change, so there's no need to build it every time.  Also, currently
std::sets are used; probably we could just use those to build it the
first time, but actually store it as a sorted list.  Other optimizations
are also possible (e.g., indexing into a common list of tag strings),
depending on how much time you want to waste on them.
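
  Roughly what I have in mind for the last two points, as a sketch
rather than a patch: tags are indexed into one shared table of
strings, a std::set is used while building, and the result is kept as
a sorted vector that is searched with a binary search.  TagTable,
TagId, freeze and has_tag are hypothetical names, and the tag strings
used with them are just examples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; none of this is apt or aptitude API.
using TagId = std::uint32_t;

// One shared table of tag strings; packages store small integer ids.
class TagTable {
public:
    // Returns the id for `tag`, adding it to the table on first sight.
    TagId intern(const std::string& tag) {
        auto it = ids_.find(tag);
        if (it != ids_.end())
            return it->second;
        TagId id = static_cast<TagId>(names_.size());
        names_.push_back(tag);
        ids_.emplace(tag, id);
        return id;
    }

    const std::string& name(TagId id) const {
        assert(id < names_.size());
        return names_[id];
    }

    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;              // id -> tag string
    std::unordered_map<std::string, TagId> ids_;  // tag string -> id
};

// Build with a std::set (which deduplicates and orders the ids), then
// keep only a flat sorted vector for the lifetime of the program.
std::vector<TagId> freeze(const std::set<TagId>& built) {
    return std::vector<TagId>(built.begin(), built.end());
}

// Membership test on the frozen form: binary search over the sorted ids.
bool has_tag(const std::vector<TagId>& sorted_tags, TagId id) {
    return std::binary_search(sorted_tags.begin(), sorted_tags.end(), id);
}
```

  A package's tags are then built with something like
building.insert(table.intern("role::program")) and frozen once at the
end.  A flat vector of integers is also about the simplest thing to
dump into a binary cache, though I haven't measured what any of this
would actually save.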

  Daniel
