Steve Dunham writes ("Re: dpkg memory usage"):
> John Goerzen <[EMAIL PROTECTED]> writes:
> > I was upgrading packages on my 64 meg system today and noticed:
> >   PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
> > 24785 root      18   0 12680  12M   568 S       0  0.1 20.0   5:36 dpkg
> > Yes, that's almost 13 megs used by dpkg, and 20% of my RAM.
> > That also is 4 megs more than the TOTAL amount of RAM in some computers I
> > work with.
>
> > So...why must dpkg use almost as much memory as XFree86 itself, and MORE
> > than Netscape does at times?

The main reason that dpkg is so large is that it has loaded into core
a complete list of all files it has installed on your system and which
package(s) they came from.  This data really is that large.

When processing, dpkg does tend to grow somewhat, and particularly to
use up swap rather than real memory.  This is because it has very
malloc-intensive data structures and so internally uses a special
version of malloc which cannot free (a simple incrementing allocator
working from blocks of ordinarily malloc'd memory).  The amount of
memory used is proportional to the number of files processed, and even
for a complete reinstall in one dpkg run it won't use more than about
twice the amount for a `dpkg --search' - and most of the data will be
swapped out.

On `small memory' systems dpkg switches to a different data structure
which is about twice as slow for general access on a big machine, but
has a much smaller working set so is much faster for setup and access
on small machines.  dpkg uses sysinfo(2) to guess which algorithm to
use, and you can force one or the other using command line options.  I
have checked this on a 3Mb system and it worked as expected.

> > Not only that, but it is hideously slow even on current computers.  My
> > suggestion: store the databases in a DBM format of some sort instead of
> > plain text.

The reason dpkg is slow is _not_ mainly the database format it uses.
It's mainly that the access method you're using is (I surmise)
reinvoking dpkg each time.  That involves loading the more robust data
structures in /var/lib/dpkg/info into a fast-to-access in-core format.
Unfortunately dpkg's current calling interface makes it hard not to do
this, but I'm going to fix that at some point.

I also intend to change the format of the /var/lib/dpkg/info/*.list
database to make it faster to load, and I may change
/var/lib/dpkg/status too.  (The resulting structures will still be
editable with emacs.)
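
To make the `malloc which cannot free' mentioned above a bit more
concrete, here is a minimal sketch of an incrementing allocator that
carves small pieces out of blocks obtained from ordinary malloc.  This
is an illustration only and not the code in dpkg: the names
nfmalloc_sketch, NFM_BLOCK and NFM_ALIGN, and the 8192-byte block size,
are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes; this is NOT dpkg's actual allocator. */
union nfm_align { long l; double d; void *p; };
#define NFM_BLOCK 8192u                       /* assumed block size */
#define NFM_ALIGN sizeof(union nfm_align)     /* allocation granularity */

static char *nfm_cur;    /* next unused byte in the current block */
static size_t nfm_left;  /* bytes remaining in the current block */

/* Hand out `size' bytes.  There is no matching free: memory is only
 * returned to the system when the process exits. */
void *nfmalloc_sketch(size_t size)
{
  size_t need;
  char *p;

  if (size == 0) size = 1;
  if (size > (size_t)-1 - (NFM_ALIGN - 1)) return NULL;  /* overflow */

  /* Round up so that every returned pointer stays aligned. */
  need = (size + NFM_ALIGN - 1) / NFM_ALIGN * NFM_ALIGN;

  if (need > nfm_left) {
    /* Start a fresh block from ordinary malloc.  Whatever was left in
     * the old block is simply abandoned. */
    size_t blk = need > NFM_BLOCK ? need : NFM_BLOCK;
    nfm_cur = malloc(blk);
    if (!nfm_cur) { nfm_left = 0; return NULL; }
    nfm_left = blk;
  }
  assert(need <= nfm_left);

  /* The "incrementing" part: bump the pointer past the new piece. */
  p = nfm_cur;
  nfm_cur += need;
  nfm_left -= need;
  return p;
}
```

Each allocation costs a comparison and an addition and carries no
per-object header, which suits a large number of small, long-lived
nodes.  The price is the one described above: the process can only
grow.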

> IMHO, dpkg should be using a DBM database for file -> package lookups
> and perhaps for the "status" and "available" caches too.  (I believe
> apt does something like this for "available".)
>
> (I presume that dpkg actually does use hash tables internally, but it
> recalculates that 12MB of data every time it starts up, which, IMHO, is
> not very efficient.)

It's only inefficient if you start up dpkg a lot.

Using a dbm file or something is fine if you just want read-only
access.  However, such files are no good for updating, because these
systems do not have sensible behaviour on filesystem failures like
disk full - they can't be updated atomically.  You end up having to
read everything in and rewrite the whole database after every update.

> The startup time and memory usage is just not worth any benefits
> gained from using a few thousand text files.
>
> And the text version is still prone to severe corruption.  Mine was
> scrambled the other day when I upgraded the modutils package running a
> 2.1.x kernel - the machine locked up, and when I rebooted and tried to
> install more packages, dpkg mixed up a bunch of scripts and .list
> files.

I think this was probably a simple kernel bug.  dpkg cannot defend
against your kernel scrambling its filesystem data structures.  It
does ask the kernel to confirm that changes have been committed to
disk before it continues.  Here is the relevant code from dpkg:

  file= fopen(newvb.buf,"w+");
  if (!file) ohshite(...);
  push_cleanup(cu_closefile,ehflag_bombout, 0,0, 1,(void*)file);
  while (list) {
    if (!(leaveout && (list->namenode->flags & fnnf_elide_other_lists))) {
      fputs(list->namenode->name,file);
      putc('\n',file);
    }
    list= list->next;
  }
  if (ferror(file)) ohshite(...);
  if (fflush(file)) ohshite(...);
  if (fsync(fileno(file))) ohshite(...);
  pop_cleanup(ehflag_normaltidy); /* file= fopen() */
  if (fclose(file)) ohshite(...);
  if (rename(newvb.buf,vb.buf)) ohshite(...);

ohshite is a nonreturning error handling function.
I've elided its arguments for brevity.  As you can see, it is careful
to flush and sync the .list file before it uses rename(2) to
atomically overwrite the destination file.

Ian.
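
For anyone who wants to try the same flush, sync, rename sequence
outside dpkg, here is a minimal standalone sketch.  It is an
illustration, not dpkg code: write_names_atomically and its parameters
are hypothetical, and it returns an error code where dpkg would call
its nonreturning handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of dpkg.  Writes n names, one per
 * line, to tmppath and then renames it over path, so that a reader
 * sees either the old file or the complete new one.
 * Returns 0 on success, -1 on any failure. */
int write_names_atomically(const char *path, const char *tmppath,
                           const char *const *names, size_t n)
{
  FILE *f;
  size_t i;

  assert(path && tmppath && (names || n == 0));

  f = fopen(tmppath, "w");
  if (!f) return -1;

  for (i = 0; i < n; i++) {
    fputs(names[i], f);
    putc('\n', f);
  }

  /* Check for earlier write errors, push stdio's buffer to the
   * kernel, then ask the kernel to commit the data to disk. */
  if (ferror(f) || fflush(f) || fsync(fileno(f))) {
    fclose(f);
    remove(tmppath);
    return -1;
  }
  if (fclose(f)) {
    remove(tmppath);
    return -1;
  }

  /* Only now replace the destination; rename(2) does this atomically. */
  if (rename(tmppath, path)) {
    remove(tmppath);
    return -1;
  }
  return 0;
}
```

The temporary file must be on the same filesystem as the destination,
otherwise rename(2) fails.  Like the excerpt above, the sketch does
not fsync the containing directory.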