reassign 658222 dpkg
thanks

On Wed, Feb 01, 2012 at 10:43:44AM +0100, Raphael Hertzog wrote:
> On Wed, 01 Feb 2012, Craig Sanders wrote:
> > FYI, here's why dlocate is still useful.
>
> I don't claim that it's not useful and the bug report was not aiming to
> get rid of dlocate... but to tell you that you should update your code
> to use the proper interfaces and not to access internal files.
>
> So please don't reassign it to dpkg, that doesn't make sense.

it makes perfect sense. a useful function provided by dpkg for the last
15+ years is being taken away. that's a bug.

If dpkg can no longer provide access to the raw files (and I still don't
have a clear understanding of why that's going to be the case, or why
multiarch is inevitably going to cause that) then it should provide
equivalent functionality.

the bug belongs to dpkg because I can't "fix" dlocate to work with the
upcoming changes until dpkg provides this.

and, as i said, if some other tool already provides that functionality
(without being obnoxiously slow) then point me at it and i'll start
using it.

it's also kind of disturbing that those .list files aren't going to be
there any more - does that mean that dpkg is abandoning the simplicity
and robustness of plain text files for some binary db format?

> > dlocate achieves this speed by concatenating the file listings to a
> > single file in a nightly cron job, and running a simple grep on the
> > result. A long time ago, it used to use frcode and locate, but it turns
> > out that grepping a text file is much faster.
>
> So you can certainly concatenate the output of "dpkg -L <package>"
> executed on all packages ?

not exactly. the script that does the job uses perl readdir to loop
through all the *.list files and (for each file) outputs each line
prefixed by the package name and a ':'.

this takes a grand total of 2.9 seconds on my system. negligible.

I don't know for sure how long looping around 'dpkg -L' will take but it
will be at least 25 minutes. on a fast system with an SSD.

That's an unacceptable drop in performance. over 500x slower, and a
significant enough load on the system that it's debatable whether it's
even worth doing...save 5 or so seconds per query at the cost of
thrashing the disks for almost half an hour every morning. not good (and
makes me glad i never implemented a trigger to run the dlocate update
after an apt-get upgrade as some people have requested).
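
for illustration only, the concatenation step described above can be
sketched in a few lines of shell. this is NOT the actual dlocate script
(that one is perl, using readdir); the function name build_index and its
directory argument are made up for this example, and it assumes the
package name is simply the .list file name minus its suffix.

```shell
# build_index DIR
# Hypothetical helper (not part of dlocate or dpkg): print one
# "package:filename" line for every line of every DIR/*.list file.
# Assumption: the package name is the file's basename without ".list".
build_index() {
    dir=$1
    # One awk process reads all the files. FNR restarts at 1 for each
    # new input file, so that is where the package name is recomputed
    # from FILENAME.
    awk '
        FNR == 1 {
            pkg = FILENAME
            sub(/^.*\//, "", pkg)    # drop the directory part
            sub(/\.list$/, "", pkg)  # drop the .list suffix
        }
        { print pkg ":" $0 }
    ' "$dir"/*.list
}
```

pointed at the directory holding the .list files (traditionally
/var/lib/dpkg/info) and redirected to a file, the result is a single
text index that a plain grep can search. note that if the directory has
no *.list files at all, the glob is passed through unexpanded and awk
will complain about a missing file.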

> For better performances you can give multiple parameters to "dpkg -L".

not if i want a line output format that's useful to me, like
"package:filename". since dpkg doesn't provide that format, I'd have to
do it myself, by running dpkg -L on each package.

> The tricky part (if you want to be compatible to multiarch) is to get
> the list of all packages...
>
> dpkg-query -f'${binary:Package} ${Package}\n' -W|awk '{print $1}' should
> do it. With a multiarch dpkg, ${binary:Package} will be non-empty and
> thus used. For older dpkg, the usual ${Package} substitution will be used.
>
> > i'm more than happy to modify dlocate to work with whatever command-line
> > options dpkg (or the various apt* tools) provides to give access to the
> > package file lists.
>
> Why wouldn't "dpkg -L" be enough ?

1. output format
2. speed

/etc/cron.daily/dlocate currently runs in just under 3 seconds.

# time /etc/cron.daily/dlocate

real    0m2.946s
user    0m2.840s
sys     0m0.204s

also, 'dpkg -L' doesn't actually give me what I need. i need
'package:filename'. so i'd have to wrap that in some extra code for that
output. which means i'd have to run an individual 'dpkg -L' for each
package.

just as a quick test, I tried looping through the output of dpkg-query
as you suggested above, with:

time dpkg-query -f'${binary:Package} ${Package}\n' -W|awk '{print $1}' | xargs -n 1 dpkg -L > /tmp/dpkg-L

I got bored of waiting for it to finish after 15 minutes. at a guess it
was probably about half way through the list (the list is sorted, and it
was up to one of the linux-header packages).

and, making it worse, /tmp is a tmpfs on my system, so the 'dpkg -L'
output was being written to ramdisk while the /etc/cron.daily/dlocate
output was being written to SSD.

BTW, running the xargs without "-n 1" completes in about 2.2 seconds.

# time dpkg-query -f'${binary:Package} ${Package}\n' -W|awk '{print $1}' | xargs dpkg -L > /tmp/dpkg-L.2

real    0m2.230s
user    0m1.300s
sys     0m0.964s

that's good. perfect, in fact.
but useless unless dpkg can be made to print "package:filename" rather
than just "filename" when it's running -L for multiple args.

so there's the fix. if dpkg -L (or dpkg-query -L, or anything else) can
provide that output format in a reasonable time (seconds rather than
minutes or 10s of minutes) then the problem goes away.

which is why i'm reassigning this back to dpkg.

craig

ps: dpkg's lack of speed for 'dpkg -S' is the reason why dlocate was
written in the first place.

-- 
craig sanders <c...@taz.net.au>

BOFH excuse #8:

static buildup