reassign 658222 dpkg
thanks

On Wed, Feb 01, 2012 at 10:43:44AM +0100, Raphael Hertzog wrote:
> On Wed, 01 Feb 2012, Craig Sanders wrote:
> > FYI, here's why dlocate is still useful.
> 
> I don't claim that it's not useful and the bug report was not aiming to
> get rid of dlocate... but to tell you that you should update your code
> to use the proper interfaces and not to access internal files.
> 
> So please don't reassign it to dpkg, that doesn't make sense.

it makes perfect sense. a useful function provided by dpkg for the
last 15+ years is being taken away. that's a bug.  If dpkg can no
longer provide access to the raw files (and I still don't have a clear
understanding of why that's going to be the case, or why multiarch
is inevitably going to cause that) then it should provide equivalent
functionality.

the bug belongs to dpkg because I can't "fix" dlocate to work with the
upcoming changes until dpkg provides this.

and, as i said, if some other tool already provides that functionality
(without being obnoxiously slow) then point me at it and i'll start
using it.


it's also kind of disturbing that those .list files aren't going to be
there any more - does that mean that dpkg is abandoning the simplicity
and robustness of plain text files for some binary db format?

> > dlocate achieves this speed by concatenating the file listings to a
> > single file in a nightly cron job, and running a simple grep on the
> > result.  A long time ago, it used to use frcode and locate, but it turns
> > out that grepping a text file is much faster.
> 
> So you can certainly concatenate the output of "dpkg -L <package>"
> executed on all packages ?

not exactly. the script that does the job uses perl readdir to loop
through all the *.list files and (for each file) outputs each line
prefixed by the package name and a ':'.

this takes a grand total of 2.9 seconds on my system.  negligible.
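
for illustration only, here's a rough shell sketch of that loop. it is
not the actual dlocate script (which is perl): the function names
build_index and lookup are made up for this example, and the directory
is passed in as an argument (on a stock system the .list files normally
live in /var/lib/dpkg/info).

```shell
#!/bin/sh
# Sketch only, not the real dlocate code. Function names are hypothetical.

# build_index DIR
# Print every line of every DIR/*.list as "package:filename".
# Assumes DIR contains at least one .list file (otherwise awk will
# complain about the unmatched glob).
build_index() {
    # FNR == 1 is true on the first line of each input file, so the
    # package name is derived from the file name once per file:
    # strip the directory part, then the ".list" suffix.
    awk 'FNR == 1 { pkg = FILENAME
                    sub(/.*\//, "", pkg)
                    sub(/\.list$/, "", pkg) }
         { print pkg ":" $0 }' "$1"/*.list
}

# lookup STRING INDEXFILE
# Fixed-string grep over the combined index.
lookup() {
    grep -F -- "$1" "$2"
}

# Example (not run here):
#   build_index /var/lib/dpkg/info > /tmp/index
#   lookup /bin/ls /tmp/index
```

the point of doing it this way is that one awk process reads all the
files, instead of starting a new process for every package.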

I don't know for sure how long looping around 'dpkg -L' will take, but it
will be at least 25 minutes, even on a fast system with an SSD.

That's an unacceptable drop in performance. over 500x slower, and a
significant enough load on the system that it's debatable whether
it's even worth doing...save 5 or so seconds per query at the cost of
thrashing the disks for almost half an hour every morning. not good (and
makes me glad i never implemented a trigger to run the dlocate update
after an apt-get upgrade as some people have requested).



> For better performances you can give multiple parameters to "dpkg -L".

not if i want a line output format that's useful to me, like
"package:filename". since dpkg doesn't provide that format, I'd have to
do it myself, by running dpkg -L on each package.

> The tricky part (if you want to be compatible to multiarch) is to get
> the list of all packages...
> 
> dpkg-query -f'${binary:Package} ${Package}\n' -W|awk '{print $1}' should
> do it. With a multiarch dpkg, ${binary:Package} will be non-empty and
> thus used. For older dpkg, the usual ${Package} substitution will be used.
> 
> > i'm more than happy to modify dlocate to work with whatever command-line
> > options dpkg (or the various apt* tools) provides to give access to the
> > package file lists.
> 
> Why wouldn't "dpkg -L" be enough ?

1. output format
2. speed


/etc/cron.daily/dlocate currently runs in just under 3 seconds.

# time /etc/cron.daily/dlocate

real    0m2.946s
user    0m2.840s
sys     0m0.204s


also, 'dpkg -L' doesn't actually give me what I need. i need
'package:filename'. so i'd have to wrap that in some extra code for that
output. which means i'd have to run an individual 'dpkg -L' for each
package.

just as a quick test, I tried looping through the output of dpkg-query
as you suggested above, with:

time dpkg-query -f'${binary:Package} ${Package}\n' -W | awk '{print $1}' |
    xargs -n 1 dpkg -L > /tmp/dpkg-L

I got bored of waiting for it to finish after 15 minutes. at a guess it
was probably about half way through the list (the list is sorted, and it
was up to one of the linux-header packages).

and, making it worse, /tmp is a tmpfs on my system, so the 'dpkg -L'
output was being written to ramdisk while the /etc/cron.daily/dlocate
output was being written to SSD.



BTW, running the xargs without "-n 1" completes in about 2.2 seconds.

# time dpkg-query -f'${binary:Package} ${Package}\n' -W | awk '{print $1}' |
    xargs dpkg -L > /tmp/dpkg-L.2

real    0m2.230s
user    0m1.300s
sys     0m0.964s

that's good. perfect, in fact. but useless unless dpkg can be made to
print "package:filename" rather than just "filename" when it's running
-L for multiple args.

so there's the fix. if dpkg -L (or dpkg-query -L, or anything else) can
provide that output format in a reasonable time (seconds rather than
minutes or 10s of minutes) then the problem goes away.

which is why i'm reassigning this back to dpkg.

craig

ps: dpkg's lack of speed for 'dpkg -S' is the reason why dlocate was
written in the first place.

-- 
craig sanders <c...@taz.net.au>

BOFH excuse #8:

static buildup


