Hi!

On Thu, 2023-05-04 at 06:16:52 +0200, Marc Lehmann wrote:
> Package: dpkg
> Version: 1.20.12
> Severity: wishlist

> on many of my systems using nvme drives or even sata ssds, the
> FS_IOC_FIEMAP scan that dpkg does is not only superfluous, but with a
> large number of packages installed, absolutely dominates the installation
> time (based on profiling) - essentially, dpkg does nothing else during the
> preparing/unpacxking/selecting steps.
> 
> and while being a very useful optimisation for spinning rust drives, when
> installing many small packages, the scan is apparently done for every
> package unpack, when, again, on modern systems, everything is in the cache
> aftzer the first pass.

Hmm, right, whenever we update and write the files list file on disk
we invalidate the in-core lists and force a reload for that package,
so the scan might indeed be performed for at least each unpacked
package when processing each subsequent one.

The other potential cause for the entire database being loaded for
each package you might also be seeing is because apt tends to split
essential package installation into individual runs to reduce
potential breakage (AFAIR).

> could you consider making this scan optional (i.e. off-switchable via an
> option)? this way, users can disable it in dpkg.cfg on those systems where
> it matters, while still retaining it by default (I do not think trying to
> autodetect ssds is a good idea here, and a switch should be trivial to
> implement).

Yes, AFAIR at the time when implementing this, autodetection was
considered, but was quickly discarded as unfeasible/unportable and
error-prone, and we opted for performing this unconditionally.

I've now locally implemented this via a new force option enabled by
default, that I'm tentatively calling --force-mechanical-load, but I'll
ponder about the name because it's the first thing that came to mind,
and I'm not fully convinced. If you happen to have proposals I'm all
ears. :)

But looking further, the scan is not ignoring packages that already
have a valid loaded files list file (although it should already be
caching the physical offset if it was found and then skip the scan),
so I'll skip those, which might also help by substantially reducing
the work being done on both HDD and NVME/SSD.

I might also look into avoiding the scan completely if there are only
few files list files to reload (instead of say the entire lot), but
I'm not sure how much that will help once/if it is only done for the
few packages that need a reload (if the current caching is not effective
for some reason, then I'd assume it might not help a lot?). And thinking
about it perhaps even the force option is not really necessary after
the scan fix anyway? (But it would still help in the apt splitting case
mentioned above though.)

I'm attaching the change to skip unnecessary optimization scan, in
case you want to give it a go, although I've only compile tested it
for now.

> PS: it might also be a good idea to reconsider the whole scan - fiemap
> often creates just as many seeks as a stat or an open, especially on more
> complicated filesystems. In fact, sorting by inode numbers (which are
> essentially free info) is often a good approximation for locality on many
> filesystems and might be a good replacement strategy.

Perhaps, I would need to check the context from when it was added as
I cannot recall whether this option was considered and discarded,
and otherwise it would need to be benchmarked on a bunch of different
filesystems and types of disks to see whether there would be
significant performance degradation/regression.

Thanks,
Guillem
From b93783b16b75eea334d82221cf5a71af5ba1ff99 Mon Sep 17 00:00:00 2001
From: Guillem Jover <guil...@debian.org>
Date: Mon, 22 May 2023 01:26:15 +0200
Subject: [PATCH] libdpkg: Do not unnecessarily optimize the db-fsys load for
 loaded files

If the files list file is valid and will not be reloaded there is no
point in including it in the optimization load logic. Skip them as
we do when trying to (re)load these files. For the posix_fadvise() case
this is a clear fix, for the fiemap case we are supposedly already
caching the physical offset, so in case we already found it these
packages should have been skipped, but if we failed to fetch the
fiemap information then that might cause the wasted operations for
all packages.

FIXME: check why the caching might not be currently effecting.

Closes: #1035486
---
 lib/dpkg/db-fsys-files.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/dpkg/db-fsys-files.c b/lib/dpkg/db-fsys-files.c
index fc67d1d20..c16b69487 100644
--- a/lib/dpkg/db-fsys-files.c
+++ b/lib/dpkg/db-fsys-files.c
@@ -194,6 +194,9 @@ pkg_files_optimize_load(struct pkg_array *array)
     const char *listfile;
     int fd;
 
+    if (pkg->files_list_valid)
+      continue;
+
     if (pkg->status == PKG_STAT_NOTINSTALLED ||
         pkg->files_list_phys_offs != 0)
       continue;
@@ -233,6 +236,9 @@ pkg_files_optimize_load(struct pkg_array *array)
     const char *listfile;
     int fd;
 
+    if (pkg->files_list_valid)
+      continue;
+
     listfile = pkg_infodb_get_file(pkg, &pkg->installed, LISTFILE);
 
     fd = open(listfile, O_RDONLY | O_NONBLOCK);
-- 
2.40.1

Reply via email to