Hi! On Thu, 2023-05-04 at 06:16:52 +0200, Marc Lehmann wrote: > Package: dpkg > Version: 1.20.12 > Severity: wishlist
> on many of my systems using nvme drives or even sata ssds, the > FS_IOC_FIEMAP scan that dpkg does is not only superfluous, but with a > large number of packages installed, absolutely dominates the installation > time (based on profiling) - essentially, dpkg does nothing else during the > preparing/unpacxking/selecting steps. > > and while being a very useful optimisation for spinning rust drives, when > installing many small packages, the scan is apparently done for every > package unpack, when, again, on modern systems, everything is in the cache > aftzer the first pass. Hmm, right, whenever we update and write the files list file on disk we invalidate the in-core lists and force a reload for that package, so the scan might indeed be performed for at least each unpacked package when processing each subsequent one. The other potential cause for the entire database being loaded for each package you might also be seeing is because apt tends to split essential package installation into individual runs to reduce potential breakage (AFAIR). > could you consider making this scan optional (i.e. off-switchable via an > option)? this way, users can disable it in dpkg.cfg on those systems where > it matters, while still retaining it by default (I do not think trying to > autodetect ssds is a good idea here, and a switch should be trivial to > implement). Yes, AFAIR at the time when implementing this, autodetection was considered, but was quickly discarded as unfeasible/unportable and error-prone, and we opted for performing this unconditionally. I've now locally implemented this via a new force option enabled by default, that I'm tentatively calling --force-mechanical-load, but I'll ponder about the name because it's the first thing that came to mind, and I'm not fully convinced. If you happen to have proposals I'm all ears. :) But looking further, the scan is not ignoring packages that already have a valid loaded files list file (although it should already be caching the physical offset if it was found and then skip the scan), so I'll skip those, which might also help by substantially reducing the work being done on both HDD and NVME/SSD. I might also look into avoiding the scan completely if there are only few files list files to reload (instead of say the entire lot), but I'm not sure how much that will help once/if it is only done for the few packages that need a reload (if the current caching is not effective for some reason, then I'd assume it might not help a lot?). And thinking about it perhaps even the force option is not really necessary after the scan fix anyway? (But it would still help in the apt splitting case mentioned above though.) I'm attaching the change to skip unnecessary optimization scan, in case you want to give it a go, although I've only compile tested it for now. > PS: it might also be a good idea to reconsider the whole scan - fiemap > often creates just as many seeks as a stat or an open, especially on more > complicated filesystems. In fact, sorting by inode numbers (which are > essentially free info) is often a good approximation for locality on many > filesystems and might be a good replacement strategy. Perhaps, I would need to check the context from when it was added as I cannot recall whether this option was considered and discarded, and otherwise it would need to be benchmarked on a bunch of different filesystems and types of disks to see whether there would be significant performance degradation/regression. Thanks, Guillem
From b93783b16b75eea334d82221cf5a71af5ba1ff99 Mon Sep 17 00:00:00 2001 From: Guillem Jover <guil...@debian.org> Date: Mon, 22 May 2023 01:26:15 +0200 Subject: [PATCH] libdpkg: Do not unnecessarily optimize the db-fsys load for loaded files If the files list file is valid and will not be reloaded there is no point in including it in the optimization load logic. Skip them as we do when trying to (re)load these files. For the posix_fadvise() case this is a clear fix, for the fiemap case we are supposedly already caching the physical offset, so in case we already found it these packages should have been skipped, but if we failed to fetch the fiemap information then that might cause the wasted operations for all packages. FIXME: check why the caching might not be currently effecting. Closes: #1035486 --- lib/dpkg/db-fsys-files.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/lib/dpkg/db-fsys-files.c b/lib/dpkg/db-fsys-files.c index fc67d1d20..c16b69487 100644 --- a/lib/dpkg/db-fsys-files.c +++ b/lib/dpkg/db-fsys-files.c @@ -194,6 +194,9 @@ pkg_files_optimize_load(struct pkg_array *array) const char *listfile; int fd; + if (pkg->files_list_valid) + continue; + if (pkg->status == PKG_STAT_NOTINSTALLED || pkg->files_list_phys_offs != 0) continue; @@ -233,6 +236,9 @@ pkg_files_optimize_load(struct pkg_array *array) const char *listfile; int fd; + if (pkg->files_list_valid) + continue; + listfile = pkg_infodb_get_file(pkg, &pkg->installed, LISTFILE); fd = open(listfile, O_RDONLY | O_NONBLOCK); -- 2.40.1