Bug#888234: dpkg: packages not fully upgraded, but dpkg doesn't notice

Guillem Jover Thu, 25 Jan 2018 18:07:15 -0800

Hi!

On Fri, 2018-01-26 at 01:04:08 +0100, Christoph Anton Mitterer wrote:
> On Fri, 2018-01-26 at 00:42 +0100, Guillem Jover wrote:
> > Are you sure they are not fully upgraded? What makes you think so?
> > Just the dpkg.log below?
> 
> No, I ignored it at first because I thought it not so unlikely that
> the log simply didn't get flushed out before the freeze.

Yeah, I guess it's never been considered a file important enough to
preserve at any cost, including degraded performance. I'll have to
ponder, or benchmark, whether fdatasync(2)ing the log file would
penalize too much.
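
As a rough illustration of what syncing the log would involve, here is
a minimal Python sketch (not dpkg's actual code, which is C). The
function name append_log_line and its sync parameter are made up for
this example; the fallback to fsync is an assumption for platforms
that lack fdatasync.

```python
import os

def append_log_line(path, line, sync=True):
    """Append one line to a log file, optionally forcing it to disk."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode() + b"\n")
        if sync:
            # fdatasync(2) flushes the file data (and the metadata needed
            # to read it back); fall back to fsync(2) where it is absent.
            getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)
```

Timing a loop of calls with sync=True against sync=False on the
filesystem in question would give a first idea of the penalty.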

> But then there was recently an upgrade to glibc (which includes locale
> re-generation). A crash happened and afterwards e.g. gnome-terminal
> didn't even start anymore (with some locale related errors, when
> started from xterm).
> Once I regenerated the locales, gnome-terminal worked fine again.
> 
> Of course it could simply be that the locales didn't get flushed out
> in time (or rather that no commit was made in btrfs), but then dpkg
> shouldn't think the package was configured, right?

dpkg does not and cannot control what and how things are done in
packages' maintainer scripts. So it can happen that dpkg syncs all
its databases and all the extracted files to disk, but the maintainer
scripts do not call the equivalent fsync(2) and thus those linger
around in memory and get lost on a crash. I'd expect most maintainer
scripts to not be abrupt-crash-safe, or even many applications TBH,
as not many things do the rename(2)/fsync(2) dance or similar.
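
For readers unfamiliar with the rename(2)/fsync(2) dance, a minimal
Python sketch follows. It is only an illustration of the general
pattern, not what dpkg or any maintainer script does; the name
atomic_write and the ".tmp-" prefix are made up here. Note that
mkstemp creates the file with mode 0600, so a real script would also
have to set the intended permissions.

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace path with data so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # file contents reach the disk
        os.replace(tmp_path, path)      # atomic rename over the target
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the first fsync is what typically produces empty or missing
files after an abrupt crash.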

> > > Normally dpkg -C would show this then, but it doesn't.
> > > Neither does dpkg --configure -a do anything.
> > 
> > And there are no packages in the status file with Status less than
> > installed. And no lingering files under «/var/lib/dpkg/updates»?
> 
> /var/lib/dpkg/updates/ is empty (well at least right now... not sure if
> it would have gotten cleaned up somehow else in the meantime).

Any subsequent write action would have incorporated the database
journal entries.
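
The status check asked about above could be sketched in Python roughly
as follows. This is an approximation for illustration, not a
replacement for «dpkg -C»; the helper name incomplete_packages is made
up, and the set of states treated as "less than installed" is my
reading of the dpkg status names.

```python
import os

# States that sit between "not-installed"/"config-files" and "installed".
INCOMPLETE = {"half-installed", "unpacked", "half-configured",
              "triggers-awaited", "triggers-pending"}

def incomplete_packages(status_text):
    """Return (package, state) pairs for stanzas not fully installed."""
    found = []
    for stanza in status_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        words = fields.get("Status", "").split()
        if len(words) == 3 and words[2] in INCOMPLETE:
            found.append((fields.get("Package", "?"), words[2]))
    return found

if __name__ == "__main__" and os.path.exists("/var/lib/dpkg/status"):
    with open("/var/lib/dpkg/status", encoding="utf-8") as f:
        print(incomplete_packages(f.read()))
    # Leftover journal entries, if any, would show up here.
    print(os.listdir("/var/lib/dpkg/updates"))
```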

> > > This happened already quite some times now, and probably my
> > > system has many packages in a state not fully installed, while
> > > dpkg thinks everything would be fine.
> > 
> > dpkg is very careful about how it handles its database. If it thinks
> > they are installed, and there are no update journal entries in the
> > above directory, then this might indicate something more severe like
> > a very broken filesystem on-disk or implementation or hardware
> > failure or similar.
> 
> Arguably, btrfs isn't perfect, but so far I never found any real
> corruptions in case of any freezes/crashes/etc.
> The only thing I ever found was that something wasn't committed
> yet, and got completely removed, but dpkg in turn should protect
> against that, AFAIU (with syncs at the appropriate places).

dpkg can only protect what it does itself.

> > > Interestingly: debsums -asc doesn't find problems.
> > 
> > That to me would indicate that the packages are either the old
> > versions or the new ones, but they match.
> 
> Is there any easy way to check that (i.e. whether the files are all
> still old, but dpkg thinks the upgrade was performed and the new
> version would be in place)?
> I did a random sample and compared one file each of the libc6 and
> locales packages from my system with that of the .deb, but of course
> I may have just picked the wrong one that still matches.

The easiest is probably to download the .debs matching the versions in
the system, and compare their md5sums with the ones in the dpkg db. If
«dpkg -V» then says there's no problem, then that should mean the
unpacked files are fine.

This of course does not cover any files generated by maintainer
scripts.
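
A minimal Python sketch of such a comparison is below. It assumes the
usual md5sums layout of one "<md5>  <path relative to />" entry per
line, as found in «/var/lib/dpkg/info/<package>.md5sums»; the same
function could be fed the md5sums file from the control data of a
downloaded .deb. The name verify_md5sums is made up for this example,
and «dpkg -V» remains the authoritative check.

```python
import hashlib
import os

def verify_md5sums(md5sums_text, root="/"):
    """Return (path, reason) pairs for files that are missing or differ."""
    problems = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        # Checksum and path are separated by two spaces.
        expected, _, rel_path = line.partition("  ")
        path = os.path.join(root, rel_path)
        try:
            with open(path, "rb") as f:
                actual = hashlib.md5(f.read()).hexdigest()
        except OSError:
            problems.append((rel_path, "missing"))
            continue
        if actual != expected:
            problems.append((rel_path, "changed"))
    return problems
```

An empty result for both the db copy and the .deb copy of the md5sums
would suggest the unpacked files match the version dpkg believes is
installed.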

> Could it be, that they always got unpacked, but not configured and that
> only this information would have been somehow lost?
> Cause that could explain why the locales haven't been regenerated.

If they are unpacked but not configured the new files would be on
disk, and the new md5sums as well, and the db would contain an
appropriate status. The package status does not progress until the
current stage has been finished, and those get properly synced to
disk. So in principle no, that should never happen.

I assume though that the locales had been generated but not flushed
to disk. I don't see any fsync(2)/fdatasync(2) in the glibc source
for the locale generators (one is a shell script, the other is a perl
script, and the last is a C program).

So, if the above checks look fine, I'd say the only thing that can
be done is perhaps to consider syncing the log file, but that might
be too much. And perhaps clone and reassign to glibc to make its
maintainer scripts more robust against abrupt crashes. But take
into account this will be an uphill battle, as mentioned above
most maintscripts and even most programs and applications are not
abrupt-crash-safe anyway…

Thanks,
Guillem
