Your message dated Thu, 08 May 2008 11:46:21 -0700
with message-id <[EMAIL PROTECTED]>
and subject line Re: Bug#480111: openafs-dbserver: VLDB changes not being
sync'ed to vldb.DB0
has caused the Debian Bug report #480111,
regarding openafs-dbserver: VLDB changes not being sync'ed to vldb.DB0
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [EMAIL PROTECTED]
immediately.)
--
480111: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480111
Debian Bug Tracking System
Contact [EMAIL PROTECTED] with problems
--- Begin Message ---
Subject: openafs-dbserver: VLDB changes not being sync'ed to vldb.DB0
Package: openafs-dbserver
Version: 1.4.7~pre3.dfsg1-1
Severity: critical
Justification: breaks the whole system
Recent vlservers fail to write VLDB changes to the
/var/lib/openafs/db/vldb.DB0 file on non-sync sites. The effect is that,
whilst the in-memory VLDB is correct, the version on disk is not correct
except on the sync site. If all vlservers for a cell are restarted *at
the same time*, all recent changes to the VLDB are lost.
The problem is reproducible:
- Stop, with bos, all 3 vlservers (all three are running the version
  below).
- Remove /var/lib/openafs/db/vldb* on all db servers.
- Restart, with bos, all 3 vlservers. Empty vldb.DB0 files are
  created on all servers. The vlservers show no errors in logs.
- Wait for quorum to be established (check via udebug, recovery
  state 1f).
- Run 'vos listvldb' to check that no volumes are registered.
- Run 'vos syncvldb' for each fileserver in cell.
- udebug on sync site shows DB version incrementing + recovery state 1f.
- 'vos listvldb' now shows all volumes in cell correctly and all
  clients can successfully access cell volumes.
- Wait 1 or more hours.
- The vldb.DB0 file has zero size on the non-sync sites and the
  timestamp of when the vlserver was started. On the sync site it has
  grown and has the timestamp of the last syncvldb operation.
- Restart all vlservers. The vlservers show no errors in logs.
- Wait for quorum to be established (check via udebug, recovery
  state 1f).
- 'vos listvldb' shows no volumes.
- Redoing the syncvldb allows the clients to again access volumes.
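
The size and timestamp check in the steps above can be scripted so it is
easy to repeat on each db server. The following is only a minimal sketch:
the path is the one given in this report, and the helper names
describe_db_file and looks_unsynced are made up for illustration. It
treats an existing but empty file as the symptom and does not inspect the
database contents.

```python
import time
from pathlib import Path

# Path as given in the report; adjust if your installation differs.
VLDB_PATH = Path("/var/lib/openafs/db/vldb.DB0")


def describe_db_file(path):
    """Return (size_in_bytes, mtime_string) for path, or None if missing."""
    try:
        st = path.stat()
    except FileNotFoundError:
        return None
    mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return st.st_size, mtime


def looks_unsynced(path):
    """True if the file exists but is empty (the symptom described above)."""
    info = describe_db_file(path)
    return info is not None and info[0] == 0


if __name__ == "__main__":
    info = describe_db_file(VLDB_PATH)
    if info is None:
        print(f"{VLDB_PATH}: missing")
    else:
        size, mtime = info
        note = " (empty: changes may not be reaching disk)" if size == 0 else ""
        print(f"{VLDB_PATH}: {size} bytes, modified {mtime}{note}")
```

Run it on every db server after the 'vos syncvldb' step and compare the
output of the sync site with that of the other sites.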
This problem was also seen with an i686 dbserver on testing (before the
upgrade to amd64 testing) and seems to have begun somewhere after
openafs 1.4.2. Initially the problem was seen with a VLDB that had
worked correctly for 2+ years. At some recent point (1.4.6?) changes
stopped being written to the vldb.DB0 (but no errors were logged) and
the above procedure was attempted in order to begin with a clean slate.
The effect however remains and thus cannot be linked to a corrupt
vldb.DB0. Testing with a backup of the original VLDB also shows this
problem. vldb_check seems satisfied that the vldb.DB0 is not corrupted
in any of these cases.
From the above it appears that:
- the vldb.DB0 file is not being updated on non-sync sites
- when a restart occurs, only the sync site has a recent vldb.DB0,
  but it is outvoted by the previously non-sync sites, and
- recent changes are discarded
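
The hypothesis above can be illustrated with a toy model. This is only a
sketch of the reasoning, assuming a plain majority vote over the versions
each site finds on disk; it is not the actual election or recovery
algorithm used by the vlservers, and the site names and version numbers
are invented.

```python
from collections import Counter


def majority_version(on_disk_versions):
    """Return the version held by a strict majority of sites, or None.

    on_disk_versions maps a (hypothetical) site name to the database
    version that site finds on disk at startup.
    """
    if not on_disk_versions:
        return None
    counts = Counter(on_disk_versions.values())
    version, votes = counts.most_common(1)[0]
    return version if votes > len(on_disk_versions) / 2 else None


# Invented numbers: 0 stands for an empty vldb.DB0, 42 for the up-to-date
# copy that only the former sync site managed to write to disk.
after_restart = {"db1 (former sync site)": 42, "db2": 0, "db3": 0}
print(majority_version(after_restart))  # 0
```

Under this assumption the two sites with empty files outnumber the one
site holding recent data, which would match the empty 'vos listvldb'
output seen after the restart.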
-- System Information:
Debian Release: lenny/sid
APT prefers testing
APT policy: (990, 'testing'), (300, 'unstable'), (80, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.25-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_ZA.UTF-8, LC_CTYPE=en_ZA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages openafs-dbserver depends on:
ii  libc6               2.7-10              GNU C Library: Shared libraries
ii  openafs-client      1.4.7~pre3.dfsg1-1  AFS distributed filesystem client
ii  openafs-fileserver  1.4.7~pre3.dfsg1-1  AFS distributed filesystem file se
ii  perl                5.8.8-12            Larry Wall's Practical Extraction
openafs-dbserver recommends no packages.
-- no debconf information
--- End Message ---
--- Begin Message ---
Version: 1.4.7.dfsg1-1
Hans Grobler <[EMAIL PROTECTED]> writes:
> I can confirm that 1.4.7 fixes this serious bug. With 1.4.7 the empty
> vldb.DB0 files created start with size 16 bytes, whereas previously they
> were 0 size... which correlates with a fd problem as hinted at in the
> Changelog.
Excellent. Thank you for the confirmation! I'll set the bug status
accordingly and request that the release team bump the urgency of the
testing propagation.
--
Russ Allbery ([EMAIL PROTECTED]) <http://www.eyrie.org/~eagle/>
--- End Message ---