Package: lvm2
Version: 2.0.0.24-1
Severity: critical
Justification: Can break whole system

During a reorganization of my LVs, I did an lvremove that in itself 
produced no errors. However, the next lvremove failed with messages that 
devices (?) had been 'left open' and that the metadata for the VG 
containing both LVs was inconsistent.

I looked for a solution, but was not able to find anything. So I decided 
to reboot the system in the hope that that would at least solve the 'left 
open' messages. In retrospect that was probably not the best action.
The result was that, as my / is on the broken VG, the system failed to 
boot because / could not be mounted.

On reboot, lvm2 seems to try to repair the inconsistency, but AFAICT it 
fails because of an error in the code.

IMO there are two problems:
1. the error in the lvremove operation that caused the inconsistency;
2. an error in 'lvm vgscan' that fails to correct the inconsistency.

I checked the upstream changelog for sid's 2.0.0.32-1 version, but did not 
see anything that looked like it would fix the 1st problem.

Note. It looks like the 2nd problem _may_ have been fixed in upstream 
2.01. From their changelog:
<snip>
Version 2.01.00 - 17th January 2005
===================================
  Fix vgscan metadata auto-correction.
</snip>

Using the Debian Installer I've managed to revive the system insofar as 
the SW-RAID is operational and I can now use lvm commands to access the 
VGs and LVs, but I have not been able to repair the problem.
I am fairly confident that the system can be recovered, as the 
inconsistency seems relatively minor and fixable, but I will need some 
help to do it.
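
For reference, this is roughly what I do from the rescue shell to get at 
the VGs (a sketch from memory, not a literal transcript; the arrays are 
already assembled by the kernel's RAID autodetection):

  lvm vgscan                   # look for volume groups
  lvm vgchange -ay temp work   # activate the VGs that are still consistent
  lvm lvscan                   # list the LVs that became available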

The rest of this report gives more background and details of my 
configuration, of what exactly happened and of my analysis.
Sorry if it's a bit long, but I tried to include all relevant info.

TIA for any help and suggestions on how to proceed.


BACKGROUND
==========
The system is a recently installed Sarge box used as a server in my home 
network, running the current 2.4.27-2 kernel.
The system has an internal 160 GB IDE hard disk and an external 12 GB 
MegaRAID SCSI storage unit.

When I installed the system, I decided to use reiserfs for some 
partitions, as I had read that it was more efficient for small files. A 
problem during a reboot and some comments on #debian-boot (IRC) convinced 
me that ext3 would have been a better option, so I decided to reorganize 
things.

The RAID and VG setup may seem a bit strange, but it was partly the 
result of a hardware failure that caused me to install Sarge in the first 
place.

CONFIGURATION
=============
Before I started the reorganization, my config was as follows.

Disk /dev/discs/disc1/disc: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device  Start   End    Blocks   Id  System                  Mountpoint
part1       1     6     48163+  83  Linux                   /boot
part2       7    26    160650   82  Linux swap              swap
part3      27 19929 159870847+   5  Extended
part5      27  4488  35840983+  fd  Linux raid autodetect
part6    4489  9588  40965718+  8e  Linux LVM
part7    9589 14688  40965718+  8e  Linux LVM
part8   14689 18316  29141878+  8e  Linux LVM
part9   18317 19929  12956391   fd  Linux raid autodetect

Disk /dev/discs/disc0/disc: 13.2 GB, 13268680704 bytes
255 heads, 63 sectors/track, 1613 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device  Start   End    Blocks   Id  System
part1       1  1613  12956391   fd  Linux raid autodetect

RAID configuration:
md0: part5 on IDE disk (degraded RAID1; waiting for replacement 2nd HD)
md1: part9 on IDE disk and part1 on SCSI (RAID1)
Both defined as PVs for LVM.

Volume groups:
- sys on PVs md0 + md1
- work on PVs part6 + part7
(part8 was spare PV)

Logical volumes:
  LV              Mountpoint    Filesys   PV
- sys-root        /             ext3      md0
- sys-home        /home         reiserfs  md0
- sys-var         /var          reiserfs  md0
- sys-tmp         /tmp          reiserfs  md0
- sys-exports     /exports      reiserfs  created on md1, extended to md0
- work-debmirror  not relevant  reiserfs  part6+part7
- work-installer  not relevant  reiserfs  part7
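
To make the layout easier to follow, it corresponds roughly to the 
following commands (reconstructed from memory, not the exact invocations 
I used; device names are the non-devfs ones):

  # md0: degraded RAID1 on the IDE disk only, second disk still to be added
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda5 missing
  # md1: RAID1 across the IDE and SCSI partitions
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hda9 /dev/sda1
  # the md devices and the plain partitions become PVs
  pvcreate /dev/md0 /dev/md1 /dev/hda6 /dev/hda7 /dev/hda8
  # VG 'sys' on the RAID PVs, VG 'work' on the plain partitions
  vgcreate sys /dev/md0 /dev/md1
  vgcreate work /dev/hda6 /dev/hda7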

WHAT HAPPENED
=============
I decided to start with sys-exports and sys-home.
I created a new VG 'temp' on part8 and new LVs temp-exports and temp-home. 
I copied and verified the data, unmounted /exports and lvremoved 
sys-exports. The error must have occurred at that point.
After unmounting /home, lvremove sys-home failed with the 'left open' 
message.
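
For what it's worth, the sequence was roughly the following (reconstructed 
from memory; the sizes are made up for illustration):

  vgcreate temp /dev/hda8            # new VG on the spare PV (part8)
  lvcreate -L 10G -n exports temp    # sizes not the real ones
  lvcreate -L 5G -n home temp
  # ... copy the data to the new LVs and verify it ...
  umount /exports
  lvremove sys/exports               # this one produced no errors
  umount /home
  lvremove sys/home                  # this one failed with 'left open'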

The LVs in both VG work and VG temp are still accessible normally.

If I now (using the Debian Installer 'rescue' system) do a vgdisplay, I 
get the output below. Note: the Debian Installer uses devfs.

<output of vgdisplay -- start>
# vgdisplay
  Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm: 
using /dev/ide/host0/bus0/target0/lun0/part9 
not /dev/scsi/host0/bus0/target0/lun0/part1
  --- Volume group ---
  VG Name               temp
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               27.79 GB
  PE Size               4.00 MB
  Total PE              7114
  Alloc PE / Size       1500 / 5.86 GB
  Free  PE / Size       5614 / 21.93 GB
  VG UUID               baXpMk-IvVl-Dz1K-Qn5P-wXGF-cZoD-nfO4v9

  --- Volume group ---
  VG Name               work
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               78.13 GB
  PE Size               4.00 MB
  Total PE              20002
  Alloc PE / Size       15500 / 60.55 GB
  Free  PE / Size       4502 / 17.59 GB
  VG UUID               r3BOv2-31pu-xehU-ufdi-Neae-852L-GUyLTM

  WARNING: Volume group "sys" inconsistent
  --- Volume group ---
  VG Name               sys
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  10
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               46.54 GB
  PE Size               4.00 MB
  Total PE              11913
  Alloc PE / Size       6912 / 27.00 GB
  Free  PE / Size       5001 / 19.54 GB
  VG UUID               YOc5lQ-m6o8-wob9-ghrY-VAZe-9YX4-qwWbCp
<output of vgdisplay -- end>

vgscan gives:
<output of vgscan -- start>
# vgscan
  Reading all physical volumes.  This may take a while...
  Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm: 
using /dev/ide/host0/bus0/target0/lun0/part9 
not /dev/scsi/host0/bus0/target0/lun0/part1
  Found volume group "temp" using metadata type lvm2
  Found volume group "work" using metadata type lvm2
  Inconsistent metadata copies found - updating to use version 10
  Automatic metadata correction failed
  Volume group "sys" not found
<output of vgscan -- end>

ANALYSIS
========
An indication of why the metadata correction fails can be found in the 
output of vgscan -vvv:
<snip of output of vgscan -vvv -- start>
    Finding volume group "sys"
        Opened /dev/md/0
      /dev/md/0: lvm2 label detected
        Opened /dev/scsi/host0/bus0/target0/lun0/part1
      /dev/scsi/host0/bus0/target0/lun0/part1: lvm2 label detected
        Read sys metadata (10) from /dev/md/0 at 18432 size 1879
        Opened /dev/ide/host0/bus0/target0/lun0/part9
      /dev/md/0: lvm2 label detected
      /dev/scsi/host0/bus0/target0/lun0/part1: lvm2 label detected
        Read sys metadata (9) from /dev/ide/host0/bus0/target0/lun0/part9 
at 7168 size 2233
  Inconsistent metadata copies found - updating to use version 10
        Writing sys metadata to /dev/md/0 at 20480 len 1909
  Automatic metadata correction failed
  Volume group "sys" not found
      Unlocking /var/lock/lvm/V_sys
        Closed /dev/md/0
        Closed /dev/scsi/host0/bus0/target0/lun0/part1
        Closed /dev/ide/host0/bus0/target0/lun0/part9
<snip of output of vgscan -vvv -- end>

It shows that md0 has metadata version (10) and part9 (md1) has metadata 
version (9). However, when updating, vgscan writes (10) to md0 instead of 
to md1, in effect changing absolutely nothing! It should write to md1.

Would it be possible to copy this metadata manually?
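
I was wondering whether something along the following lines would be a 
safe way to do that by hand (just a guess on my part, assuming a current 
backup of the version-10 metadata ends up in /etc/lvm/backup/sys):

  vgcfgbackup sys                            # back up the metadata lvm can still read
  vgcfgrestore -f /etc/lvm/backup/sys sys    # rewrite the metadata areas on the VG's PVs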

I also don't understand why it uses
/dev/scsi/host0/bus0/target0/lun0/part1
instead of /dev/md/1 to access the 2nd PV in the VG, especially as it 
_does_ use /dev/md/0.

During preparation for this reorganization I noticed another strange 
thing. Several commands would say something like:
   Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm:
   using /dev/hda9, not /dev/sda1
However, later in the output /dev/sda1 would always be printed.
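
I wonder whether this is simply a consequence of lvm scanning the raw md 
component partitions as well as the md devices themselves, and whether a 
device filter in /etc/lvm/lvm.conf along these lines would avoid it (just 
a guess; the devfs names would differ in the rescue environment):

  devices {
      # accept the md devices, reject the raw partitions that are md components
      filter = [ "a|/dev/md.*|", "r|/dev/hda5|", "r|/dev/hda9|", "r|/dev/sda1|", "a|.*|" ]
  }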

