Package: lvm2
Version: 2.0.0.24-1
Severity: critical
Justification: Can break whole system
During a reorganization of my LVs, I did an lvremove that in itself produced
no errors. However, the next lvremove failed with messages that devices (?)
had been 'left open' and that the metadata for the VG containing both LVs was
inconsistent. I looked for a solution, but was not able to find anything, so
I decided to reboot the system in the hope that that would at least clear the
'left open' messages. In retrospect, probably not the best action. As my / is
on the broken VG, the result was that the system failed to boot because /
could not be mounted.

On reboot, lvm2 seems to try to recover the inconsistency, but AFAICT fails
because of an error in the code. IMO there are two problems:
1. the error in the lvremove operation that caused the inconsistency;
2. an error in 'lvm vgscan' that fails to correct the inconsistency.

I checked the upstream changelog for sid's 2.0.0.32-1 version, but did not
see anything that looked like it would fix the 1st problem.

Note: it looks like the 2nd problem _may_ have been fixed in upstream 2.01.
From their changelog:
<snip>
Version 2.01.00 - 17th January 2005
===================================
  Fix vgscan metadata auto-correction.
</snip>

Using Debian Installer I've managed to revive the system insofar as I have
SW-RAID operational and can now use lvm commands to access the VGs and LVs,
but I have not been able to repair the problem. I have good faith that the
system can be recovered, as the inconsistency seems relatively minor and
fixable, but I will need some help to do it.

The rest of this report gives more background and details of my
configuration, of what exactly happened, and of my analysis. Sorry if it's a
bit long, but I tried to include all relevant info. TIA for any help and
suggestions on how to proceed.

BACKGROUND
==========
The system is a recently installed Sarge box used as a server in my home
network, running the current 2.4.27-2 kernel.
The system has an internal 160G IDE hard disk and an external 12G Megaraid
SCSI storage unit. When I installed the system, I decided to use reiserfs for
some partitions, as I had read that it was more efficient for small files. A
problem during a reboot and some comments on #debian-boot (IRC) made me
decide that ext3 might have been a better option, so I decided to reorganize
things. The RAID and VG setup may seem a bit strange, but it was partly a
consequence of the hardware failure that caused me to install Sarge in the
first place.

CONFIGURATION
=============
Before I started the reorganization, my config was as follows.

Disk /dev/discs/disc1/disc: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device  Start    End      Blocks  Id  System                 Mountpoint
   part1       1      6      48163+  83  Linux                  /boot
   part2       7     26     160650   82  Linux swap             swap
   part3      27  19929  159870847+   5  Extended
   part5      27   4488   35840983+  fd  Linux raid autodetect
   part6    4489   9588   40965718+  8e  Linux LVM
   part7    9589  14688   40965718+  8e  Linux LVM
   part8   14689  18316   29141878+  8e  Linux LVM
   part9   18317  19929    12956391  fd  Linux raid autodetect

Disk /dev/discs/disc0/disc: 13.2 GB, 13268680704 bytes
255 heads, 63 sectors/track, 1613 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device  Start    End      Blocks  Id  System
   part1       1   1613    12956391  fd  Linux raid autodetect

RAID configuration:
  md0: part5 on IDE disk (degraded RAID1; waiting for replacement 2nd HD)
  md1: part9 on IDE disk and part1 on SCSI (RAID1)
Both are defined as PVs for LVM.
Volume groups:
  - sys  on PVs md0 + md1
  - work on PVs part6 + part7   (part8 was a spare PV)

Logical volumes:
  LV              Mountpoint  Filesys   PV
  sys-root        /           ext3      md0
  sys-home        /home       reiserfs  md0
  sys-var         /var        reiserfs  md0
  sys-tmp         /tmp        reiserfs  md0
  sys-exports     /exports    reiserfs  created on md1, extended to md0
  work-debmirror  not relevant  reiserfs  part6+part7
  work-installer  not relevant  reiserfs  part7

WHAT HAPPENED
=============
I decided to start with sys-exports and sys-home. I created a new VG 'temp'
on part8 and new LVs temp-exports and temp-home, then copied and verified the
data. I umounted /exports and lvremoved sys-exports; the error must have
occurred at that point. After umount /home, lvremove sys-home failed with the
'left open' message. The LVs in both VG work and VG temp are still accessible
normally.

This is what I get if I now (using the Debian Installer 'rescue' system) do a
vgdisplay. Note: Debian Installer uses devfs.

<output of vgdisplay -- start>
# vgdisplay
  Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm: using /dev/ide/host0/bus0/target0/lun0/part9 not /dev/scsi/host0/bus0/target0/lun0/part1
  --- Volume group ---
  VG Name               temp
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               27.79 GB
  PE Size               4.00 MB
  Total PE              7114
  Alloc PE / Size       1500 / 5.86 GB
  Free  PE / Size       5614 / 21.93 GB
  VG UUID               baXpMk-IvVl-Dz1K-Qn5P-wXGF-cZoD-nfO4v9

  --- Volume group ---
  VG Name               work
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               78.13 GB
  PE Size               4.00 MB
  Total PE              20002
  Alloc PE / Size       15500 / 60.55 GB
  Free  PE / Size       4502 / 17.59 GB
  VG UUID               r3BOv2-31pu-xehU-ufdi-Neae-852L-GUyLTM

  WARNING: Volume group "sys" inconsistent
  --- Volume group ---
  VG Name               sys
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  10
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               46.54 GB
  PE Size               4.00 MB
  Total PE              11913
  Alloc PE / Size       6912 / 27.00 GB
  Free  PE / Size       5001 / 19.54 GB
  VG UUID               YOc5lQ-m6o8-wob9-ghrY-VAZe-9YX4-qwWbCp
<output of vgdisplay -- end>

vgscan gives:

<output of vgscan -- start>
# vgscan
  Reading all physical volumes.  This may take a while...
  Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm: using /dev/ide/host0/bus0/target0/lun0/part9 not /dev/scsi/host0/bus0/target0/lun0/part1
  Found volume group "temp" using metadata type lvm2
  Found volume group "work" using metadata type lvm2
  Inconsistent metadata copies found - updating to use version 10
  Automatic metadata correction failed
  Volume group "sys" not found
<output of vgscan -- end>

ANALYSIS
========
An indication of why correction of the metadata fails is in vgscan -vvv:

<snip of output of vgscan -vvv -- start>
      Finding volume group "sys"
      Opened /dev/md/0
      /dev/md/0: lvm2 label detected
      Opened /dev/scsi/host0/bus0/target0/lun0/part1
      /dev/scsi/host0/bus0/target0/lun0/part1: lvm2 label detected
      Read sys metadata (10) from /dev/md/0 at 18432 size 1879
      Opened /dev/ide/host0/bus0/target0/lun0/part9
      /dev/md/0: lvm2 label detected
      /dev/scsi/host0/bus0/target0/lun0/part1: lvm2 label detected
      Read sys metadata (9) from /dev/ide/host0/bus0/target0/lun0/part9 at 7168 size 2233
  Inconsistent metadata copies found - updating to use version 10
      Writing sys metadata to /dev/md/0 at 20480 len 1909
  Automatic metadata correction failed
  Volume group "sys" not found
      Unlocking /var/lock/lvm/V_sys
      Closed /dev/md/0
      Closed /dev/scsi/host0/bus0/target0/lun0/part1
      Closed /dev/ide/host0/bus0/target0/lun0/part9
<snip of output of vgscan -vvv -- end>

It shows that md0 has metadata version (10) and part9 (md1) has metadata
version (9). However, when updating, vgscan writes (10) back to md0 instead
of to md1, in effect changing absolutely nothing! It should write to md1.

Would it be possible to copy this metadata manually?
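On the question of copying the metadata manually: LVM2 stores its VG metadata
as plain text, so a repair along the following lines should in principle be
possible. This is only a hedged sketch, not a verified procedure: the dd
offsets and sizes are taken from the vgscan -vvv output above, the target
paths are this system's devfs names, and it is uncertain whether vgcfgbackup
will succeed on an inconsistent VG (the backup under /etc/lvm/backup from
before the failed lvremove may be a safer source for vgcfgrestore). Nothing
here should be run before double-checking the dumped metadata by eye.

```shell
# Dump the on-disk metadata text from both PVs for comparison.
# Offsets and sizes are the ones reported by 'vgscan -vvv' above.
dd if=/dev/md/0 bs=1 skip=18432 count=1879 2>/dev/null > /tmp/sys-md0.vgmeta
dd if=/dev/ide/host0/bus0/target0/lun0/part9 bs=1 skip=7168 count=2233 \
    2>/dev/null > /tmp/sys-md1.vgmeta
diff /tmp/sys-md0.vgmeta /tmp/sys-md1.vgmeta   # inspect the difference

# If the version-10 copy looks sane, save it to a file and write it
# back to all PVs in the VG, then rescan and reactivate.
vgcfgbackup -f /tmp/sys.vg sys
vgcfgrestore -f /tmp/sys.vg sys
vgscan
vgchange -ay sys
```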
I also don't understand why it uses /dev/scsi/host0/bus0/target0/lun0/part1
instead of /dev/md/1 to access the 2nd PV in the VG, especially as it _does_
use /dev/md/0.

During preparation for this reorganization I noticed another strange thing.
Several commands would say something like:
  Found duplicate PV QSpzOE3lqwPqyxHU4sV626bNnIxlbQrm: using /dev/hda9, not /dev/sda1
However, later in the output /dev/sda1 would always be printed.
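The duplicate-PV messages are consistent with lvm scanning the md member
partitions (/dev/hda9, /dev/sda1) directly and finding the same LVM label
there as on /dev/md/1 itself. A possible workaround, sketched here purely as
an assumption and with device names that would have to be adapted to this
system (and to devfs names when running under the rescue environment), is to
restrict scanning in /etc/lvm/lvm.conf so that only the md arrays and the
plain LVM partitions are considered:

```
# /etc/lvm/lvm.conf, devices section (hypothetical filter for this box):
# accept the md arrays and the IDE LVM partitions, reject everything
# else, so that md member partitions are never scanned directly.
devices {
    filter = [ "a|^/dev/md.*|", "a|^/dev/hda[678]$|", "r|.*|" ]
}
```

If the filter works, the "Found duplicate PV" warnings should disappear, and
it might also make vgscan address the 2nd PV via /dev/md/1 rather than via
the underlying SCSI partition.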