Summary: Various mishaps when recovering a botched software RAID system. The rescue functionality of the installer should be improved.
After a somewhat nightmarish (yet ultimately successful) upgrade of my
main workhorse PC to Linux software RAID, I have decided to make this
list of suggested improvements. Following the list is a more detailed
account of the reasons.

This is in no way meant to diminish or belittle the nice work that the
Debian folks have done so far; I appreciate that very much. However,
doing something about one or another of these points might help other
users in the future.

Suggestions:
************

1. Rescue mode needs MD devices

The rescue mode of the installer needs a step to activate MD devices.
Currently, only the plain disk partitions are visible; that's no help.

2. Netinstall image needs a ping

There should be a ping command available on the netinstall image.
Otherwise, on a multi-card PC it is hard to check whether the right
interface has been configured correctly.

3. Netinstall's ifconfig needs to set MAC address

The ifconfig on the netinstall image (from busybox) does not allow
setting the hardware Ethernet address. In some scenarios this is
important and necessary.

4. Netinstall image should have some packages

I'm not sure about this one ... but having grub, a kernel and a modules
package would have been an immense help.

5. Rescue functionality needs improvement

The rescue functionality of the installer is nice but not very useful
in practice. Polishing the rescue system would have helped me in many
situations before, not just in this case. I would love to have more of
a standalone system (from RAMDISK and/or "Live"-CD). In particular, the
fact that one can't run many elementary Linux commands (tar, gzip,
networking, e2fsck, mke2fs, dd, nfs-mount...) without going far along
in the install process is a hindrance. And the point at which the
installer starts modifying the actual installation is not always clear.

6. Grub's built-in documentation is incomprehensible

Grub is one of those tools that one needs to work with when the box
isn't running.
Grub's and grub-install's help are not useful in practice.

7. There needs to be a command to copy all data

Between cp, tar, rsync & friends there are dozens of variations on how
to copy the files of a running system to another location, but none is
perfect:
- leave out lost+found
- leave out /proc, /sys, the automatic /dev
- copy all "real" files
- copy the /dev on harddisk under the mounted devfs
  (using mount --bind or so)
There is a real need for a good program that does this; IMHO that
program should be cp.

8. hdparm's error messages unsatisfying

When some ATA drivers are not loaded, the hdparm command does not let
you set DMA mode for a drive. Unfortunately the error message is not
very helpful in localizing and fixing the problem.

9. cdrecord's miserable state is well known

Like the majority of other Linux users, I wonder when
  $ burn_my_iso_to_cd <iso-file> /dev/cdrom
will work as expected.

Why:
****

Now, on to the specifics. Here is the account of what happened to me
and how I arrived at those suggestions.

A) The upgrade

I decided to buy another IDE disk for my workhorse PC, to mirror the
old one (Software RAID 1) and get some additional (un-mirrored) space
on the new disk for junk data (VDR movies etc.). Being an old Debian
user, I surely could do that in-flight without a backup ... :-)
(Some sins get instant punishment.)

B) The guide

I followed the excellent guide at
xtronics.com/reference/SATA-RAID-debian-for-2.6.html

In short:
- create degraded RAID on new disk
- copy data to new disk
- modify initrd, fstab, grub
- test booting new system
- re-format old disk and add to RAID
- finalize initrd, fstab, grub
- done

C) Trouble begins

It was at the testing stage, having successfully booted into the
degraded RAID system on the new disk, that I decided to record a movie.
Re-formatting the old disk and adding it to the RAID, I noticed that
the system became very unresponsive and xine had trouble writing the
movie to disk.
I found out that DMA was turned off and that reconstruction of the RAID
took a lot of CPU and disk activity. I could not set the DMA mode with
hdparm; apparently some modules for that were missing. (I can't
reconstruct the situation, since DMA is now miraculously turned on.)

D) The fatal mistake

I had to stop recording since the movie would get chopped and RAID
reconstruction would take forever (20 h). I decided to reboot to get
DMA working and forgot that I had just re-formatted the /boot partition
on the old disk, so grub obviously would not find any chain loader.

E) The painful recovery

- Grub wouldn't load anything; the system did not boot.
- I tried a sarge installer CD, which didn't recognize the md
  signatures of the partitions.
- I couldn't figure out how to run the grub installer from a mounted
  pseudo-root directory where the devices were named differently
  (old /dev/hde vs. new /dev/sda for SATA).
- An old Knoppix allowed me to configure the router functionality and
  download the installer image.
- To burn the image, I had to download k3b since I couldn't figure out
  either cdrecord or cdrdao within reasonable time (external USB CD-ROM
  writer, because the original writer in the laptop is broken).
- The rescue mode of the netinst RC1 CD didn't let me choose the MD
  partitions as root device.
- I could not get to the Internet since my cable modem only responds to
  a certain MAC address, which can't be set with the ifconfig on
  netinst.
- Finally, after running the install process far enough to get the md
  devices mounted (it's unclear how to do that manually instead of
  using the partitioner), I had access to a ping and a working ifconfig
  to get Internet access.
- From the Internet, I could then download grub and install it manually
  after fighting against /proc and /sys.
- The installer had overwritten my /etc/fstab, which I then fixed.
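
For readers facing a similar situation, the manual steps that the
recovery above circled around (setting the MAC address, activating the
md devices, mounting the target, installing grub from a chroot) might
look roughly like the following dry-run sketch. It only builds and
prints a command list and executes nothing. All names here are
assumptions for illustration and not taken from the actual system:
/dev/md0 as the root array, /mnt/target as the mount point, /dev/sda as
the boot disk, eth0 as the interface, and the example MAC address.
Depending on the mdadm version, "mdadm --assemble --scan" may need an
mdadm.conf or explicit member devices instead.

```python
"""Dry-run sketch of a manual rescue sequence; it only prints commands."""
import shlex


def rescue_plan(md_root="/dev/md0", target="/mnt/target",
                boot_disk="/dev/sda", iface="eth0", mac=None):
    """Return the rescue steps as a list of argument lists (nothing is run).

    All default device names and paths are hypothetical placeholders.
    """
    plan = []
    if mac:
        # Full net-tools ifconfig syntax for setting the hardware address;
        # the interface is taken down first, as many drivers require.
        plan.append(["ifconfig", iface, "down"])
        plan.append(["ifconfig", iface, "hw", "ether", mac])
        plan.append(["ifconfig", iface, "up"])
    # Ask mdadm to assemble the arrays it can identify.
    plan.append(["mdadm", "--assemble", "--scan"])
    # Mount the (assumed) root array at the target directory.
    plan.append(["mount", md_root, target])
    # Make /dev, /proc and /sys visible inside the chroot.
    for pseudo in ("/dev", "/proc", "/sys"):
        plan.append(["mount", "--bind", pseudo, target + pseudo])
    # Run grub-install from inside the installed system.
    plan.append(["chroot", target, "grub-install", boot_disk])
    return plan


if __name__ == "__main__":
    for cmd in rescue_plan(mac="00:11:22:33:44:55"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the plan instead of executing it is deliberate: device names
differ between the rescue environment and the installed system (as with
/dev/hde vs. /dev/sda above), so each command should be checked by hand
before it is run as root.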

F) Conclusion

I have purposefully omitted the many other failures, most of them the
result of my own mistakes, that made this endeavour take a total of 11
hours into the night.

I think the steps that I described show that while the new installer
has become very good at its main function (as an INSTALLER), it still
lacks most features of a rescue system.

Having gone through various attempts with unbootable USB-stick
rescuers, an old Knoppix and Sarge installers, I'm quite convinced that
an effective rescue system MUST be based on the same kernel series and
system setup philosophies as the primary installed system (what with
udev, /sys, /proc, md5 partition autostart for new superblocks,
copyable kernel that allows mounting the target partition as root
etc.).

Therefore I'll conclude with the plea that the fine folks who did such
great work on the new installer might now turn their eye to its rescue
functionality, and I hope this comment is helpful.

Tired but finally successful
Claus

--
Claus Fischer <[EMAIL PROTECTED]>
http://www.clausfischer.com/