Oops, attaching the missing aptitude log and syslog of space1.
Original Text (with line-wrapping):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An upgrade (os-prober 1.35 -> 1.39) corrupted 3.3 TB of data on our SAN. I was upgrading the host space1, but the data corruption occurred on space2.

An install script of os-prober tried to mount, read-only, a SAN volume that was already mounted on space2. That volume (on space2) was in production use, so EXT3-fs (on space1) concluded that the journal was inconsistent, re-mounted the volume as writable and performed a "recovery". The mount on space2 became unavailable, which brought the production host down. Re-mounting failed.

After rebooting space2, fsck was required on the affected partition. It ran for many hours and found a huge number of errors, probably more than 10,000. I was then able to mount the volume and saw that our data had been turned into gray goo: parts of system Perl scripts were replaced by binary chunks, and databases and web servers would not start. I had 30 containers in production. Some actually booted despite major sporadic data corruption in them.

A fellow system administrator from another department on campus said that their distribution (CentOS) does not run install scripts. As he put it: Debian ended up managing your SAN for you.

I got os-prober because of the change in Debian's policy to install all recommended packages, and os-prober is recommended by Grub.

I am not sure why the data corruption did not happen when I upgraded to Squeeze a month ago (grub-common 1.96+20080724-16 -> 1.98-1).

I'm attaching the aptitude log and syslog of space1.

The root of the problem is also described in bug #556739 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556739). Its author predicted filesystem corruption and data loss back in 2009.
Alex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
----------------------------------------------------------------
Aleksandr Levchuk
Bioinformatics Systems and Databases
http://facility.bioinformatics.ucr.edu/people/aleksandr-levchuk
Cell Phone: (951) 368-0004
Lab Phone: (951) 905-5232
Institute for Integrative Genome Biology
University of California, Riverside
----------------------------------------------------------------
bug599203_syslog.log
Description: Binary data
bug599203_aptitude.log
Description: Binary data