Hi, just my two cents if you're thinking about restoring from backups. People on this list are often reminded to use the "IT" firmware for their RAID controllers, so that the disks are presented individually to the OS and the controller operates as a plain HBA (JBOD mode), rather than running the hardware RAID of the "IR" firmware.
Is your controller one of those that can be reflashed to IT firmware? If so, it would probably be a good idea to do so before restoring from backup, so you can benefit from all the ZFS goodies instead of relying on the hardware RAID controller. Again, I am not familiar with your particular controller, so take this with a grain of salt, but check it out before you restore.

Bryan
Sent from my BlackBerry® wireless device from WIND

-----Original Message-----
From: CJ Keist <[email protected]>
Date: Tue, 09 Jul 2013 08:18:00
To: <[email protected]>
Reply-To: Discussion list for OpenIndiana <[email protected]>
Subject: Re: [OpenIndiana-discuss] OI 151_a7 crashing when trying to import ZFS pool

Jim,

Yes, the MegaRAID 9260-8i is presenting the OS with a single-disk pool. I had to stop the scrub, as it was going to take over three days to complete; I figured I could be well into my restores from backup in that amount of time.

I was wondering what the cause would have been: the OS or the RAID controller? The system was completely hung, so I had no choice but to do a forced reboot. I'm leaning towards the RAID controller as the primary culprit.

I'm not finding the core dump file anywhere. Where does OI store kernel crash data? If it's not too big, could I send it to the list?

On 7/9/13 5:22 AM, Jim Klimov wrote:
> On 2013-07-08 22:58, CJ Keist wrote:
>> Thank you all for the replies.
>> I tried OmniOS and Oracle Solaris 11.1, but neither was able to
>> import the data pool. So I reinstalled OI 151a7, and after importing
>> the data pool and having it crash, I booted into single-user mode. At
>> this point I was able to initiate "zpool scrub data" and it looks to
>> be running!! I will wait and see if the scrub can finish and then try
>> to remount everything. See attached pic.
>
> That screenshot seems disturbing: with such a large pool you only have
> one device. Is it on hardware RAID, which masks all the disks and any
> possible redundancy and repair options away from ZFS?
> In that case, the data error may be anywhere in that RAID's
> implementation (e.g. when you did a forced reboot, some critical data
> was not flushed to disks at all, or worse, was flushed in the wrong
> order: for example, uberblock updates reached the disks before the
> other metadata updates, and the latter never made it).
>
> I think that for the scrub you did import the pool read-write, so it
> would be too late to try rolling back a few transactions into an older
> but possibly more consistent state of the pool (or did you already do
> that while successfully importing?).
>
> If the pool just "gave up" and, after a few panics, began to import at
> least well enough for the kernel to accept it, it is possible (just
> from my experience, shooting ideas into the sky here) that some
> deferred operations were recorded on the pool, and it finally unrolled
> them. For example, I had a series of panicky reboots when deleting lots
> of data on a deduped pool on a machine with low RAM (8 GB): enumerating
> the DDT consumed a lot more than that, the kernel couldn't swap, BAM!
> It took about two weeks of resetting it every 3-4 hours for the box to
> get itself straight...
>
> For the developers here to provide more targeted ideas and/or a
> solution, it would surely help if you could provide a stack trace of
> the kernel panic, to see where it goes wrong (probably some data on
> disk did not match an assertion, such as an unexpected zero/nonzero
> value). For this you could boot into kmdb (preferably on a serial
> console, since the traces are quite long and roll off the 25-line
> screen), so that when the problem occurs, the messages are printed but
> the machine doesn't reboot automatically. Actually, with a serial
> console you might care a bit less about kmdb, if you can copy-paste
> the trace quickly enough before it is overwritten by BIOS POST
> messages.
>
> HTH,
> //Jim
>
> _______________________________________________
> OpenIndiana-discuss mailing list
> [email protected]
> http://openindiana.org/mailman/listinfo/openindiana-discuss

--
C. J. Keist                     Email: [email protected]
Systems Group Manager           Solaris 10 OS (SAI)
Engineering Network Services    Phone: 970-491-0630
College of Engineering, CSU     Fax:   970-491-5569
Ft. Collins, CO 80523-1301

All I want is a chance to prove 'Money can't buy happiness'
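Jim's two points, that a controller cache can let an uberblock reach the disks before the metadata it points to, and that rolling back a few transactions can reach an older but consistent state, can be illustrated with a toy model. This is a hypothetical simplification, not ZFS's on-disk format: the `Disk`, `Uberblock`, `commit_txg` and `newest_consistent` names are made up for illustration. A pool is modeled as a dict of blocks plus a list of committed uberblocks, and a forced reboot is modeled as "only the first write of the transaction group made it to disk". Picking the newest uberblock whose whole tree is present is, conceptually, what a rewind import (`zpool import -F`, which discards the last few transactions) tries to do.

```python
"""Toy model (hypothetical, not ZFS's real on-disk format) of why write
ordering matters for a copy-on-write pool, and how rewinding to an older
uberblock can recover a consistent state."""

from dataclasses import dataclass, field


@dataclass
class Uberblock:
    txg: int   # transaction group number
    root: str  # id of the root metadata block for this txg


@dataclass
class Disk:
    blocks: dict = field(default_factory=dict)      # block id -> list of child ids
    uberblocks: list = field(default_factory=list)  # committed uberblocks


def tree_complete(disk: Disk, block_id: str) -> bool:
    """Return True if block_id and everything it references are on disk."""
    if block_id not in disk.blocks:
        return False
    return all(tree_complete(disk, child) for child in disk.blocks[block_id])


def commit_txg(disk, txg, new_blocks, root, reorder=False, power_loss=False):
    """Write a txg: new metadata blocks first, then the uberblock.

    With reorder=True a (hypothetical) controller cache writes the
    uberblock first; with power_loss=True the machine dies right after
    the first write, so only that one reaches the platters.
    """
    writes = [("blocks", new_blocks), ("uber", Uberblock(txg, root))]
    if reorder:
        writes.reverse()
    for kind, payload in writes:
        if kind == "blocks":
            disk.blocks.update(payload)
        else:
            disk.uberblocks.append(payload)
        if power_loss:
            return  # nothing after the first write made it to disk


def newest_consistent(disk):
    """Rewind: pick the newest uberblock whose whole tree is present."""
    for ub in sorted(disk.uberblocks, key=lambda u: u.txg, reverse=True):
        if tree_complete(disk, ub.root):
            return ub
    return None


def demo(reorder):
    disk = Disk()
    # txg 1 commits cleanly: root r1 -> data block d1
    commit_txg(disk, 1, {"d1": [], "r1": ["d1"]}, "r1")
    # txg 2 is interrupted by a forced reboot after its first write
    commit_txg(disk, 2, {"d2": [], "r2": ["d2"]}, "r2",
               reorder=reorder, power_loss=True)
    latest = max(disk.uberblocks, key=lambda u: u.txg)
    # (newest uberblock's txg, is its tree complete?, txg a rewind would pick)
    return latest.txg, tree_complete(disk, latest.root), newest_consistent(disk).txg


if __name__ == "__main__":
    print("in order :", demo(reorder=False))
    print("reordered:", demo(reorder=True))
```

With correct ordering, the interrupted txg 2 only leaves orphaned blocks behind and the newest uberblock (txg 1) is still complete. With reordering, the newest uberblock (txg 2) points at a tree that never reached the disks, and only rewinding to txg 1 gives a consistent pool, which is also why a read-write import that starts writing new transactions can make such a rewind harder later.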
