Hello all,

There are some discussions about whether illumos on SPARC is a dead end or not (i.e. whether it is stupid to buy HW systems from the one vendor and not buy their software support, whether there is more than one vendor, or whether anyone would pick up the open-sourced Niagara processor designs and build some cool appliances or servers). So, just for anecdote's sake, I wanted to share this weekend's experience with OpenSolaris on SPARC, and how it saved the day.
While it may seem unlikely at the moment that new SPARC systems will be rolled out for OI to be installed on, there are many already-deployed, reliable boxes which will keep running obsolete (or our new) software "until they fscking die". I was asked to look at a T2000 with Sol 10u8 which did just that: it died during what could have been an fsck, if ZFS had one.

Apparently, the system's users did nothing formally invalid. They were just zfs-sending and zfs-receiving some datasets within the pool in order to recompress older data with gzip-9; then they tried to destroy the older dataset tree and rename the compressed copy to take its place. Something went wrong, and the pool locked up with no I/O taking place (according to iostat). The "zfs" commands all hung, but "zpool status" and friends did not. Filesystem operations also kept working, so the running zones were stopped properly and the box was ultimately rebooted. It did not come back up.

Luckily, there was a Solaris installation server on that network, so it only took a few minutes to prepare a LAN installation resource from a stashed SXCE snv_129_sparc image and boot the T2000 from the network into single-user mode. OpenSolaris found nothing suspicious about the data pool or the rpool, and imported and exported them without complaint. While we had the rpool imported, we deleted the /etc/zfs/zpool.cache file to allow the system to boot its Solaris 10. It booted, but then hung on a subsequent "zpool import -R / pool" request in the same way: no I/O reported by iostat, and no errors in the logs...

Back in the network-booted OpenSolaris, we imported the data pool, destroyed the remaining old uncompressed datasets, and completed the renaming of the compressed datasets to take their place, transparently to the zones and other consumers. This unclogged something: afterwards, the Solaris 10 image quickly imported the pool, and it is happily using it today. Yesterday, the old OpenSolaris SXCE for SPARC really did save the day.
I can easily imagine hitting some ZFS bugs that were fixed after the last SXCE release, for which a hypothetical "OpenIndiana for SPARC" image would be able to save us, even if it is not (yet) used as the everyday OS on the box.

Hope this story entertains someone and helps others,
//Jim Klimov

_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
