"Andrew" == Andrew Perrin <[EMAIL PROTECTED]> writes: Andrew> FWIW, I'm skeptical of Nate's claim that excessive I/O Andrew> errors must bring down the system. I'm certainly not a Andrew> kernel hacker, but I see no reason why the kernel couldn't Andrew> do what it does in other roughly analogous situations:
I'm not a Linux kernel programmer, but I've worked on device drivers and firmware for many systems. There are always some hardware errors you cannot recover from, though the details vary with the situation. For every recovery strategy you can devise, you can find another class of hardware errors that defeats it.

For example, if I program a DMA controller to transfer bytes from address x to address y but the controller sends them somewhere else, I'm hosed. When I program a PCI bus master to do a burst transfer, I *expect* it to obey the rules. I do not checksum all of my memory before the transfer and then verify afterward that nothing else changed. If the hardware breaks the rules, there is very little you can do to recover. Depending on the OS and architecture, the I/O error might even overwrite the very code that is supposed to recover from the error!

Most programming involves a chain of trust. It is just one of those compromises you have to make. TCPA notwithstanding ;-)

Cheers!
Shyamal