On Mon, Jan 29, 2001 at 12:26:21PM -0500, Chris Colomb wrote:
> UPS systems can and do fail.
Absolutely--anything fails, eventually. The issue is one of
_probability_. You can give anecdotal descriptions of UPS failures
all day; unless you can point to a statistical proof that they fail
consistently, they're just that: anecdote. My, and the industry's,
real-world experience is that properly installed and maintained UPS
systems have a negligible failure rate.
Does this preclude the need to back up systems? Not a whit.
Would this preclude use of a journaling file system? No; but it's not
a 'gimme' proposition, either, considering that there are real expenses
associated with such a system, especially given the current state of
journaling in Linux.
> At my location we have a room full of batteries and a diesel generator
> with fuel for 10 days. Both of which get exercised regularly. Still the
> system that switches to batteries failed last month and we were without
> power for almost a minute.
Anecdotal. Regularly exercised--regularly maintained? What did
post-failure analysis show? Almost certainly something is wrong--either
with installation or maintenance. Perhaps a single point-of-failure in
the control circuitry, or too many months (years?) without maintenance.
> A properly implemented journaling file system has negligible performance
> overhead.
The current implementation on Linux doesn't meet that requirement. Yet.
> To rely on a UPS to the exclusion of a journaling filesystem is IMHO just
> as irresponsible and unprofessional in a truly mission critical production
> environment.
Inaccurate, bordering on specious. A UPS is a universal need; a
journaling filesystem is not nearly as universal, in either
implementation or deployment. Its delivered benefit is incremental; if
your systems are unstable enough that the benefits of a journaling
filesystem loom large in your recovery analysis, you have much greater
problems--with your architecture, your topology, or your equipment.
> I have many terabytes of disk. The type of failure mentioned
> above...I don't even want to think about how long it would take to fsck
> all of that.
Fine. Figure how many catastrophic failures you expect over a given
time period--and analyze WHY you expect them--then weigh the cost of
those fscks against the initial and ongoing cost, in CPU cycles,
storage, and complexity, of a journaling solution. It's all about risk
assessment and cost-benefit analysis.
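
To make that concrete, here's a minimal back-of-the-envelope sketch in
Python. Every number in it--crash rate, fsck duration, overhead,
dollar figures--is a hypothetical placeholder invented purely for
illustration; substitute estimates from your own shop before drawing
any conclusions from it.

# Back-of-the-envelope comparison: expected annual cost of fsck-after-crash
# downtime vs. the ongoing overhead of running a journaling filesystem.
# ALL numbers below are hypothetical placeholders -- substitute your own.

crashes_per_year     = 0.5      # expected unclean shutdowns per year
fsck_hours_per_crash = 3.0      # hours to fsck the affected filesystems
downtime_cost_per_hr = 500.0    # cost (dollars) per hour of downtime

journal_overhead_pct = 0.05     # fraction of I/O throughput given up
io_value_per_year    = 2000.0   # yearly dollar value of that throughput
deploy_cost_per_year = 1000.0   # setup/admin cost, amortized per year

fsck_cost    = crashes_per_year * fsck_hours_per_crash * downtime_cost_per_hr
journal_cost = journal_overhead_pct * io_value_per_year + deploy_cost_per_year

print("Expected annual fsck downtime cost: $%8.2f" % fsck_cost)
print("Annual cost of journaling:          $%8.2f" % journal_cost)
if journal_cost < fsck_cost:
    print("On these numbers, journaling pays for itself.")
else:
    print("On these numbers, plain ext2 plus good power and backups wins.")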
> Needless to say, none of the terabytes of mission critical disk are on a
> Linux filesystem.
Needless? Interesting.
> After the event mentioned above, all of my systems were
> back online with no intervention in under five minutes. Except for the
> Linux machines, each of which required fsck by hand, though they had a
> fraction of the data and I/O on them the commercial Unices had.
First, there is no reason the Linux systems should require a manual
fsck; they can be configured for automatic recovery just as well as a
commercial Unix system can.
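
For the record, the relevant knob is the sixth field (fs_passno) in
/etc/fstab, which tells the boot-time fsck whether, and in what order,
to check each filesystem; the stock init scripts invoke fsck in its
automatic-repair mode (-a), so routine repairs need no operator at the
console. Purely as an illustration--this is not part of any
distribution's tooling--here's a small Python sketch that reports how
an fstab is set up:

# Illustrative only: report which /etc/fstab entries are slated for an
# automatic check at boot (nonzero sixth field, fs_passno) and which are not.

FSTAB = "/etc/fstab"    # standard location on Linux systems

for line in open(FSTAB):
    line = line.strip()
    if not line or line.startswith("#"):
        continue                    # skip blank lines and comments
    fields = line.split()
    if len(fields) < 2:
        continue                    # not a real fstab entry
    device, mountpoint = fields[0], fields[1]
    fstype = fields[2] if len(fields) > 2 else "?"
    passno = fields[5] if len(fields) > 5 else "0"   # missing field means 0
    if passno != "0":
        status = "checked automatically at boot (pass %s)" % passno
    else:
        status = "never checked automatically"
    print("%-22s %-16s %-6s %s" % (device, mountpoint, fstype, status))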
Next, the ext2 filesystem is more robust than commercial Unix filesystems
were for more than 3/4 of the total life of Unix. Can it be made better?
Of course. But I don't accept the premise that every commercial system
needs journaling, nor do I believe that running without it results in
commercially unacceptable exposure to risk.
Your anecdote tells me that someone fell down in disaster recovery
planning and execution, and that the Linux systems weren't configured
for automatic recovery. It does not tell me there is anything
inherently wrong with Linux.
There are many approaches to system reliability; journaling is one tool
among many, not a universal solution.
Cheers,
--
Dave Ihnat
[EMAIL PROTECTED]
_______________________________________________
Redhat-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-list