On Mon, Jan 29, 2001 at 12:26:21PM -0500, Chris Colomb wrote:
> UPS systems can and do fail.

Absolutely--anything fails, eventually.  The issue is one of
_probability_.  You can give anecdotal descriptions of UPS failures
all day; unless you can point to statistical evidence that they fail
consistently, they're just that:  anecdotes.  My, and the industry's,
real-world experience is that properly installed and maintained UPS
systems have a negligible failure rate.
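
To put some purely hypothetical numbers on "negligible":  a UPS with an
MTBF of, say, 100,000 hours works out to about 8,760 / 100,000 = 0.09
expected failures per unit per year--roughly one every decade or so--and
proper installation and maintenance are exactly what keep the real-world
number in that neighborhood.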

Does this preclude the need to back up systems?  Not a whit.
Would this preclude use of a journaling file system?  No; but it's not a
'gimme' proposition, either, considering that there are real costs
associated with such a system, especially given the current state of
journaling on Linux.

> At my location we have a room full of batteries and a diesel generator
> with fuel for 10 days. Both of which get exercised regularly. Still the
> system that switches to batteries failed last month and we were without
> power for almost a minute.

Anecdotal.  Regularly exercised--regularly maintained?  What did
post-failure analysis show?  Almost certainly something is wrong--either
with installation or maintenance.  Perhaps a single point-of-failure in
the control circuitry, or too many months (years?) without maintenance.

> A properly implemented journaling file system has negligible performance
> overhead.

The current implementation on Linux doesn't meet that requirement.  Yet.

> To rely on a UPS to the exclusion of a journaling filesystem is IMHO just
> as irresponsible and unprofessional in a truly mission critical production
> enviroment.

Inaccurate, bordering on specious.  A UPS is a universal need; a
journaling filesystem is not nearly as universal a need, nor as
uniformly implemented.  Its delivered benefit is only incremental;
if your systems are unstable enough that the benefits of a journaling
filesystem loom large in your recovery analysis, you have much greater
problems--with your architecture, topology, or equipment.

> I have many terabytes of disk. The type of failure mentioned
> above...I don't even want to think about how long it would take to fsck
> all of that.

Fine.  Figure how many catastrophic failures you expect over a given time
period--analyze WHY you expect them--and weigh the cost of fsck against
the initial and ongoing costs (CPU cycles, storage, complexity, and so on)
of a journaling solution.  It's all about risk assessment and cost-benefit
analysis.
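
As a rough back-of-the-envelope sketch (every number below is
hypothetical--plug in your own estimates), in Python:

    # Rough risk/cost-benefit sketch.  All figures are made up; substitute
    # your own failure expectations and costs.
    expected_crashes_per_year = 0.5      # unclean shutdowns you actually expect
    fsck_hours_per_crash      = 2.0      # time to fsck the affected volumes
    downtime_cost_per_hour    = 500.0    # what an hour of outage costs you

    expected_fsck_cost = (expected_crashes_per_year
                          * fsck_hours_per_crash
                          * downtime_cost_per_hour)

    # Annualized cost of the journaling solution: CPU cycles, storage,
    # added complexity, admin time, etc.
    journaling_cost_per_year = 1000.0

    print("Expected annual fsck cost:  $%.2f" % expected_fsck_cost)
    print("Annual journaling overhead: $%.2f" % journaling_cost_per_year)
    if journaling_cost_per_year < expected_fsck_cost:
        print("Journaling pays for itself.")
    else:
        print("Journaling doesn't pay for itself--yet.")

If the second number is smaller than the first, journaling earns its keep;
if not, it's an expense you're carrying for comfort.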

> Needless to say, none of the terabytes of mission critical disk are on a
> Linux filesystem.

Needless?  Interesting.

> After the event mentioned above, all of my systems were
> back online with no intervention in under five minutes. Except for the
> Linux machines, each of which required fsck by hand, though they had a
> fraction of the data and I/O on them the commercial Unices had. 

First, there is no reason the Linux systems should require a manual fsck;
they can be configured for automatic recovery just as well as a commercial
Unix system can.
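
For example (the devices and mount points below are purely illustrative),
the sixth field in /etc/fstab tells the boot scripts whether, and in what
order, to check a filesystem at boot, and e2fsck's -p ("preen") mode
repairs anything it safely can without stopping to ask questions:

    # /etc/fstab -- the sixth field (fsck pass number) enables boot-time
    # checking; root gets pass 1, other local filesystems pass 2.
    /dev/sda1   /        ext2    defaults    1 1
    /dev/sdb1   /data    ext2    defaults    1 2

Only damage e2fsck considers unsafe to repair automatically should drop
you to a manual fsck prompt.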

Next, the ext2 filesystem is more robust than commercial Unix filesystems
were for more than 3/4 of the total life of Unix.  Can it be made better?
Of course.  But I don't accept the premise that every commercial system
needs journaling, nor do I believe that running without it results in
commercially unacceptable levels of risk exposure.

Your anecdote tells me that someone fell down in disaster-recovery
planning and execution, and that the Linux systems weren't configured
for automatic recovery--not that there is anything inherently wrong
with Linux.

There are many approaches to system reliability; journaling is one tool
among many, not a universal solution.

Cheers,
-- 
        Dave Ihnat
        [EMAIL PROTECTED]


