On Thu, Oct 03, 2002 at 09:21:47AM +0200, Nathalie Boulos wrote:
> I have a RH7.1 box with 2 ethernet cards.
> I need to implement redundancy on the network: in case one of the eth cards 
> fails, the second will automatically replace it with the same IP (No DNS 
> redundancy).

I hate to cause trouble, but what is the likelihood of an ethernet
card failing?  What is the likelihood of a failure from the added
complexity (both hardware and software) of handling two ethernet cards
instead of one?  Is having two cards indeed a net win?

Don't get me wrong, I am a fan of redundancy--as evidenced by my
promoting software raid 1.  But I figure that disks are moving parts
and moving parts scare me, and the complexity in software raid is
already in the kernel and well debugged, so putting it to use seems
like a good trade off: very little added risk in exchange for
significant benefit.

I am reminded of the early days of hobbyist computers (this was before
the term "personal computer" had been coined) when there was an
affection for "fully socketed" circuit cards.  This arose from the
risk of damaging an integrated circuit with too much heat during
soldering, or with a little static discharge zapping a delicate CMOS
part, and from the fact that chips were brand-new and not always very
reliable.  The
comfort was knowing that if a chip failed it would be easy to unplug
it and plug in a replacement.  The problem, however, was that the
sockets were less reliable than the ICs that got plugged into them.
Solder could flow up into them and jam them, but mostly they would
corrode in time and cause a malfunction.  These days sockets are very
rare and mostly reserved for ROMs that might need to be replaced in
a firmware upgrade--or ROMs that are only there sometimes, as an
option.  (I have an ethernet card with just such an empty socket on
it...)

I am also reminded of two motherboards we had fail at work recently.
In one case two electrolytic capacitors had blown, and in the
other case one of those same capacitors had blown; these capacitors
were part of the power circuitry feeding the computer.  These computers were
largely empty boxes, but would failures like these be more likely in a
box with more parts drawing more power and producing more heat?
Possibly.  

As with soldered chips versus sockets, a single-board computer doesn't
let you replace major components, but by not having everything on
separate plug-in cards the whole computer is more reliable.  My basement server doesn't
even have an ethernet card--its ethernet is on the motherboard (as is
just about everything with this MS-6340 motherboard).  Yes, I could
have a failure, but a new board is less than $70, I can afford to have
a spare at that rate.  There would be downtime to install the spare,
but by keeping things simple the chance of my needing it is lower.

I am not saying not to put needed parts in your machine, but if you
are putting extra parts in for reliability think through whether it
will actually *be* a net improvement in reliability.  If you search
the archives of Peter G. Neumann's "Risks Digest" you can find really
scary instances of people being stupid about how they install UPSs
and making things less reliable.

The greatest risk to computer reliability is complexity.  That's why,
when I crafted a little cron job to let me know whether I have had a
raid disk failure, I opted for simple, simple, simple: a reference
copy of /proc/mdstat and an ultra-simple script that diffs it against
the current /proc/mdstat and e-mails me the difference, if there is
any.  Because it doesn't try to track state changes, I will get an
e-mail every morning for as long as I have a dead disk, and I have to
update the reference copy by hand whenever the status changes
intentionally.  But it is simple, simple, simple: very likely to work
and very unlikely to break anything else.
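The check described above might look something like this sketch; the
paths, the recipient, and the cron schedule are illustrative
assumptions, not details from my actual setup:

```shell
#!/bin/sh
# Compare a saved reference copy of /proc/mdstat against the live file
# and mail the difference, if any.  Meant to be run daily from cron,
# for example:  0 6 * * * /usr/local/sbin/mdstat-check --run
# (paths and recipient here are illustrative, not from my real box)

# mdstat_diff REF CUR: print a unified diff of the reference copy and
# the current mdstat; returns nonzero when the two differ.
mdstat_diff() {
    diff -u "$1" "$2"
}

main() {
    if ! out=$(mdstat_diff /root/mdstat.ref /proc/mdstat); then
        # Something changed (or a file is missing): mail the diff.
        printf '%s\n' "$out" | mail -s "mdstat changed on $(hostname)" root
    fi
}

# Run the check only when invoked with --run, so the functions can be
# sourced without side effects.
if [ "${1:-}" = "--run" ]; then main; fi
```

After replacing a failed disk and letting the array re-sync, the
reference copy has to be refreshed by hand
(cp /proc/mdstat /root/mdstat.ref); that is the price of keeping the
script stateless.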


These reliability questions are tricky to get right, so I figured I
should raise the question.


-kb



-- 
redhat-list mailing list
unsubscribe mailto:[EMAIL PROTECTED]?subject=unsubscribe
https://listman.redhat.com/mailman/listinfo/redhat-list
