On Thu, May 20, 2010 at 08:17:48PM +0200, Henning Brauer wrote:
> > I have two identical "core" switches in one (not really so critical at
> > all) place running OSPF, with a bunch of routers connecting to both
> > switches for redundancy. Works pretty well and there has even been a
> > config reset incident, which didn't break anything - because OSPF can
> > detect link failures. Trying to do the same all the way to the end hosts
> > (i.e. without a routing protocol) is pretty difficult.
>
> i would never ever run any L3 on switches.

Bad wording on my part, the routers run OSPF and the switches are dumb
L2 devices.
Still, without OSPF et al. there would be no way to detect a crappy
switch failing in funny ways, which was my point.

As an extra note, if you do get a crappy switch, be very careful with
its management interface. The cheapest ones have unbelievably slow CPUs
that are easily overloaded by broadcasts, making the whole thing stop
responding. Even worse, the interrupt load seems to trigger some other
bugs, like LACP mysteriously failing and disabling one port on a trunk,
blackholing half of your traffic (this happened on a ZyXEL GS-4024,
which has otherwise totally Just Worked as an L2 switch for years), or
even the whole switch ASIC "crashing" after a broadcast storm and
requiring a reboot (though the management CPU was still responding
through the out-of-band Ethernet and serial port after the storm was
gone).

Also, it's a very obvious DoS: a malicious person needs to send only a
rather small number of BPDUs to overload the tiny CPU, and the cheap
switches obviously have no rate limiting for packets going to the CPU
(only for broadcasts as a whole). So blocking BPDUs from non-trusted
devices should be enabled (but that should probably be done anyway).

Even among "trusted" devices, STP and LACP involve the shitty code
running on the underpowered management CPU, and that is not the part
that shines in the cheap switches. Static link aggregation works OK.
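
For illustration, here is roughly what the missing rate limiting for
CPU-bound frames (BPDUs, LACPDUs) could look like. This is only a
sketch of a generic token bucket in Python, not anything taken from a
real switch firmware; the class name, the function name and the numbers
(10 frames/s, burst of 5) are all made up for the example.

```python
class TokenBucket:
    """Hypothetical limiter for frames punted to the management CPU."""

    def __init__(self, rate_pps, burst):
        self.rate_pps = float(rate_pps)  # tokens refilled per second
        self.burst = float(burst)        # maximum tokens the bucket holds
        self.tokens = float(burst)       # start with a full bucket
        self.last = 0.0                  # timestamp of the previous frame

    def allow(self, now):
        """Return True if a frame arriving at time `now` may reach the CPU."""
        # Refill for the time elapsed since the last frame, capped at burst.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop instead of interrupting the CPU


def simulate_flood(bucket, frames, duration):
    """Offer `frames` evenly spaced over `duration` seconds.

    Returns how many of them the bucket lets through to the CPU.
    """
    passed = 0
    for i in range(frames):
        now = duration * i / frames
        if bucket.allow(now):
            passed += 1
    return passed


if __name__ == "__main__":
    # A flood of 1000 frames in one second versus a well-behaved neighbour.
    print("flood:", simulate_flood(TokenBucket(10, 5), 1000, 1.0))
    print("normal:", simulate_flood(TokenBucket(10, 5), 5, 1.0))
```

With a limit like this, a flood costs the CPU only about the burst plus
the refill rate in frames per second, while a device sending a handful
of BPDUs per second is not affected at all.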

