On 7/19/21, 9:12 AM, "Beowulf on behalf of Prentice Bisbal via Beowulf" <beowulf-boun...@beowulf.org on behalf of beowulf@beowulf.org> wrote:
> Doug,
>
> <snip>
>
> I know that there is a direct relationship between system failure and operating temperature, but I don't know if that applies to all components, or just those with moving parts. Someone somewhere must have done research on this. I know Google did research on hard drive failure that was pretty popular. I would imagine they would have researched this, too.

In general, it follows the Arrhenius relationship with some TBD exponent. "A 10C rise ages it twice as fast" is a common rule of thumb. There's all sorts of background physics to this: drift of metallization and doping, radiation accumulation, etc.

Cycling is a different failure mechanism, and there it's propagation of microscopic defects with each cycle, as well as the more obvious "cracks in solder/PWB trace" kind of thing. One of the big issues today is the difference in CTE (coefficient of thermal expansion) between the chips (or their packages) and the PWB. Column and grid arrays that are soldered in have an issue with the corner pins/balls/columns being stressed more than the sides, and any time you have cyclic stress, you have the prospect of work hardening and micro-crack propagation.

Sockets with interposers do help with this, because they allow changing misalignment without failure. OTOH, now you have a socket and interposer, which can fail.
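To put rough numbers on the Arrhenius relationship and the "10C doubles the aging rate" rule of thumb mentioned above, here is a minimal Python sketch. The function names are hypothetical, and the 0.7 eV activation energy is purely an assumption for illustration (the thread leaves the exponent as TBD); real values depend on the failure mechanism and the part.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_ref_c, t_hot_c, activation_energy_ev=0.7):
    """Aging-rate ratio at t_hot_c relative to t_ref_c (both in Celsius).

    activation_energy_ev is an ASSUMED value, not taken from the thread.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_hot_k)
    return math.exp(exponent)


def rule_of_thumb_acceleration(delta_t_c):
    """The '10C rise ages twice as fast' rule: a factor of 2 per 10 C."""
    return 2.0 ** (delta_t_c / 10.0)


if __name__ == "__main__":
    t_ref = 50.0  # assumed reference temperature in Celsius
    for t_hot in (60.0, 70.0, 80.0):
        af = arrhenius_acceleration(t_ref, t_hot)
        rot = rule_of_thumb_acceleration(t_hot - t_ref)
        print(f"{t_ref:.0f}C -> {t_hot:.0f}C: Arrhenius {af:.2f}x, rule of thumb {rot:.2f}x")
```

With the assumed 0.7 eV, going from 50C to 60C comes out at a little over 2x, which is in line with the rule of thumb. A different activation energy shifts that factor, which is why the rule is only a rule of thumb. Note that this models steady-state temperature only; the thermal cycling failures described above are a separate mechanism and are not covered by it.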