> On Jun 7, 2018, at 1:04 PM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>
>> On Jun 7, 2018, at 9:43 AM, Peter Kjellström <c...@nsc.liu.se> wrote:
>>
>> On Thu, 7 Jun 2018 03:12:43 +0000
>> Ryan Novosielski <novos...@rutgers.edu> wrote:
>>
>>> One slight correction: 100% of our switches with FRU PN 00WE097/PN
>>> 00WE096Y manufactured on 2016-11-28 (quantity 3) have failed, and one
>>> same FRU PN/PN manufactured on 2016-12-15 too. We have another switch
>>> with FRU PN 00WE093/PN 00WE092Y that was manufactured on 2016-11-28
>>> that has so far been OK, but I’m now suspicious of it.
>>
>> Thanks for the heads up.
>>
>> To make this data point more valuable, can you add total numbers? That
>> is, how many (similar) switches in total, how many bad/good. And for
>> how long did they run before exhibiting the problem.
>
> Sure, Peter.
>
> We only have 6 SB7890 switches currently. All were purchased through Lenovo,
> and all have Lenovo machine types of 0724-HD6. I don’t think this has much to
> do with Lenovo, though, apart from reselling them. One of the four that
> failed is actually a replacement for a physically damaged switch (bad port
> latch), so that means there is even bad replacement inventory out there. All
> of the 4 aforementioned 00WE096Y switches have failed, 3 manufactured on
> 2016-11-28 and 1 manufactured on 2016-12-15. I don’t have an exact date for
> the first failure or the switch installation, but the failure occurred
> roughly a year after the manufacturing date. My guess is that they were in
> service for about 9-10 months, but I can probably narrow that down with a
> little more effort if it matters.
>
> The other two are FRU PN 00WE093/PN 00WE092Y (Lenovo MT 0724-HD5). So far so
> good on those, though I’m now suspicious of the one manufactured on
> 2016-11-28.
>
> Additionally, we have two SB7800 switches (FRU PN: 00WE085/PN 00WE084Y). Too
> new to tell on those; only a few weeks in service. Both were manufactured on
> 2018-01-08.

Upshot is an advance RMA on anything that has already shown symptoms (so 3
switches for us), and an on-site visit with a software fix to be applied to
all other SB7800-class switches; they seem to think all are potentially
affected. The sort of timeframe they gave is ~45 minutes of work on our 11
units. Fun.

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf