Dear Patrick,
Thanks for your explanations. 
We are end users, by which I mean the people who actually run the 
computations, and our problems occurred in the sixth month of the Dawning 
4000A. In practice we could only talk to the SSC; they discussed the issues 
with the Dawning guys, who might then have called you. I have to say the 
Dawning guys lacked the capability to support the machine.
The main problem was that one of the nodes we used would disconnect from the 
sub-machine we were using. I tried a few MPI implementations, such as MPICH 
and LAM/MPI, and the problem still existed.
I have heard that the Myrinet in Barcelona works well. But at the SSC, it 
disappointed me and many other users. Even the SSC itself was disappointed by 
the Dawning machines. 
Sorry again that this is only a single case. 
Regards,
Li, Bo
----- Original Message ----- 
From: "Patrick Geoffray" <[EMAIL PROTECTED]>
To: "Li, Bo" <[EMAIL PROTECTED]>
Cc: "Mark Hahn" <[EMAIL PROTECTED]>; "?? ?" <[EMAIL PROTECTED]>; "Beowulf 
Mailing List" <beowulf@beowulf.org>
Sent: Wednesday, November 21, 2007 8:34 AM
Subject: Re: [Beowulf] help on building Beowulf


> Hi Bo,
> 
> Li, Bo wrote:
>> In my experience running HPC applications at the Shanghai Supercomputing 
>> Center, the Myrinet interconnect caused many failures, even with a small 
>> application. All users were driven crazy by the interconnect, and we had 
>> to restart the applications again and again. I am not sure whether there 
>> was any improvement after Myrinet got involved. During my three-month 
>> stay there, nothing was done by the Myrinet people when the guys from 
>> Dawning called them for help. Sorry if I put too many personal opinions 
>> into this case.
> 
> I have looked at all of the 46 Help Tickets opened by Dawning with 
> Myricom Tech Support between 2004 and 2007, and all of them were first 
> handled within 2 business days. Final resolution varied from a few hours 
> to one week (RMA of switch enclosure).
> 
> Doing a cross-reference with Shanghai Supercomputing Center 
> (Dawning4000A cluster), I saw the same software problem reported 
> multiple times over a period of several months. It was answered each time 
> the following day. The reported problem was MPICH-GM unable to open a GM 
> port (which could have many causes but a common one was MPI jobs 
> terminating abnormally and not being cleaned up properly). We were not 
> made aware of continuing problems after relevant information was sent. 
> Further tickets referred to performance tuning, not operational stability.
> 
> When exactly did you experience the problems on this machine?
> 
> We do our best to support our customers. Sometimes, communication is 
> hard due to language barriers and lack of steady contact. Other times, 
> problems do not reach us because integrators/customers try to fix them 
> internally. This is not perfect, but we tend to fix things that are broken.
> 
> Patrick
> -- 
> Patrick Geoffray
> Myricom, Inc.
> http://www.myri.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
