Patrick Geoffray wrote:
> Alas, people use blocking calls in general because
> they are lazy (50%), they don't know (40%) or they don't care (10%).

We did some tests with non-blocking vs. blocking calls. Unfortunately, in our code there is only a small window for overlap: almost immediately after a processor computes a result and swaps it to the neighbouring processor, it needs the value received from that neighbour. On small cases non-blocking was faster than blocking; on larger cases blocking was definitely advantageous.

Maybe the MPI implementation used (LAM) does not handle multiple outstanding sends/receives well, maybe the network card does not like it, or maybe it causes collisions at the destination processor. Can anybody comment on this?

(We got the best performance by scheduling the communication so that it happens in pairs. Every processor swaps its data with one of its neighbours (we're using domain decomposition), then goes on to swap with a different neighbour. This continues until every processor has swapped with all its neighbours. The schedule is determined at the start of the run, since the decomposition does not change.)

Mattijs

--
Mattijs Janssens

OpenCFD Ltd.
The Mews, Picketts Lodge, Picketts Lane, Salfords, Surrey RH1 5RG.
Tel: +44 (0)1293 821272
Email: [EMAIL PROTECTED]
URL: http://www.OpenCFD.co.uk
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
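
For anyone wanting to try the pairwise schedule described above, here is a rough sketch of one way such a schedule could be built. It is not the actual code from the post: the function name build_swap_schedule, the greedy strategy and the 2x2 neighbour map are all assumptions made for illustration. The idea is to group neighbour pairs into rounds so that no processor appears twice in the same round.

```python
def build_swap_schedule(neighbours):
    """Group neighbour pairs into rounds; no processor appears twice in a round.

    `neighbours` maps a processor rank to the set of ranks it shares a
    boundary with (hypothetical input format, assumed symmetric).
    """
    # Collect each undirected pair once, in a deterministic order.
    edges = sorted({(min(p, q), max(p, q))
                    for p, qs in neighbours.items() for q in qs})
    rounds = []  # rounds[i] is the list of (p, q) pairs swapping in round i
    busy = []    # busy[i] is the set of processors already paired in round i
    for p, q in edges:
        # Greedy choice: the first round in which both processors are free.
        for i, used in enumerate(busy):
            if p not in used and q not in used:
                rounds[i].append((p, q))
                used.update((p, q))
                break
        else:
            # No existing round fits, so open a new one.
            rounds.append([(p, q)])
            busy.append({p, q})
    return rounds


# Hypothetical 2x2 decomposition: four processors, each with two neighbours.
neighbours = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
schedule = build_swap_schedule(neighbours)
for i, pairs in enumerate(schedule):
    print(f"round {i}: {pairs}")
```

Since the decomposition is fixed, this only has to run once at start-up. In each round every pair would then do one blocking exchange (for example with MPI_Sendrecv), so each processor has at most one message in flight at a time. The greedy grouping is simple but is not guaranteed to give the minimum number of rounds.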