On Friday 13 May 2011 11:50:57 Max Laier wrote:
> On Friday 13 May 2011 11:28:33 Andriy Gapon wrote:
> > on 13/05/2011 17:41 Max Laier said the following:
> > > this ncpus isn't the one you are looking for.
> > 
> > Thank you!
> 
> > Here's an updated patch:
> Can you attach the patch, so I can apply it locally.  This code is really
> hard to read without context.  Some more comments inline ...
> 
> > Index: sys/kern/subr_smp.c
> > ===================================================================
> > --- sys/kern/subr_smp.c     (revision 221835)
> > +++ sys/kern/subr_smp.c     (working copy)
> > @@ -316,19 +316,14 @@
> > 
> >     void (*local_action_func)(void*)   = smp_rv_action_func;
> >     void (*local_teardown_func)(void*) = smp_rv_teardown_func;
> > 
> > -   /* Ensure we have up-to-date values. */
> > -   atomic_add_acq_int(&smp_rv_waiters[0], 1);
> > -   while (smp_rv_waiters[0] < smp_rv_ncpus)
> > -           cpu_spinwait();
> > -
> 
> You really need this for architectures that need the memory barrier to
> ensure consistency.  We also need to move the reads of smp_rv_* below this
> point to provide a consistent view.
> 
> >     /* setup function */
> >     if (local_setup_func != smp_no_rendevous_barrier) {
> >     
> >             if (smp_rv_setup_func != NULL)
> >             
> >                     smp_rv_setup_func(smp_rv_func_arg);
> >             
> >             /* spin on entry rendezvous */
> > 
> > -           atomic_add_int(&smp_rv_waiters[1], 1);
> > -           while (smp_rv_waiters[1] < smp_rv_ncpus)
> > +           atomic_add_int(&smp_rv_waiters[0], 1);
> > +           while (smp_rv_waiters[0] < smp_rv_ncpus)
> > 
> >                     cpu_spinwait();
> >     
> >     }
> > 
> > @@ -337,12 +332,16 @@
> > 
> >             local_action_func(local_func_arg);
> >     
> >     /* spin on exit rendezvous */
> > 
> > -   atomic_add_int(&smp_rv_waiters[2], 1);
> > -   if (local_teardown_func == smp_no_rendevous_barrier)
> > +   atomic_add_int(&smp_rv_waiters[1], 1);
> > +   if (local_teardown_func == smp_no_rendevous_barrier) {
> > +           atomic_add_int(&smp_rv_waiters[2], 1);
> > 
> >                  return;
> > 
> > -   while (smp_rv_waiters[2] < smp_rv_ncpus)
> > +   }
> > +   while (smp_rv_waiters[1] < smp_rv_ncpus)
> > 
> >             cpu_spinwait();
> > 
> > +   atomic_add_int(&smp_rv_waiters[2], 1);
> > +
> > 
> >     /* teardown function */
> >     if (local_teardown_func != NULL)
> >     
> >             local_teardown_func(local_func_arg);
> > 
> > @@ -377,6 +376,10 @@
> > 
> >     /* obtain rendezvous lock */
> >     mtx_lock_spin(&smp_ipi_mtx);
> > 
> > +   /* Wait for any previous unwaited rendezvous to finish. */
> > +   while (atomic_load_acq_int(&smp_rv_waiters[2]) < smp_rv_ncpus)
> > +           cpu_spinwait();
> > +

Disregard this ... I misread the diff.  You are indeed using [2] correctly as 
the "all-clear" semaphore.  I still believe, that it is safer/cleaner to do 
this spin before releasing the lock instead (see my patch).

> This does not help you at all.  Imagine the following (unlikely, but not
> impossible) case:
> 
> CPUA: start rendevouz including self, finish the action first (i.e. CPUA is
> the first one to see smp_rv_waiters[2] == smp_rv_ncpus, drop the lock and
> start a new rendevouz.  smp_rv_waiters[2] == smp_rv_ncpus is still true on
> that CPU, but ...
> 
> CPUB might have increased smp_rv_waiters[2] for the first rendevouz, but
> never saw smp_rv_waiters[2] == smp_rv_ncpus, still ...
> 
> CPUA is allowed to start a new rendevouz which will leave CPUB stranded and
> can lead to a deadlock.
> 
> I think this is also possible with another CPU starting the second
> rendevous.
> 
> >     /* set static function pointers */
> >     smp_rv_ncpus = ncpus;
> >     smp_rv_setup_func = setup_func;
> > 
> > @@ -395,7 +398,7 @@
> > 
> >             smp_rendezvous_action();
> >     
> >     if (teardown_func == smp_no_rendevous_barrier)
> > 
> > -           while (atomic_load_acq_int(&smp_rv_waiters[2]) < ncpus)
> > +           while (atomic_load_acq_int(&smp_rv_waiters[1]) < ncpus)
> > 
> >                     cpu_spinwait();
> >     
> >     /* release lock */
> > 
> > _______________________________________________
> > freebsd-current@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to
> > "freebsd-current-unsubscr...@freebsd.org"
> 
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to