Hi,

My device just passed a 24-hour long-duration test, so I can now say that this issue is history. Since we had been looking at this for quite a while, I would like to thank you, Sebastian, for your prompt and useful support!
regards,
Catalin

On Fri, Mar 29, 2019 at 11:56 AM Catalin Demergian <demerg...@gmail.com> wrote:
> Hi,
> Some time ago (Sept/Oct 2018) we had a long discussion in which I suspected a
> scheduler issue (subject "rtems_message_queue_receive/rtems_event_receive
> issues").
>
> We got to the point where I realized that _Chain_Append_unprotected might
> fail to add an element to the queue. The effect is a task in an inconsistent
> state: its state is READY, but it is not in the ready chain. Such a task never
> gets CPU time again, because a task has to be blocked in order to be unblocked
> when new data arrives.
>
> We were using USB then, but this issue has become hot again because we just
> hit the same issue over serial :)
> I believe there is a possible chain of events that can make
> _Chain_Append_unprotected fail; explanations follow.
>
> /*
>  * @note It does NOT disable interrupts to ensure the atomicity of the
>  *       append operation.
>  */
> RTEMS_INLINE_ROUTINE void _Chain_Append_unprotected(
>   Chain_Control *the_chain,
>   Chain_Node    *the_node
> )
> {
>   Chain_Node *tail = _Chain_Tail( the_chain );
>   Chain_Node *old_last = tail->previous;
>
>   the_node->next = tail;
>   tail->previous = the_node;
>   old_last->next = the_node;
>   the_node->previous = old_last;
> }
>
> The lines
>
>   tail->previous = the_node;
>   old_last->next = the_node;
>
> are the ones that actually add the element to the ready chain.
>
> None of these steps is protected against a concurrent append to the same
> chain. If a thread has already loaded tail->previous into old_last but is
> interrupted before executing those two lines, and another thread appends a
> node to this chain in the meantime, both threads link their nodes after the
> same old_last. When the interrupted thread resumes, it overwrites
> tail->previous and old_last->next, so the node appended by the other thread
> is no longer reachable from the ready chain.
>
> If this chain of events occurs (*and after a while it will*), we get
> starvation for that task.
>
> I'm reproducing this issue in a long-duration test. The time until it
> happens varies from run to run, but it always happens.
>
> *What I'm proposing is the following*: call _Chain_Append instead of
> _Chain_Append_unprotected in schedulerpriorityimpl.h, in the
> _Scheduler_priority_Ready_queue_enqueue function.
>
> void _Chain_Append(
>   Chain_Control *the_chain,
>   Chain_Node    *node
> )
> {
>   ISR_Level level;
>
>   _ISR_Disable( level );
>   _Chain_Append_unprotected( the_chain, node );
>   _ISR_Enable( level );
> }
>
> This way the add-element-to-chain operation becomes atomic.
>
> With this fix, I was able to run a long-duration test (8 hours) successfully
> in my setup.
>
> What do you think?
>
> regards,
> Catalin
_______________________________________________
users mailing list
users@rtems.org
http://lists.rtems.org/mailman/listinfo/users