Hi,

Some time ago (Sept/Oct 2018) we had a long discussion in which I suspected a scheduler issue (subject: "rtems_message_queue_receive/rtems_event_receive issues").
We got to the point where I realized that _Chain_Append_unprotected might fail to add an element to the queue. The effect is a task in an odd state: state=READY, but the task is not in the ready chain. Such a task never gets CPU time again, since a task needs to be blocked in order to be unblocked when new data arrives.

We were using USB then, but the issue has become hot again because we just hit the same problem over serial :)

I believe there is a possible chain of events that can make _Chain_Append_unprotected fail. Explanations follow.

    /**
     * @note It does NOT disable interrupts to ensure the atomicity of the
     *       append operation.
     */
    RTEMS_INLINE_ROUTINE void _Chain_Append_unprotected(
      Chain_Control *the_chain,
      Chain_Node    *the_node
    )
    {
      Chain_Node *tail = _Chain_Tail( the_chain );
      Chain_Node *old_last = tail->previous;

      the_node->next = tail;
      tail->previous = the_node;
      old_last->next = the_node;
      the_node->previous = old_last;
    }

The lines

    tail->previous = the_node;
    old_last->next = the_node;

are the ones that actually add the element to the ready chain. If a thread executes those lines but, just before it executes the_node->previous = old_last;, another thread comes along to add another node to this chain, that thread will set another node in tail->previous and old_last->next. As a result, when the interrupted thread continues and executes the last line, it will be for nothing, because the initial node will not have been added to the ready chain.

If this chain of events occurs (*and after a while it will*), we get starvation for that task. I am reproducing this issue in a long-duration test; the time before it happens varies from run to run, but it always happens.

*What I'm proposing is the following*: call _Chain_Append instead of _Chain_Append_unprotected in the _Scheduler_priority_Ready_queue_enqueue function in schedulerpriorityimpl.h.
    void _Chain_Append(
      Chain_Control *the_chain,
      Chain_Node    *node
    )
    {
      ISR_Level level;

      _ISR_Disable( level );
      _Chain_Append_unprotected( the_chain, node );
      _ISR_Enable( level );
    }

This way the add-element-to-chain operation becomes atomic. With this fix, I was able to run a long-duration test (8 hrs) successfully in my setup.

What do you think?

Regards,
Catalin
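
For illustration only, here is a minimal single-threaded replay of how two overlapping unprotected appends can drop a node. It is a sketch under stated assumptions, not RTEMS code: the types and helpers (Node, Chain, chain_init, append_unprotected, append_interleaved, count_forward, is_linked) are hypothetical stand-ins, and the preemption point is assumed to fall right after old_last is read, which is where the second appender still sees the old last node in this toy model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Chain_Node and Chain_Control. */
typedef struct Node {
  struct Node *next;
  struct Node *previous;
} Node;

typedef struct {
  Node head; /* sentinel before the first element */
  Node tail; /* sentinel after the last element */
} Chain;

void chain_init(Chain *c)
{
  c->head.next = &c->tail;
  c->head.previous = NULL;
  c->tail.next = NULL;
  c->tail.previous = &c->head;
}

/* Same sequence of stores as in the listing quoted in this mail. */
void append_unprotected(Chain *c, Node *n)
{
  assert(c != NULL && n != NULL);
  Node *tail = &c->tail;
  Node *old_last = tail->previous;

  n->next = tail;
  tail->previous = n;
  old_last->next = n;
  n->previous = old_last;
}

/*
 * Replays one interleaving by hand (assumed, not observed):
 * appender A reads old_last, is "preempted" while B performs a complete
 * append, then A resumes and finishes with its stale old_last.
 */
void append_interleaved(Chain *c, Node *a, Node *b)
{
  Node *tail = &c->tail;
  Node *old_last_a = tail->previous; /* A: read, then preempted */

  append_unprotected(c, b);          /* B: runs to completion */

  a->next = tail;                    /* A: resumes */
  tail->previous = a;
  old_last_a->next = a;              /* overwrites the link to b */
  a->previous = old_last_a;
}

/* Number of nodes reachable by walking forward from the head. */
int count_forward(const Chain *c)
{
  int count = 0;
  for (const Node *n = c->head.next; n != &c->tail; n = n->next) {
    ++count;
  }
  return count;
}

/* Returns 1 if n is reachable by walking forward from the head. */
int is_linked(const Chain *c, const Node *n)
{
  for (const Node *it = c->head.next; it != &c->tail; it = it->next) {
    if (it == n) {
      return 1;
    }
  }
  return 0;
}
```

Calling append_interleaved on an empty chain leaves only one of the two nodes reachable from the head, while two back-to-back append_unprotected calls keep both. Serializing the whole append, as _Chain_Append does by disabling interrupts around it, rules out the interleaving replayed here.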