Yicong-Huang opened a new issue, #4524:
URL: https://github.com/apache/texera/issues/4524
## Summary
`core/runnables/test_main_loop.py::TestMainLoop::test_main_loop_thread_can_align_ecm`
is intermittently red on CI for unrelated PRs (observed on #4512 and #4520,
same assertion at `test_main_loop.py:1176`). Locally it has always passed
(100/100 runs with `-x` on macOS, 30/30 runs of the full file).
## Root cause
The test assumes that two items put into `output_queue` come out in FIFO
order:
```python
input_queue.put(ECMElement(tag=mock_control_input_channel, payload=test_ecm))  # 1
input_queue.put(mock_binary_data_element)                                      # 2
input_queue.put(ECMElement(tag=mock_data_input_channel, payload=test_ecm))     # 3 -> aligns ECM, runs NoOperation
output_data_element: DataElement = output_queue.get()      # expects data first
...
output_control_element: DCMElement = output_queue.get()    # expects control reply second
```
But `output_queue` is an `InternalQueue`, which is a
`LinkedBlockingMultiQueue` keyed by channel with **priority 1 for control
sub-queues and priority 2 for data sub-queues** (`internal_queue.py:80`),
where the lower number wins. `LinkedBlockingMultiQueue.get()` always pops
from the highest-priority enabled sub-queue first.
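To make the priority semantics concrete, here is a minimal sketch of a priority multi-queue. `PriorityMultiQueue` and its methods are hypothetical stand-ins, not the real `LinkedBlockingMultiQueue` API: it keeps one FIFO per key, never blocks, and ignores sub-queue enabling. It only illustrates why an item enqueued later on the control sub-queue is returned before an earlier data item.

```python
from collections import deque


class PriorityMultiQueue:
    """Hypothetical stand-in for LinkedBlockingMultiQueue.

    Keeps one FIFO deque per key; get() serves the non-empty sub-queue with
    the smallest priority number. This sketch never blocks and has no notion
    of disabled sub-queues.
    """

    def __init__(self):
        self._subqueues = {}  # key -> (priority, deque of items)

    def add_subqueue(self, key, priority):
        self._subqueues[key] = (priority, deque())

    def put(self, key, item):
        self._subqueues[key][1].append(item)

    def get(self):
        # Visit sub-queues in priority order (1 before 2); FIFO within each one.
        for _, items in sorted(self._subqueues.values(), key=lambda entry: entry[0]):
            if items:
                return items.popleft()
        raise IndexError("all sub-queues are empty")


q = PriorityMultiQueue()
q.add_subqueue("control", priority=1)
q.add_subqueue("data", priority=2)
q.put("data", "DataElement")    # MainLoop's first put
q.put("control", "DCMElement")  # MainLoop's second put
print(q.get())  # DCMElement, even though it was enqueued second
print(q.get())  # DataElement
```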
In `MainLoop`, the two puts to `output_queue` happen sequentially:
1. DataElement → data sub-queue (priority 2)
2. NoOperation reply DCMElement → control sub-queue (priority 1)
On a fast machine, the test calls `.get()` after step 1 but before step 2,
so only the data element is queued; it comes out first and the test passes.
On a slow CI runner, MainLoop reaches step 2 before the test calls `.get()`:
both items are queued, the priority queue returns the control reply first,
and the assertion at line 1176 fails with:
```
AssertionError: ChannelIdentity(..., to='sender', is_control=True)           # actual (control)
             != ChannelIdentity(..., to='dummy_worker_id', is_control=False)  # expected (data)
```
The production behavior is correct: control should take priority over data
on the egress queue.
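The two interleavings described above can be replayed deterministically, without threads. This sketch uses the standard library's `queue.PriorityQueue` with `(priority, sequence, item)` tuples as an assumed stand-in for `InternalQueue`; the function names are hypothetical and only model the order in which MainLoop's puts and the test's `.get()` calls happen.

```python
import itertools
import queue

CONTROL, DATA = 1, 2      # lower number is served first, as in internal_queue.py
_seq = itertools.count()  # tie-breaker that keeps FIFO order within one priority


def put(q, priority, label):
    q.put((priority, next(_seq), label))


def get(q):
    return q.get_nowait()[2]


def fast_machine_interleaving():
    """The test calls get() between MainLoop's two puts."""
    q = queue.PriorityQueue()
    put(q, DATA, "DataElement")    # MainLoop step 1
    first = get(q)                 # only the data element is queued
    put(q, CONTROL, "DCMElement")  # MainLoop step 2
    second = get(q)
    return [first, second]


def slow_ci_interleaving():
    """MainLoop finishes both puts before the test calls get()."""
    q = queue.PriorityQueue()
    put(q, DATA, "DataElement")    # MainLoop step 1
    put(q, CONTROL, "DCMElement")  # MainLoop step 2
    return [get(q), get(q)]        # priority order, not insertion order


print(fast_machine_interleaving())  # ['DataElement', 'DCMElement']
print(slow_ci_interleaving())       # ['DCMElement', 'DataElement']
```

Only the second ordering violates the test's FIFO assumption, which is why the failure shows up on slower runners.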
## Proposed fix
Make the test order-tolerant: drain two items from `output_queue`, identify
each by type (`DataElement` vs `DCMElement`), and assert on each independently.
No production code change is needed.
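A minimal sketch of the order-tolerant check, assuming a stdlib `queue.Queue` in place of `InternalQueue`. `DataElement` and `DCMElement` here are hypothetical dataclass stand-ins with a string `tag`, and `drain_by_type` is a hypothetical helper; whether the real `InternalQueue.get()` accepts a `timeout` argument has not been checked.

```python
import queue
from dataclasses import dataclass


@dataclass
class DataElement:  # hypothetical stand-in for the real DataElement
    tag: str


@dataclass
class DCMElement:  # hypothetical stand-in for the real DCMElement
    tag: str


def drain_by_type(q, count, timeout=5.0):
    """Pop `count` items and index them by type, ignoring arrival order."""
    by_type = {}
    for _ in range(count):
        item = q.get(timeout=timeout)  # raises queue.Empty instead of hanging
        assert type(item) not in by_type, f"duplicate {type(item).__name__}"
        by_type[type(item)] = item
    return by_type


# Control reply arrives first here, but either order gives the same result.
q = queue.Queue()
q.put(DCMElement(tag="sender"))
q.put(DataElement(tag="dummy_worker_id"))
elements = drain_by_type(q, count=2)
assert elements[DataElement].tag == "dummy_worker_id"
assert elements[DCMElement].tag == "sender"
```

Keying by type keeps each assertion independent of which sub-queue `get()` happened to serve first.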
## Priority
P2 – Medium (blocks unrelated PRs every few CI runs)
## Task Type
- [x] Testing / QA