Ma77Ball opened a new issue, #5325: URL: https://github.com/apache/texera/issues/5325
Each StoppableQueueBlockingRunnable thread (MainLoop, NetworkSender, PortStorageWriter) blocks in interruptible_get on a queue.get() with no timeout. The only way the thread leaves that wait is the RUNNABLE_STOP marker enqueued by stop().

This is fragile on shutdown: if that single wakeup is ever missed (e.g. a lost notification), or a stop is signalled by a path that does not enqueue the marker, the thread parks forever. python_worker.run() then blocks at thread.join() and the worker never finishes shutting down.

Note this only affects shutdown. A quiet queue during normal operation is fine and expected; the threads are long-lived and idle between tuple bursts.

Proposed fix (behavior-preserving for the data path):

- Add a threading.Event stop flag; stop() sets it in addition to enqueueing the marker.
- Make interruptible_get poll with a short timeout and re-check the flag on each wakeup (treating queue.Empty as "loop and wait again"), so a missed marker can no longer park the thread indefinitely.
- Thread an optional timeout parameter through Getable.get -> InternalQueue.get -> LinkedBlockingMultiQueue.get (the last via Condition.wait_for, raising queue.Empty on expiry). Default timeout=None keeps the existing blocking behavior unchanged, so only the stoppable threads opt into polling.
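
The proposed fix could look roughly like the sketch below. It is not the Texera code: TimedQueue is a toy stand-in for LinkedBlockingMultiQueue, and StoppableRunnable, StopRequested, POLL_INTERVAL_S and the 0.1 s interval are hypothetical names and values chosen for illustration. Only threading.Event, threading.Condition.wait_for and queue.Empty are real standard-library pieces.

```python
import queue
import threading
from collections import deque
from typing import Any, List, Optional

RUNNABLE_STOP = object()  # stand-in for the real stop marker (assumption)


class StopRequested(Exception):
    """Hypothetical signal that the runnable should leave its loop."""


class TimedQueue:
    """Toy single-queue stand-in showing an optional timeout on get()."""

    def __init__(self) -> None:
        self._items: deque = deque()
        self._not_empty = threading.Condition()

    def put(self, item: Any) -> None:
        with self._not_empty:
            self._items.append(item)
            self._not_empty.notify()

    def get(self, timeout: Optional[float] = None) -> Any:
        with self._not_empty:
            # wait_for returns the predicate's last value, so a falsy
            # result means the timeout expired while still empty.
            # timeout=None blocks indefinitely, as before.
            if not self._not_empty.wait_for(lambda: bool(self._items), timeout):
                raise queue.Empty
            return self._items.popleft()


class StoppableRunnable:
    POLL_INTERVAL_S = 0.1  # assumed polling interval, not from the issue

    def __init__(self, internal_queue: TimedQueue) -> None:
        self._queue = internal_queue
        self._stop_event = threading.Event()
        self.processed: List[Any] = []

    def stop(self) -> None:
        # Set the flag AND enqueue the marker: the marker gives a prompt
        # wakeup, the flag covers the case where the marker is missed.
        self._stop_event.set()
        self._queue.put(RUNNABLE_STOP)

    def interruptible_get(self) -> Any:
        while not self._stop_event.is_set():
            try:
                item = self._queue.get(timeout=self.POLL_INTERVAL_S)
            except queue.Empty:
                continue  # nothing arrived; loop and re-check the flag
            if item is RUNNABLE_STOP:
                break
            return item
        raise StopRequested

    def run(self) -> None:
        try:
            while True:
                self.processed.append(self.interruptible_get())
        except StopRequested:
            pass
```

With this shape, a stop path that only sets the flag still ends the thread within roughly one polling interval, so thread.join() returns. One choice to settle in a real patch: this sketch checks the flag before each get, so entries still queued when the flag is set are dropped rather than drained.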
