GSharayu opened a new pull request #6990:
URL: https://github.com/apache/incubator-pinot/pull/6990


   The refresh thread maintains a single global queue of realtime text index 
readers across all consuming segments across all tables.
   
   A reader is polled, refreshed and supposed to be added back to the queue so 
that it can be refreshed again in the next cycle.
   There is a race condition: If we find the queue empty, but something gets 
added before we lock the mutex, the thread signal will be lost and it will be 
in await mode until next segment signals.
   
   Set of steps for race condition
   
   1. Thread finds queue empty and enters the while check on 
https://github.com/apache/incubator-pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneIndexReaderRefreshThread.java#L69
   2. Meanwhile another thread grabs lock at 
https://github.com/apache/incubator-pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneIndexRefreshState.java#L86
   3. The current thread adds the segment in queue and signals the condition 
variable (right now its not hold)
   4. The lock after released from step 3 will be grabbed by thread in step 1
   It will then await for signal on condition variable(Which is bug!) and might 
have to wait until some other segment signals it.
   
   
   here are several ways to solve the above problem
   
   1. One easy way to fix this is to check again if the queue is empty right 
before the condition variable awaits.
   2. Another similar way is to use a variable and set it during the segment 
addition and a similar if condition check(as 1) for the variable before 
condition variable awaits.
   3. The approach used in above PR
        i. When the thread enters 
https://github.com/apache/incubator-pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneIndexReaderRefreshThread.java#L68,
 it will try to acquire the lock.
        ii. If it does acquire the lock, it will check if the queue is empty or 
not. Depending on state of queue, it will either skip the while loop or enter 
the while loop and await for the signal (and release the lock while await())
        iii. If the thread at 
https://github.com/apache/incubator-pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneIndexRefreshState.java#L86
 acquires lock first, then it will add the segment to queue and signal any 
condition variable if it awaits.
        iv. In step 3, after segment is added to queue, segment thread will 
release the lock. and when background refresh thread now tries to acquire the 
lock, gets the lock. It then will check for while condition(if the queue is 
empty) which will be false after step 3 and never wait. In case any problem 
occurs within while loop, it being in try catch{} block, the lock acquired by 
this background refresh thread will be released in finally{}.
   
   Essentially, the mutex ensures that only a single(background refresh 
thread/segment thread) thread updates/check the queue at any given moment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to