siddharthteotia commented on code in PR #11558: URL: https://github.com/apache/pinot/pull/11558#discussion_r1325017569
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneTextIndex.java: ########## @@ -109,27 +112,43 @@ public ImmutableRoaringBitmap getDictIds(String searchQuery) { @Override public MutableRoaringBitmap getDocIds(String searchQuery) { MutableRoaringBitmap docIDs = new MutableRoaringBitmap(); - Collector docIDCollector = new RealtimeLuceneDocIdCollector(docIDs); - IndexSearcher indexSearcher = null; - try { - Query query = new QueryParser(_column, _analyzer).parse(searchQuery); - indexSearcher = _searcherManager.acquire(); - indexSearcher.search(query, docIDCollector); - return getPinotDocIds(indexSearcher, docIDs); - } catch (Exception e) { - LOGGER - .error("Failed while searching the realtime text index for column {}, search query {}, exception {}", _column, - searchQuery, e.getMessage()); - throw new RuntimeException(e); - } finally { + RealtimeLuceneDocIdCollector docIDCollector = new RealtimeLuceneDocIdCollector(docIDs); + // A thread interrupt during indexSearcher.search() can break the underlying FSDirectory used by the IndexWriter + // which the SearcherManager is created with. To ensure the index is never corrupted the search is executed + // in a child thread and the interrupt is handled in the current thread by canceling the search gracefully. + // See https://github.com/apache/lucene/issues/3315 and https://github.com/apache/lucene/issues/9309 + Callable<MutableRoaringBitmap> searchCallable = () -> { + IndexSearcher indexSearcher = null; try { - if (indexSearcher != null) { - _searcherManager.release(indexSearcher); + Query query = new QueryParser(_column, _analyzer).parse(searchQuery); + indexSearcher = _searcherManager.acquire(); + indexSearcher.search(query, docIDCollector); + return getPinotDocIds(indexSearcher, docIDs); + } finally { + try { + if (indexSearcher != null) { + _searcherManager.release(indexSearcher); + } + } catch (Exception e) { + LOGGER.error( + "Failed while releasing the searcher manager for realtime text index for column {}, exception {}", + _column, e.getMessage()); } - } catch (Exception e) { - LOGGER.error("Failed while releasing the searcher manager for realtime text index for column {}, exception {}", - _column, e.getMessage()); } + }; + Future<MutableRoaringBitmap> searchFuture = + SEARCHER_POOL.getExecutorService().submit(searchCallable); + try { + return searchFuture.get(); + } catch (InterruptedException e) { + docIDCollector.markShouldCancel(); + LOGGER.warn("Lucene query timed out while searching the realtime text index for segment {}, column {}," + + " search query {}", _segmentName, _column, searchQuery); + throw new RuntimeException("Lucene query was cancelled after timeout was reached"); Review Comment: No what I meant is that stick to convention of the error. IIRC, the scenarios in which we actually propagate the end user error that their query was cancelled/killed, we do include that query was cancelled due to lack of memory resources / overshooting heap. @jasperjiaguo please correct me if I am wrong. I think the other usage of "cancelled" terminology is for external cancelling via requestID, I think this code path is hit when the search timesOut. Correct ? If so, it seems like while this should be a regular TimeoutException (or at least folded that way under RuntimeException), my suggestion was to see if we can refrain from calling it cancelled for a pure TimeOut error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org