Jackie-Jiang commented on PR #8636: URL: https://github.com/apache/pinot/pull/8636#issuecomment-1122597936
> Regarding the analyzer, I looked at the core Lucene class and also did some benchmarking. Because only one thread writes to a single mutable FST at any given time, I was not able to see any significant overhead from the analyzer refresh. However, I have adjusted the code per your comments.

Reusing the analyzer saves one analyzer object per document. Each such object has an underlying thread-local store and needs to be closed/released after the document is tokenized. If we ingest at a high rate, we should be able to see the overhead (at the very least, reusing the analyzer is guaranteed to reduce garbage).

> Also, regarding the exceptions, I have been actively trying to limit the exceptions raised to the ones that are significant to the index. For example, I do not feel it's worth failing the test if a thread is interrupted or the Lucene SearcherManager fails to refresh. However, again, I have adjusted the code per your comments.

In production code we might want to live with the exception by logging a warning/error and keeping the ingestion going. In the tests, however, we want to catch as many unexpected exceptions as possible so that we can find bugs in the production code.
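The split between production and test behavior described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PR: `RefreshPolicy`, `refresh`, and the `failFast` flag are hypothetical names, and the `Callable` stands in for whatever refresh step the index actually performs.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper (not part of Pinot or Lucene) illustrating the policy:
// production logs and keeps going, tests fail fast on unexpected exceptions.
public class RefreshPolicy {
    private static final Logger LOGGER = Logger.getLogger(RefreshPolicy.class.getName());

    /**
     * Runs one refresh step.
     *
     * @param refreshStep stand-in for the real refresh work (assumption)
     * @param failFast    true in tests, false in production (assumption)
     * @return true if the step succeeded, false if it failed and was logged
     */
    public static boolean refresh(Callable<Void> refreshStep, boolean failFast) {
        try {
            refreshStep.call();
            return true;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                // Preserve the interrupt status for callers further up the stack.
                Thread.currentThread().interrupt();
            }
            if (failFast) {
                // In tests, surface the failure so bugs are not hidden.
                throw new IllegalStateException("Refresh failed", e);
            }
            // In production, log and let ingestion continue.
            LOGGER.log(Level.WARNING, "Refresh failed, continuing ingestion", e);
            return false;
        }
    }
}
```

With a helper like this, a test would call `refresh(step, true)` and fail on any unexpected exception, while the ingestion path would call `refresh(step, false)` and only log.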