Jackie-Jiang commented on PR #8636:
URL: https://github.com/apache/pinot/pull/8636#issuecomment-1122597936

   > Regarding the Analyzer, I looked at the core Lucene class and also did
some benchmarking. Because only one thread writes to a single mutable FST at
any given point in time, I did not see any significant overhead from
refreshing the analyzer. However, I have updated the code per your comments.
   
   Reusing the analyzer saves one analyzer object per document. Each analyzer
has an underlying thread-local store that needs to be closed/released after
the document is tokenized. At a high ingestion rate the difference should be
visible, and reusing it is at least guaranteed to reduce garbage.
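   To make the reuse argument concrete, here is a minimal sketch, not the code in this PR. `SketchAnalyzer`, `CountingFactory`, and the two `index*` methods are hypothetical stand-ins; in the real code the object would be a Lucene `Analyzer`, which keeps its reusable token-stream components in thread-local storage and releases them in `close()`. Reusing one instance is safe here only because a single thread writes to the mutable FST at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AnalyzerReuseSketch {

  /** Hypothetical stand-in for a Lucene Analyzer: owns reusable scratch state and must be closed. */
  static final class SketchAnalyzer implements AutoCloseable {
    private final StringBuilder scratch = new StringBuilder();
    private boolean closed;

    /** Splits on whitespace and lower-cases each token, reusing the same scratch buffer. */
    List<String> tokenize(String text) {
      if (closed) {
        throw new IllegalStateException("analyzer already closed");
      }
      List<String> tokens = new ArrayList<>();
      scratch.setLength(0);
      // Iterate one past the end so the final token is flushed.
      for (int i = 0; i <= text.length(); i++) {
        char c = i < text.length() ? text.charAt(i) : ' ';
        if (Character.isWhitespace(c)) {
          if (scratch.length() > 0) {
            tokens.add(scratch.toString().toLowerCase(Locale.ROOT));
            scratch.setLength(0);
          }
        } else {
          scratch.append(c);
        }
      }
      return tokens;
    }

    @Override
    public void close() {
      closed = true; // a real analyzer would release its thread-local state here
    }
  }

  /** Counts how many analyzers get allocated, to compare the two strategies. */
  static final class CountingFactory {
    int created;

    SketchAnalyzer newAnalyzer() {
      created++;
      return new SketchAnalyzer();
    }
  }

  /** One analyzer per document: allocated, used once, then closed (more garbage). */
  static List<String> indexPerDocument(List<String> docs, CountingFactory factory) {
    List<String> tokens = new ArrayList<>();
    for (String doc : docs) {
      try (SketchAnalyzer analyzer = factory.newAnalyzer()) {
        tokens.addAll(analyzer.tokenize(doc));
      }
    }
    return tokens;
  }

  /** One analyzer reused for every document written by the single writer thread. */
  static List<String> indexWithReuse(List<String> docs, CountingFactory factory) {
    List<String> tokens = new ArrayList<>();
    try (SketchAnalyzer analyzer = factory.newAnalyzer()) {
      for (String doc : docs) {
        tokens.addAll(analyzer.tokenize(doc));
      }
    }
    return tokens;
  }
}
```

   With three documents, the per-document path allocates three analyzers and the reusing path allocates one, while both produce the same tokens.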
   
   > Also, regarding the exceptions, I have been actively trying to limit the
exceptions raised to the ones that are significant to the index. For example,
I do not feel it's worth failing the test if a thread is interrupted or the
Lucene SearcherManager fails to refresh. However, again, I have updated the
code per your comments.
   
   We might want to live with the exception in production code by logging a
warning/error and keeping ingestion going, but in the tests we want to catch
as many exceptions as possible, as long as they are unexpected, so that we can
find bugs in the production code.
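   A minimal sketch of that policy, assuming a single flag switches between the two behaviors; `RefreshErrorPolicySketch`, `RefreshAction`, and `failOnError` are hypothetical names, not Pinot code. The wrapped action could be, for example, a call to Lucene's `SearcherManager.maybeRefresh()`, which declares `IOException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: log-and-continue in production, fail fast in tests. */
final class RefreshErrorPolicySketch {

  /** A refresh step that may fail, e.g. a wrapper around SearcherManager.maybeRefresh(). */
  interface RefreshAction {
    void run() throws Exception;
  }

  private static final Logger LOGGER = Logger.getLogger(RefreshErrorPolicySketch.class.getName());

  private final boolean failOnError; // hypothetical flag: true in tests, false in production
  private final List<Exception> loggedFailures = new ArrayList<>();

  RefreshErrorPolicySketch(boolean failOnError) {
    this.failOnError = failOnError;
  }

  void refresh(RefreshAction action) {
    try {
      action.run();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the interrupt flag before handling
      handle(e);
    } catch (Exception e) {
      handle(e);
    }
  }

  private void handle(Exception e) {
    if (failOnError) {
      // Tests: surface every unexpected exception so bugs are not hidden.
      throw new IllegalStateException("Unexpected failure during refresh", e);
    }
    // Production: log a warning, record the failure, and keep ingestion going.
    LOGGER.log(Level.WARNING, "Refresh failed; continuing ingestion", e);
    loggedFailures.add(e);
  }

  int loggedFailureCount() {
    return loggedFailures.size();
  }
}
```

   Keeping the decision in one place means production and test runs exercise the same code path and differ only in how an unexpected exception is reported.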


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@pinot.apache.org