kotman12 commented on PR #11955:
URL: https://github.com/apache/lucene/pull/11955#issuecomment-1322093451

   > > > Does this library also check for race conditions that can arise 
between ResourceLoaderAware::inform vs TokenStream creation and processing? I 
know it may be out of the scope of this change but I would be curious to know..
   > > 
   > > 
   > > Specific to this, I think one potential plan: we could refactor the 
tests more to check for it. Existing tests are using `BaseTokenStreamTestCase` 
but we could also test factories with 
https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/analysis/BaseTokenStreamFactoryTestCase.java
   > > And maybe we could add evil stuff to this 
`BaseTokenStreamFactoryTestCase` to root out any factory-specific thread 
hazards across all of our factories (including opennlp).
   > 
   > Hi, there should not be any race conditions between TokenStreamFactory's 
constructor, `inform()` and creation of token streams. For legacy reasons with 
Apache Solr there is still the split between constructor and `inform()`, but 
actually, the factory should initialize itself completely in the constructor and 
all fields should be final. I would fix this with Lucene 10 at some point by 
removing the ResourceLoaderAware interface and just allowing the factory to have a 
ResourceLoader (optionally, only if needed) passed next to the map in the ctor. I 
have some plans to do this and I would also fix Solr later. My plan is to allow 
a factory to declare a ctor with `ResourceLoader` if it needs it. The 
SPI code would look for both constructors and call the right one.
   > 
   > At the moment this is not a problem, because the factories are always created 
without races: the ctor is called, followed by `inform()` (in one thread). After 
that the factory instance is ready to be used and can be used by multiple 
threads. Any code violating this fails soon, because the code won't find its 
resources.
   
   Thanks for the clarification, that does make things simpler to analyze. I 
still think there is a race condition between parallel calls to 
`FilterFactory::create`, mainly because of unsynchronized, lazy initialization of 
singleton members within OpenNLP that actually gets invoked from Lucene's 
factory::create method (surprisingly not from the factory ctor/inform). Most 
of the cases I saw would probably be hard to pin down with a test, but I think 
I'll be able to illustrate at least one of them.
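
   A minimal sketch of the kind of hazard described above, not taken from 
OpenNLP or Lucene: `Model`, `UnsafeCache`, `SafeCache` and 
`distinctInstances` are all hypothetical names. It contrasts an unsynchronized 
check-then-act lazy singleton with the initialization-on-demand holder idiom, 
which is one possible fix among several.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class LazyInitSketch {

  /** Hypothetical stand-in for an expensive shared object (assumption, not an OpenNLP class). */
  static final class Model {}

  /** Hazard: check-then-act on a plain static field with no synchronization. */
  static final class UnsafeCache {
    private static Model model; // neither volatile nor guarded by a lock

    static Model get() {
      if (model == null) {   // two threads may both observe null here...
        model = new Model(); // ...and each may publish a different instance
      }
      return model;
    }
  }

  /** One possible fix: holder idiom; the JVM initializes Holder at most once. */
  static final class SafeCache {
    private static final class Holder {
      static final Model INSTANCE = new Model();
    }

    static Model get() {
      return Holder.INSTANCE;
    }
  }

  /** Calls the supplier from several threads at once; returns how many distinct instances were seen. */
  static int distinctInstances(Supplier<Model> supplier, int threads) {
    Set<Model> seen = ConcurrentHashMap.newKeySet(); // Model uses identity equality
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers together to widen the race window
          seen.add(supplier.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    // The safe variant yields exactly one instance.
    System.out.println("safe:   " + distinctInstances(SafeCache::get, 16));
    // The unsafe variant may yield more than one, but often does not on a given run.
    System.out.println("unsafe: " + distinctInstances(UnsafeCache::get, 16));
  }
}
```

   The unsafe variant frequently still reports a single instance on any given 
run, which is consistent with the point that such races are hard to pin down 
with a test.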


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

