kotman12 commented on PR #11955: URL: https://github.com/apache/lucene/pull/11955#issuecomment-1322093451
> > > Does this library also check for race conditions that can arise between `ResourceLoaderAware::inform` vs TokenStream creation and processing? I know it may be out of the scope of this change but I would be curious to know.
> >
> > Specific to this, I think one potential plan: we could refactor the tests more to check for it. Existing tests are using `BaseTokenStreamTestCase` but we could also test factories with https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/analysis/BaseTokenStreamFactoryTestCase.java
> >
> > And maybe we could add evil stuff to this `BaseTokenStreamFactoryTestCase` to root out any factory-specific thread hazards across all of our factories (including opennlp).
>
> Hi, there should not be any race conditions between TokenStreamFactory's constructor, `inform()` and creation of token streams. For legacy reasons with Apache Solr there is still the split between constructor and `inform()`, but actually, the factory should initialize itself completely in the constructor and all fields should be final. I would fix this with Lucene 10 at some point by removing the `ResourceLoaderAware` interface and just allowing the factory to have a `ResourceLoader` (optionally, only if needed) passed next to the map in the ctor. I have some plans to do this and I would also fix Solr later. My plan is to allow a factory to declare a ctor with `ResourceLoader` if it needs it. The SPI code would look for both constructors and call the right one.
>
> At the moment this is not a problem, because the factories are always created without races: the ctor is called, followed by `inform()` (in one thread). After that the factory instance is ready to be used and can be used by multiple threads. Any code violating this fails soon, because the code won't find its resources.

Thanks for the clarification, that does make things simpler to analyze.
I still think there is a race condition between parallel calls to `FilterFactory::create`, mainly because of unsynchronized, lazy initialization of singleton members within OpenNLP that actually gets invoked from Lucene's `factory::create` method (surprisingly, not from the factory ctor/`inform()`). Most of the cases I saw would probably be hard to pin down with a test, but I think I'll be able to illustrate at least one of them.
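
To make the kind of hazard concrete, here is a minimal, generic sketch of unsynchronized lazy initialization next to one possible thread-safe alternative. It is not the actual OpenNLP or Lucene code: `Model`, `UnsafeLazy`, `SafeLazy`, and `distinctSafeInstances` are hypothetical names made up for illustration, and the holder idiom is just one way such a field could be made safe.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class LazyInitRaceSketch {

  /** Hypothetical stand-in for an expensive shared object; not an OpenNLP class. */
  static final class Model {}

  /** The hazard: check-then-act on a plain static field with no synchronization. */
  static final class UnsafeLazy {
    private static Model model; // neither volatile nor guarded by a lock

    static Model get() {
      if (model == null) {   // two threads can both observe null here
        model = new Model(); // ...and each construct and publish its own instance
      }
      return model;
    }
  }

  /** One possible fix: the initialization-on-demand holder idiom. */
  static final class SafeLazy {
    private static final class Holder {
      // JVM class initialization guarantees this runs exactly once.
      static final Model INSTANCE = new Model();
    }

    static Model get() {
      return Holder.INSTANCE;
    }
  }

  /** Calls SafeLazy.get() from many threads released together; returns how many distinct instances were seen. */
  public static int distinctSafeInstances(int threads) {
    Set<Model> seen = ConcurrentHashMap.newKeySet(); // Model uses identity equality
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          start.await(); // line all threads up before the first access
          seen.add(SafeLazy.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      workers[i].start();
    }
    start.countDown();
    try {
      for (Thread t : workers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    System.out.println("distinct safe instances: " + distinctSafeInstances(16));
  }
}
```

The same harness pointed at `UnsafeLazy.get()` may or may not observe more than one instance on any given run, which is why this sort of bug is hard to pin down with a test.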