Hi all,

I think I might have discovered a synchronization bug when ingesting a lot
of data into Solr, but want to check with the specialists first ;-)

I'm using a small custom-written map/reduce framework that starts
20-something threads to do some heavy data-preparation processing. When
this processing is done, the results of these threads are gathered in a
reduce step, where they are ingested into an (embedded) Solr instance. To
maximize throughput, I'm ingesting the data in parallel in a couple of
threads of their own, and this is where I run into a synchronization error.
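
For reference, the ingest step is shaped roughly like the sketch below. It
is a simplified illustration and not my actual code: BatchSink is a
hypothetical stand-in for whatever hands one batch of documents to Solr
(for example a SolrJ client call), and the batch size and thread count are
made-up values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIngestSketch {

    /** Hypothetical stand-in for the code that sends one batch to Solr. */
    public interface BatchSink {
        void add(List<String> batch) throws Exception;
    }

    /** Splits docs into batches and hands them to a fixed pool of ingest threads. */
    public static int ingestAll(List<String> docs, int batchSize, int threads, BatchSink sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                // Copy the slice so each task owns its batch.
                List<String> batch = new ArrayList<>(docs.subList(start, end));
                pending.add(pool.submit(() -> {
                    sink.add(batch);
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // rethrows any failure from an ingest thread
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger received = new AtomicInteger();
        int sent = ingestAll(docs, 50, 4, batch -> received.addAndGet(batch.size()));
        System.out.println("sent=" + sent + " received=" + received.get());
    }
}
```

Waiting on every Future means an exception in any ingest thread surfaces in
the caller instead of being silently dropped.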

As with all synchronization bugs, it happens only some of the time and is
hard to debug, but I think I managed to put my finger on the root cause
(I'm using Solr 8.3):

In class org.apache.lucene.index.CodecReader, an NPE is thrown on line 84:
getFieldsReader().visitDocument(docID, visitor);

The issue is that the getFieldsReader() getter is mapped to a ThreadLocal
(more explicitly,
org.apache.lucene.index.SegmentCoreReaders.fieldsReaderLocal) that seems to
be released (set to null) somewhere automatically, and read afterwards,
without synchronizing the two.
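
To make the suspected pattern concrete, here is a minimal stdlib-only
sketch. It is not Lucene's code: Holder is a hypothetical stand-in for an
object whose reader field is nulled on release, and the String payload is
a placeholder. It shows a read after release failing with an NPE, and the
same read succeeding when the reader first takes a reference (the
reference-counting style that Lucene's IndexReader exposes through
incRef/decRef/tryIncRef).

```java
import java.util.concurrent.atomic.AtomicInteger;

public class UseAfterReleaseSketch {

    /** Hypothetical resource holder; the reader is nulled when the last reference goes away. */
    static class Holder {
        private volatile String reader = "stored fields";
        private final AtomicInteger refCount = new AtomicInteger(1); // 1 = the owner's reference

        String getReader() {
            return reader;
        }

        /** Takes a reference only if the holder has not been released yet. */
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        /** Drops a reference; the last one out releases the reader. */
        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                reader = null;
            }
        }
    }

    /** The owner releases first, then an unguarded read dereferences null. */
    public static boolean unguardedReadFails() {
        Holder h = new Holder();
        h.decRef(); // release: reader is now null
        try {
            h.getReader().length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** The reader takes its own reference, so the owner's release does not null the field yet. */
    public static boolean guardedReadSurvivesRelease() {
        Holder h = new Holder();
        if (!h.tryIncRef()) {
            return false; // already released; nothing safe to read
        }
        h.decRef(); // owner releases, but our reference keeps the reader alive
        try {
            return h.getReader().length() > 0;
        } finally {
            h.decRef(); // our reference is the last one; the reader is released here
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded read throws NPE: " + unguardedReadFails());
        System.out.println("guarded read succeeds: " + guardedReadSurvivesRelease());
    }
}
```

The sketch forces the release-then-read ordering on one thread so the
failure is deterministic; in the real case that ordering would only occur
under an unlucky interleaving of threads.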

I don't think I should set any resource locks of my own, since I'm only
using the SolrJ API and the /update endpoint.

I know this is quite a low-level question, but could anyone point me in the
right direction to further investigate this issue? I.e., what could be the
reason the reader is released out of sync?

best,

b.
