Hi list,

After some strange search results I tried to locate the problem,
and it turned out that it starts with bulk loading via SolrJ
using ConcurrentUpdateSolrClient.Builder with several threads.

Am I right to assume that a client from ConcurrentUpdateSolrClient.Builder
is _NOT_ thread safe with respect to the order of the docs sent to the indexer?

It looks like documents with the same doc_id are not always indexed
in the order in which they are sent to the indexer. It behaves like some kind
of random generator.

Example:
file LR00010.xml
<doc>
  <str name="id">my_uniq_id_1234</str>
  <date name="date">2017-03-28T23:21:40Z</date>
  ...

file LR01000.xml
<doc>
  <str name="id">my_uniq_id_1234</str>
  <date name="date">2017-04-26T00:42:10Z</date>
  ...


The files are in the same subdir.
They are loaded, processed, and sent to the indexer in ascending natural order.
LR00010.xml is handled well before LR01000.xml.

But the result is that sometimes the older doc of LR00010.xml is in the index
and the newer doc from LR01000.xml is marked as deleted, and sometimes the
newer doc of LR01000.xml is in the index and the older doc from LR00010.xml
is marked as deleted.
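To illustrate what I suspect is happening, here is a small pure-Java simulation. My understanding (not verified against the SolrJ sources) is that with several threads the client drains its internal queue with multiple runner threads, each sending its own request, so two updates for the same id can reach Solr in either order. The sketch uses no SolrJ at all: the class, record and method names are made up, and a plain map stands in for the index, where whichever add with the same uniqueKey is applied last wins and the other one counts as deleted. It needs Java 16+ for the record.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OverwriteRace {

    // Hypothetical stand-in for one document update: uniqueKey + date value.
    record Update(String id, String date) {}

    /**
     * Submits the updates in list order to a pool of `threads` workers.
     * Each worker writes its update into a map keyed by id, so whichever
     * update is applied last survives (like a later add with the same
     * uniqueKey replacing the earlier doc). Returns the surviving values.
     */
    static Map<String, String> index(List<Update> updates, int threads)
            throws InterruptedException {
        Map<String, String> live = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Update u : updates) {
            pool.submit(() -> {
                // Simulated network/processing jitter between runner threads.
                try {
                    Thread.sleep((long) (Math.random() * 5));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                live.put(u.id(), u.date());
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return live;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Update> updates = List.of(
            new Update("my_uniq_id_1234", "2017-03-28T23:21:40Z"),  // LR00010.xml
            new Update("my_uniq_id_1234", "2017-04-26T00:42:10Z")); // LR01000.xml
        int olderWon = 0;
        for (int run = 0; run < 200; run++) {
            String kept = index(updates, 4).get("my_uniq_id_1234");
            if (kept.startsWith("2017-03")) {
                olderWon++;
            }
        }
        System.out.println("4 threads: older doc survived in " + olderWon + " of 200 runs");
        // A single worker takes tasks in FIFO order, so the newer date always wins.
        System.out.println("1 thread:  " + index(updates, 1).get("my_uniq_id_1234"));
    }
}
```

With 4 workers the count varies from run to run; with 1 worker the newer doc is always the one kept, which is what I would expect to see in the single-thread test.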

Has anyone seen this?

I could try ConcurrentUpdateSolrClient.Builder with only one thread and
see if the problem still exists.
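If one thread fixes it, a possible middle ground (my own idea, not something I know SolrJ offers out of the box) would be to partition the updates by id, so all updates for one id go through the same single-threaded lane while different ids still run in parallel. The sketch below is again pure Java with hypothetical names (PartitionedLoader, lanes). In a real loader each lane would presumably own its own client built with a single thread, which I believe is done with withThreadCount(1) on the builder.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionedLoader {

    /**
     * Routes each {id, date} update to one of `lanes` single-threaded
     * executors chosen by the id's hash. Updates for the same id always
     * land in the same lane and are applied in submission order.
     */
    static Map<String, String> index(List<String[]> updates, int lanes)
            throws InterruptedException {
        Map<String, String> live = new ConcurrentHashMap<>();
        ExecutorService[] pools = new ExecutorService[lanes];
        for (int i = 0; i < lanes; i++) {
            pools[i] = Executors.newSingleThreadExecutor();
        }
        for (String[] u : updates) {
            String id = u[0];
            String date = u[1];
            int lane = Math.floorMod(id.hashCode(), lanes);
            pools[lane].submit(() -> { live.put(id, date); });
        }
        for (ExecutorService p : pools) {
            p.shutdown();
            p.awaitTermination(1, TimeUnit.MINUTES);
        }
        return live;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String[]> updates = List.of(
            new String[] {"my_uniq_id_1234", "2017-03-28T23:21:40Z"},  // LR00010.xml
            new String[] {"my_uniq_id_1234", "2017-04-26T00:42:10Z"}); // LR01000.xml
        // Both updates share one FIFO lane, so this prints 2017-04-26T00:42:10Z.
        System.out.println(index(updates, 4).get("my_uniq_id_1234"));
    }
}
```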

Regards
Bernd
