On 8/16/2018 7:14 PM, Michael Hu (CMBU) wrote:
Environment:
* solr 7.4.1
* all cores are vanilla cores with "loadOnStartup" set to false, and
"transient" set to true
* we have about 75 cores with "transientCacheSize" set to 32
Issue: we have core corruption from time to time (2-3 corrupted cores a day)
How to reproduce:
* Set the "transientCacheSize" to 1
* Ingest a high load into core1 only (no issue at this time)
* Continue ingesting a high load into core1 and start ingesting into core2
simultaneously (core2 is immediately corrupted) (stack trace is attached below)
If a core gets unloaded while you're sending data to it, the behavior is
probably unpredictable. Core corruption isn't good, but I'm not
surprised that it happens in this scenario.
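
To illustrate why the reproduction steps trigger unloads, here is a toy
model of a least-recently-used transient cache. It is a rough sketch and
not Solr code: the class and function names are made up for illustration,
and it assumes the cache simply evicts the least recently used core
whenever the limit is exceeded.

```python
from collections import OrderedDict


class TransientCoreCache:
    """Toy LRU model of a transient core cache (hypothetical, not Solr code)."""

    def __init__(self, size):
        self.size = size
        self.loaded = OrderedDict()  # core name -> True, oldest entry first
        self.evictions = []          # cores that were unloaded, in order

    def touch(self, core):
        """Record a request to a core, loading it and evicting if needed."""
        if core in self.loaded:
            # Already loaded: mark it as most recently used.
            self.loaded.move_to_end(core)
            return
        self.loaded[core] = True
        if len(self.loaded) > self.size:
            # Over the limit: unload the least recently used core.
            evicted, _ = self.loaded.popitem(last=False)
            self.evictions.append(evicted)


def count_evictions(size, requests):
    """Return how many unloads a sequence of requests causes."""
    cache = TransientCoreCache(size)
    for core in requests:
        cache.touch(core)
    return len(cache.evictions)


# Updates alternating between two cores, as in the reproduction steps.
requests = ["core1", "core2"] * 5
print(count_evictions(1, requests))  # size 1: every switch unloads a core
print(count_evictions(2, requests))  # size 2: both cores stay loaded
```

In this model, a size of 1 unloads a core on every switch between core1
and core2, so each core is repeatedly unloaded while updates are still
arriving for it. With a size of 2, nothing is ever unloaded.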
Your transientCacheSize must be large enough that every core which is
getting updates can be in memory at the same time. Unless that's all of
your cores, the number should probably be larger than the number of cores
getting updates, so that you can still query other cores simultaneously.
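
As a rough sketch of that sizing rule, the helper below computes a lower
bound for the setting. The function name and the query_headroom parameter
are hypothetical, chosen only for illustration. How much headroom you
need depends on how many other cores you expect to query at once.

```python
def min_transient_cache_size(updating_cores, query_headroom=0):
    """Lower bound for transientCacheSize under the rule described above.

    updating_cores: names of the cores receiving updates at the same time.
    query_headroom: extra slots for cores that are only queried (assumed).
    """
    if query_headroom < 0:
        raise ValueError("query_headroom must not be negative")
    # Every distinct core receiving updates needs its own slot.
    distinct_updating = len(set(updating_cores))
    return distinct_updating + query_headroom


# Two cores getting updates, plus room to query one other core.
print(min_transient_cache_size(["core1", "core2"], query_headroom=1))
```

The result is only a floor. The value itself still has to be set as
transientCacheSize in your Solr configuration.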
Thanks,
Shawn