It should not be required that your transient cache size be greater than or equal to the number of cores being updated simultaneously.
Theoretically, it works like this:

- A request comes in and a reference-counted core is opened to serve it.
  That may require loading the core.
- If another request comes in that bumps this core out of the transient
  cache, the core should still be active until the current request is done.
- Once the request is done, the reference count is decremented and the core
  is closed.

So theoretically (I love that word), even though you have your transient
cache size set to 1, you can have N open transient cores, all pending
closure. (There's a rough sketch of that open/close discipline below the
quoted message.) That said, I don't think there is a test case that deals
with this explicitly.

The problem here is that you may have M requests queued up for the _same_
core, each with a new update request. So, theory aside, Shawn's suggestion
is very likely the way to get around this. The model for transient cores is
that a core is opened, used for a while, then thrown away; it wasn't built
with the idea of rapidly updating a single transient core, so I can
certainly believe that that's a problem.

TestLazyCores.java has a multi-threaded test for a race condition, so it
should be possible to write a test case for the above.

Best,
Erick

On Wed, Aug 22, 2018 at 9:19 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 8/16/2018 7:14 PM, Michael Hu (CMBU) wrote:
>>
>> Environment:
>>
>>  * solr 7.4.1
>>  * all cores are vanilla cores with "loadOnStartUp" set to false, and
>>    "transient" set to true
>>  * we have about 75 cores with "transientCacheSize" set to 32
>>
>> Issue: we have core corruption from time to time (2-3 core corruptions
>> a day)
>>
>> How to reproduce:
>>
>>  * Set the "transientCacheSize" to 1
>>  * Ingest a high load into core1 only (no issue at this time)
>>  * Continue ingesting the high load into core1 and start ingesting load
>>    into core2 simultaneously (core2 is immediately corrupted; stack
>>    trace is attached below)
>
> If a core gets unloaded while you're sending data to it, operation is
> probably unpredictable. Core corruption isn't good, but I'm not surprised
> that it happens in this scenario.
>
> Your transientCacheSize must allow all cores which are getting updates to
> be in memory at the same time, so unless that's all of your cores, the
> number should probably be larger than the number of cores getting updates,
> so you can still query other cores simultaneously.
>
> Thanks,
> Shawn
>
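
A minimal sketch of the open/close discipline Erick describes, assuming
access to the running node's CoreContainer. The handleRequest() wrapper and
the null/error handling are illustrative only, but CoreContainer.getCore()
and SolrCore.close() are the actual reference-counting entry points:

    import org.apache.solr.core.CoreContainer;
    import org.apache.solr.core.SolrCore;

    public class TransientCoreSketch {
      // Hypothetical request handler: getCore() bumps the core's reference
      // count (loading the core if it isn't in the transient cache), and
      // close() decrements it. The core is only really shut down once the
      // count reaches zero, even if the cache has already evicted it.
      void handleRequest(CoreContainer container, String coreName) {
        SolrCore core = container.getCore(coreName); // ref count +1, may load
        if (core == null) {
          throw new IllegalStateException("No such core: " + coreName);
        }
        try {
          // ... serve the query or update against this core ...
        } finally {
          core.close(); // ref count -1; really closed when the count hits 0
        }
      }
    }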
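
And a rough sketch of the configuration Shawn's advice is about, with
placeholder values (the 64 is just an example; the rule of thumb is to keep
transientCacheSize at least as large as the number of cores being updated at
the same time). transientCacheSize lives at the top level of solr.xml, and
the transient/loadOnStartup flags live in each core's core.properties:

    <!-- solr.xml: size the transient cache to cover all cores that
         receive updates simultaneously -->
    <solr>
      <int name="transientCacheSize">64</int>
    </solr>

    # core.properties for each transient core
    name=core1
    transient=true
    loadOnStartup=false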