I know the autoscaling framework no longer exists in Solr 9+, but I wanted to share a bug we found in it. There are probably still plenty of Solr 8 users relying on this framework.
The triggers use timestamps returned by the JVM call System.nanoTime(), but according to the Javadoc, this is NOT an absolute timestamp. It is just a number relative to an arbitrary origin, and this origin differs from one JVM instance to another, so it changes each time the JVM is restarted.

As far as I can tell, this impacts at least the following triggers, which all follow basically the same pattern:

- IndexSizeTrigger
- MetricTrigger
- SearchRateTrigger

These triggers fire an event when a certain condition (specific to each trigger) has been met for a certain period of time. They maintain a map of [what, timestamp] entries to track a short-term history, and remove an entry if the condition is no longer met, so that no event is fired. The timestamps come from System.nanoTime(). So far so good, as long as these timestamps are only compared with each other within the same JVM.

However, this map is persisted in ZooKeeper to survive an overseer change (it is written and read by TriggerBase.saveState() and restoreState()). After an overseer change, the new overseer runs in a different JVM, so the nanoTime() origin is different and unrelated to the previous one. Consequently, the timestamps persisted by the previous overseer cannot be compared with the current JVM "clock". The result is triggers that never fire, or that fire without waiting for the configured time.

I found no Jira entry for this (but maybe there is one?), and I think this could be a major contributor to the instability of this framework in some environments. Also, I'm unsure whether it is still maintained in an 8.x branch.

A simple fix could be to always use TimeSource.getEpochTimeNs() instead of getTimeNs() in the autoscaling code. But I'm not sure why we use nanoseconds anyway. Seconds would be sufficient...

Thanks,
Pierre
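
P.S. To illustrate the failure mode, here is a minimal sketch. It is not the actual trigger code: the class name, the method names and the two clock offsets are made up for illustration, and the "wait for" check is only my simplified reading of the pattern described above. Each JVM's nanoTime() is modelled as the true elapsed time plus an arbitrary per-JVM offset.

```java
import java.util.concurrent.TimeUnit;

public class NanoTimeOriginSketch {

    /** Hypothetical model of System.nanoTime(): true time shifted by a per-JVM offset. */
    static long reading(long trueTimeNs, long jvmOffsetNs) {
        return trueTimeNs + jvmOffsetNs;
    }

    /** Simplified "condition has held for waitFor" check, as a trigger might do it. */
    static boolean waitForElapsed(long firstSeenNs, long nowNs, long waitForNs) {
        return nowNs - firstSeenNs >= waitForNs;
    }

    public static void main(String[] args) {
        long waitForNs = TimeUnit.MINUTES.toNanos(10);
        long oneMinuteNs = TimeUnit.MINUTES.toNanos(1);

        // Made-up offsets standing in for two unrelated nanoTime() origins.
        long largeOffset = 9_000_000_000_000_000L;
        long smallOffset = 1_000_000_000_000L;

        // Case 1: old overseer has the large offset, new overseer the small one.
        // The persisted timestamp looks like it lies far in the future.
        long stored1 = reading(0, largeOffset);
        long now1 = reading(oneMinuteNs, smallOffset);
        System.out.println("case 1 fires: " + waitForElapsed(stored1, now1, waitForNs));

        // Case 2: offsets swapped. The persisted timestamp looks very old,
        // so the check passes after one minute instead of ten.
        long stored2 = reading(0, smallOffset);
        long now2 = reading(oneMinuteNs, largeOffset);
        System.out.println("case 2 fires: " + waitForElapsed(stored2, now2, waitForNs));

        // Epoch-based timestamps share one origin across JVMs (offset 0 here),
        // so only the real elapsed minute is measured.
        long stored3 = reading(0, 0);
        long now3 = reading(oneMinuteNs, 0);
        System.out.println("epoch fires: " + waitForElapsed(stored3, now3, waitForNs));
    }
}
```

In case 1 the event is held back far longer than configured, in case 2 it fires early, and only the shared-origin variant respects the configured wait. Epoch time still depends on the nodes' wall clocks being reasonably in sync, but that is a much smaller error than an unrelated origin.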