Hi list, We have encountered a weird bug related to the facet.offset parameter. In short: the more general the query is (i.e. the more hits it generates), the higher the risk that the facet.offset parameter stops working.
In more detail:

1. Since getting all the facets we need (facet.limit=1000) from around 100 shards didn't work for some broad query terms, like "the" (yes, we index and search those too), we decided to paginate.

2. The facet page size is set to 100 for all pages starting from the second one. We start with facet.offset=0&facet.limit=30, then continue with facet.offset=30&facet.limit=100, then facet.offset=100&facet.limit=100, and so on, until we get to facet.offset=900.

All facets work just fine until we hit facet.offset=700. Debugging showed that in the class HttpCommComponent a static Executor instance is created with a setting to terminate idle threads after 5 sec. Our belief is that this setting is way too low for our billion-document scenario and broad searches. Setting it to 5 min seems to improve the situation a bit, but does not fully solve it. This same class is no longer used in 4.2.1 (can anyone tell me what is used instead in distributed faceting?), so it isn't easy to compare these parts of the code. Anyhow, we are now playing with this value in the hope of seeing some light at the end of the tunnel (it would be good if it is not the train).

One more question: can this be related to RAM allocation on the router and/or shards? If RAM isn't enough for some operations, why wouldn't the router or shards just crash with OOM?

If anyone has other ideas for what to try or look into, that would be much appreciated.

Dmitry
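For reference, the offset/limit sequence we send can be sketched like this (a minimal Python sketch of our client-side pagination; facet_pages is a hypothetical helper of ours, not a Solr API):

```python
def facet_pages():
    """Yield (facet.offset, facet.limit) pairs for paginated faceting,
    mirroring the sequence described above: a first page of 30 values,
    then pages of 100 at offsets 30, 100, 200, ..., 900."""
    yield (0, 30)        # first page: 30 facet values
    yield (30, 100)      # second page
    offset = 100
    while offset <= 900:
        yield (offset, 100)
        offset += 100

# Render each page as the query-string fragment appended to the request.
params = [f"facet.offset={o}&facet.limit={l}" for o, l in facet_pages()]
```

In our case every pair is sent as part of an otherwise identical distributed query; the failure shows up only from the facet.offset=700 page onward.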