I am a colleague of the person who posted the original question. We have done some more analysis and have more information to provide.
Here are the responses to Toke's questions:

>> * Do they (slow performing queries) occur under heavy network load?

No, they don't. This happens even when there is only a single user on the system. It doesn't appear to be a capacity issue.

>> * Do they occur after specific queries?

No, even the simplest queries run slowly, and when things are slow, the QTimes always come in at something over 5000 ms.

>> * Do they occur at specific times (e.g. each whole hour)?

They don't occur at specific times. However, there is indeed a timing aspect to this issue, which I explain below.

Here is what I did: I fired a single query repeatedly at every node in my cluster and observed the following:

1. Slowness happens only if the client app sends the request to a node (let's call this NodeX) that does NOT host the shard containing the data we are looking for (we use a document co-location strategy to index related documents into a single shard).
2. Slowness never happens when the client app sends the request to a node (let's call this NodeY) that hosts the correct shard (i.e. the data we are looking for).
3. Slowness does NOT happen if the client app sends the request to NodeX and the previous query to NodeX was executed within the last 1.5 minutes.
4. Slowness happens if the client app sends the request to NodeX and the previous query to NodeX was executed more than 1.5 minutes ago.

These observations lead me to believe the following (still a theory):

1. Something is breaking or disrupting inter-node communication between NodeX and NodeY. Could this be a firewall or something similar?
2. Whenever NodeX remains idle for more than 1.5 minutes, its connection to NodeY is dropped (I can't see anything in the logs to that effect, though), and when the next request comes in, it takes 5 seconds to re-establish the connection. This 1.5-minute window and the 5-second delay are pretty consistent.

I checked with my network folks, and they say that all network interfaces are UP and there are NO packet losses between the servers in question. What else should I be asking my friends in the networking group to look at? They did ask me what protocol Solr uses for inter-node communication, and I answered HTTP.

Thanks, and I appreciate your input.

--
View this message in context: http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4144493.html
Sent from the Solr - User mailing list archive at Nabble.com.
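
For anyone who wants to reproduce the idle-gap test described above, here is a rough sketch of how it could be scripted. It is not the exact tooling used for the observations in this post. The URL, host name, collection name, and the helper names `send_query`, `probe_idle_gaps`, and `first_slow_gap` are all assumptions made for illustration. Note that it measures wall-clock time at the client, not the QTime that Solr reports.

```python
import time
import urllib.request

# Hypothetical endpoint: replace host, port, and collection with your own.
# Point it at the node that does NOT host the shard (NodeX in the post).
SOLR_URL = "http://nodex.example.com:8983/solr/collection1/select?q=*:*&rows=0"


def send_query(url=SOLR_URL, timeout=30):
    """Issue one HTTP GET and discard the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe_idle_gaps(query, gaps_seconds, sleep=time.sleep, clock=time.perf_counter):
    """For each idle gap: warm up, stay idle for the gap, then time one query.

    Returns a list of (gap_seconds, elapsed_ms) tuples.
    """
    results = []
    for gap in gaps_seconds:
        query()        # warm-up, so the idle period starts from a known point
        sleep(gap)     # leave the node idle for this long
        start = clock()
        query()        # the query whose latency we care about
        elapsed_ms = (clock() - start) * 1000.0
        results.append((gap, elapsed_ms))
    return results


def first_slow_gap(results, slow_ms=5000.0):
    """Return the smallest idle gap whose query took at least slow_ms, or None."""
    slow = [gap for gap, ms in results if ms >= slow_ms]
    return min(slow) if slow else None
```

Calling `probe_idle_gaps(send_query, [30, 60, 80, 90, 100, 120])` and printing the result should show where the latency jumps, if the 1.5-minute theory holds. Because `urllib.request` opens a fresh connection for each request, the client-to-NodeX connection is never reused, so any idle-related delay would come from somewhere behind NodeX. Handing the networking group a table of idle gap versus latency may make it easier for them to look for an idle timeout of matching length on a firewall or load balancer between the nodes.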