I am a colleague of the person who posted the original question. We have done
some more analysis and have more information to provide.

Here are the responses to Toke's questions:

>> * Do they (slow performing queries) occur under heavy network load? 

No, they don't. This happens even when there is only a single user on the
system. It doesn't appear to be a capacity issue.

>> * Do they occur after specific queries? 
No, even the simplest queries run slow - and when things are slow, the
QTimes always hover at just over 5000 ms.

>> * Do they occur at specific times (e.g. each whole hour)? 
They don't occur at specific times. However, there is indeed a timing aspect
to this issue, which I explain below.

Here is what I did - I fired the same query repeatedly at every node in my
cluster and observed the following:

1. Slowness happens only if the Client App sends the request to a node (let's
call this NodeX) that does NOT host the shard containing the data we are
looking for (we use a document co-location strategy to index related
documents into a single shard).

2. Slowness never happens when the Client App sends the request to a node
(let's call this NodeY) that hosts the correct shard (i.e. the data we are
looking for).

3. Slowness does NOT happen if the Client App sends the request to NodeX -
and the previous query to NodeX was executed within the last 1.5 minutes.

4. Slowness happens if the Client App sends the request to NodeX - and the
previous query to NodeX was executed more than 1.5 minutes ago.
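
For anyone who wants to reproduce this, the idle-gap test can be scripted
roughly as below. This is only a sketch, not the exact procedure I used: the
helper names timed_get and probe_idle_gaps are made up for illustration, and
the URL you pass in is whatever select URL applies to your own NodeX. It
measures wall-clock time on the client side, so the number includes the hop
from NodeX to NodeY.

```python
import time
import urllib.request


def timed_get(url, timeout=30.0):
    """Return the wall-clock seconds taken to fetch url and read the full body."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


def probe_idle_gaps(url, gaps_seconds, fetch=timed_get, sleep=time.sleep):
    """Query once to warm up, then for each gap: stay idle, query, record latency.

    fetch and sleep are parameters only so the loop can be exercised without
    a live server; the defaults do real HTTP requests and real sleeps.
    """
    fetch(url)  # warm-up, so the first gap starts from a "recently used" node
    results = []
    for gap in gaps_seconds:
        sleep(gap)                         # leave the node idle for this long
        results.append((gap, fetch(url)))  # then time the next query
    return results
```

Calling probe_idle_gaps with gaps on either side of 90 seconds (say 60, 85,
95, 120) against NodeX should show whether the latency jumps once the idle
time crosses the 1.5 minute mark.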

These observations lead me to believe the following (still a theory):
1. There is something that's breaking / disrupting inter-node communication
between NodeX and NodeY
        Could this be a firewall or something similar?

2. Whenever NodeX remains idle for more than 1.5 minutes - its connection to
NodeY is dropped (I can't see anything in the logs to that effect though),
and when the next request comes in - it takes 5 seconds to recreate the
connection
        This 1.5 minute window and the 5 second delay are pretty consistent
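
One way to check the "5 seconds to recreate the connection" part of this
theory would be to time a fresh connection from NodeX to NodeY directly,
outside of Solr. The sketch below is an assumption about how one might do
that, not something we have run: resolve_and_connect is a hypothetical helper,
and the host and port passed to it would be NodeY's hostname and Solr port. It
reports name resolution and TCP connect separately, so it is at least visible
which step (if either) is slow.

```python
import socket
import time


def resolve_and_connect(host, port, timeout=30.0):
    """Return (resolve_seconds, connect_seconds) for one fresh TCP connection."""
    t0 = time.monotonic()
    # Look up the address the same way a normal client connection would.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.monotonic()

    # Use the first result and time only the TCP handshake.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)
    t2 = time.monotonic()

    return t1 - t0, t2 - t1
```

Running this on NodeX after it has been idle for more than 1.5 minutes, and
again right afterwards, would show whether a new connection to NodeY really
costs anything close to 5 seconds. If both numbers stay small, the delay is
probably somewhere other than connection setup.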

I checked with my network folks - and they say that all network interfaces
are UP - and there are NO packet losses between the servers in question.

What else should I be asking my friends in the networking group to look at?
They did ask me what protocol Solr uses for inter-node communication - and I
answered HTTP.

Thanks, I appreciate your input.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4144493.html
Sent from the Solr - User mailing list archive at Nabble.com.
