Here is the query URL that I sent.  The information in this message is
slightly redacted.

http://bigindy5.REDACTED.com:8982/solr/sparkmain/search?q=%28german+shepherd%29&qt=/search&start=0&fq=NOT%28feature:redact1+OR+feature:spkhistorical%29&fq=%28ip:%28AP%29+AND+price:0%29+OR+%28ip:%28BB%29%29+OR+%28ip:%28COR%29+AND+price:0%29+OR+%28ip:%28GET%29+AND+%28collection:subscription+OR+collection:editorialsubscription%29%29+OR+%28ip:%28PA%29%29+OR+%28ip:%28RTR%29%29+OR+%28ip:%28RX%29+AND+price:0%29+OR+%28ip:%28USAT%29+AND+price:0%29+OR+%28ip:%28AFP%29%29+OR+%28ip:%28GET%29+AND+NOT+collection:subscription+AND+NOT+collection:editorialsubscription%29+OR+%28ip:%28RX%29%29&fq=restr:%28%28worldwide+OR+none+OR+aus_i%29+AND+NOT+%28aus_x%29%29&fq=doc_date:[1900-01-01T00:00:00Z+TO+2015-12-12T00:00:00Z]&sort=post_date+desc&rows=75&debugQuery=true&indent=true&wt=json
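Because the URL is hard to read in its encoded form, here is a minimal sketch of how it could be decoded into individual parameters so that each fq clause can be inspected separately.  It uses only the Python standard library.  The function names solr_params and filter_queries are made up for illustration, and the example URL is a shortened stand-in containing only a few of the parameters from the real request.

```python
from urllib.parse import urlsplit, parse_qsl


def solr_params(url):
    """Return the decoded query-string parameters as (name, value) pairs.

    A list of pairs is used instead of a dict because Solr allows
    repeated parameters such as fq.
    """
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def filter_queries(url):
    """Return only the decoded fq values, in request order."""
    return [value for name, value in solr_params(url) if name == "fq"]


# Shortened stand-in for the full request URL (assumption: only a few of
# the original parameters are reproduced here).
example = (
    "http://bigindy5.REDACTED.com:8982/solr/sparkmain/search"
    "?q=%28german+shepherd%29"
    "&fq=NOT%28feature:redact1+OR+feature:spkhistorical%29"
    "&fq=restr:%28%28worldwide+OR+none+OR+aus_i%29+AND+NOT+%28aus_x%29%29"
    "&sort=post_date+desc&rows=75"
)

if __name__ == "__main__":
    for fq in filter_queries(example):
        print(fq)
```

Pasting the full URL in place of the stand-in would list every filter query on its own line, which makes the long ip/price/collection clause easier to compare against the debug output.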

Sent to a set of production servers running 4.9.1 (when the caches are
cold), this takes about 7 seconds.  Sent to a 5.3.2-SNAPSHOT dev server
with cold caches, it takes about 15 seconds -- because that server is
particularly low on memory.  Once the query is cached, it takes 100
milliseconds or less, even on the dev server.

Checking one of the shard indexes with the schema browser, ip has 34
unique terms, feature has 108 unique terms, collection has 824 unique
terms, restr has 128 unique terms.  As expected, doc_date has about 12
million unique terms for the shard.  It is a TrieDateField with a
precisionStep of 16.  The rest of the shards have similar unique term
counts.  The entire sharded index has 244 million documents in it - six
shards with 40.6 million each and one shard with under 500K documents.

I've been trying to figure out why this query is so slow.  I can't see
anything obvious, but I did encounter something really weird in the
debug output.

This is the params section of the response.  Because echoParams is set
to all, it also shows the shards parameter defined in
solrconfig.xml:

http://apaste.info/pGe

This is the filter query info from the debug output.  It shows the same
set of filters seven times, which I assume is because there are seven shards.
I do not know if this is a debug glitch.  The response info here is from
the dev server, but the production servers give the same info:

http://apaste.info/HpC

Does anyone have thoughts about the repeated filter information in the
debug output, or why it takes several seconds for this query to run?

General performance on the production index is pretty good.  Over the
last 9800 queries, the production server has a median qtime of 238ms and
a 95th percentile of 3672ms.  The query rate is less than one per second.
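
For anyone who wants to compare numbers from their own logs, this is a rough sketch of how a median and a 95th percentile can be computed from a list of QTime values.  It is not necessarily how the figures above were produced.  The function name qtime_summary and the nearest-rank percentile method are assumptions chosen for illustration.

```python
import statistics


def qtime_summary(qtimes_ms):
    """Return (median, 95th percentile) for a list of QTime values in ms.

    The percentile uses the nearest-rank method (an assumption; other
    tools may interpolate and give slightly different numbers).
    """
    if not qtimes_ms:
        raise ValueError("need at least one QTime value")
    ordered = sorted(qtimes_ms)
    median = statistics.median(ordered)
    # Nearest rank: ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return median, p95


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the production logs.
    sample = [120, 238, 95, 410, 3672, 150, 201, 88, 260, 305]
    print(qtime_summary(sample))
```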

Thanks,
Shawn
