Hi Shawn et al,

Thanks a lot for the prompt answer.

It looks like I made quite a few mistakes in formulating those Solr queries. Setting shards.qt to the name of the core was completely wrong. I searched for shards.qt in http://lucene.apache.org/solr/guide/7_3/ but found no answers. Googling for shards.qt was more successful: I found an explanation of what it means in two books, plus pointers and numerous usage examples in the top 40 results. So I would suggest adding a sentence saying 'use shards.qt as q' somewhere in the documentation; it would not hurt :-)

Recomputing the queries:

http://localhost:9999/solr/de_wiki_all_shard1_replica_n1/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false

returns

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":3400},
  "terms":{ "text":[ "8",671396, "application",671396, "articles",671396, "charset",671396, "de",671396, "f",671396, "utf",671396, "wiki",671396, "xhtml",671396, "xml",671396]}}

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json

returns

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":2584},
  "terms":{ "text":[ "8",670564, "application",670564, "articles",670564, "charset",670564, "de",670564, "f",670564, "utf",670564, "wiki",670564, "xhtml",670564, "xml",670564]}}

LOG in CORE1:

INFO - 2018-06-22 15:27:40.779; [c:de_wiki_all s:shard1 r:core_node3 x:de_wiki_all_shard1_replica_n1] org.apache.solr.core.SolrCore; [de_wiki_all_shard1_replica_n1] webapp=/solr path=/terms params={distrib=false&terms.fl=text&terms.limit=10&wt=json} status=0 QTime=3027
INFO - 2018-06-22 15:27:42.059; [c:de_wiki_all s:shard3 r:core_node11 x:de_wiki_all_shard3_replica_n8] org.apache.solr.core.SolrCore; [de_wiki_all_shard3_replica_n8] webapp=/solr path=/terms params={distrib=false&terms.fl=text&terms.limit=10&wt=json} status=0 QTime=2608

The numbers did not change even after http://localhost:9999/solr/de_wiki_all/update?commit=true (you correctly assumed that the collection is not getting any updates).
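
To make it easier to repeat these requests and compare the counts, something like the following Python sketch could be used. It is only an illustration: the helper names terms_url and parse_terms are made up here and are not part of Solr. It assumes the response has the flat [term, count, ...] list shape shown above, and it also accepts a plain term-to-count object in case a response comes back in that form.

```python
import json
from urllib.parse import urlencode


def terms_url(base, core, field, limit=10, distrib=None):
    """Build a /terms request URL like the ones used above.

    base is e.g. "http://localhost:9999/solr"; core is a core or
    collection name. distrib is omitted from the URL when None.
    """
    params = [("terms.limit", limit), ("terms.fl", field), ("wt", "json")]
    if distrib is not None:
        params.append(("distrib", "true" if distrib else "false"))
    return f"{base}/{core}/terms?{urlencode(params)}"


def parse_terms(body, field):
    """Return {term: count} from a /terms JSON response body (a string)."""
    entry = json.loads(body)["terms"][field]
    if isinstance(entry, dict):
        # Response already maps term -> count.
        return dict(entry)
    # Flat list: [term0, count0, term1, count1, ...]
    return dict(zip(entry[0::2], entry[1::2]))
```

The URL can be fetched with any HTTP client, and the resulting dictionaries from different cores can then be compared directly.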

After I fired this query:

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json&distrib=true

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":70245},
  "terms":{ "text":{ "8":2681402, "application":2681402, "articles":2681402, "charset":2681402, "de":2681402, "f":2681402, "utf":2681402, "wiki":2681402, "xhtml":2681402, "xml":2681402}}}

with the log line:

INFO - 2018-06-22 15:32:54.805; [c:de_wiki_all s:shard1 r:core_node3 x:de_wiki_all_shard1_replica_n1] org.apache.solr.core.SolrCore; [de_wiki_all_shard1_replica_n1] webapp=/solr path=/terms params={distrib=true&terms.fl=text&terms.limit=10&wt=json} status=0 QTime=70245

even the 1st query started returning the same results (shouldn't the query be faster in the distributed setting?):

http://localhost:9999/solr/de_wiki_all_shard1_replica_n1/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":3438},
  "terms":{ "text":[ "8",671396, "application",671396, "articles",671396, "charset",671396, "de",671396, "f",671396, "utf",671396, "wiki",671396, "xhtml",671396, "xml",671396]}}

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":3325},
  "terms":{ "text":[ "8",671396, "application",671396, "articles",671396, "charset",671396, "de",671396, "f",671396, "utf",671396, "wiki",671396, "xhtml",671396, "xml",671396]}}

Also:

http://localhost:9997/solr/de_wiki_all_shard2_replica_n4/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":2637},
  "terms":{ "text":[ "8",670221, "application",670221, "articles",670221, "charset",670221, "de",670221, "f",670221, "utf",670221, "wiki",670221, "xhtml",670221, "xml",670221]}}

http://localhost:9997/solr/de_wiki_all_shard4_replica_n12/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":2536},
  "terms":{ "text":[ "8",669221, "application",669221, "articles",669221, "charset",669221, "de",669221, "f",669221, "utf",669221, "wiki",669221, "xhtml",669221, "xml",669221]}}

http://localhost:9997/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":2405},
  "terms":{ "text":[ "8",669221, "application",669221, "articles",669221, "charset",669221, "de",669221, "f",669221, "utf",669221, "wiki",669221, "xhtml",669221, "xml",669221]}}

which means that the de_wiki_all/terms query is being redirected to only one of the cores and computed locally.

On the performance part: the PC has 32GB of RAM, with some 10GB left for the OS to cache things. The complete index is ~40GB (the complete collection as text documents was ~40GB), and each replica is around 3.5GB (shown e.g. in http://172.16.203.123:9999/solr/#/de_wiki_all_shard1_replica_n1). What would be the easiest way to get all indexes/replicas listed with their corresponding sizes in bytes?

What is the complexity of this terms query? Does Solr need to go through the individual inverted indexes, or does it only need to scan the list of terms (does every list have the number of IDs in the inverted index precomputed?)? This part of the question is particularly interesting, as I was not able to compute de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json&distrib=true with 2GB of memory per core (due to "unable to allocate memory in java heap" I had to increase it to 3GB for every instance).

Cheers,
Arturas

On Fri, Jun 22, 2018 at 4:28 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 6/22/2018 8:12 AM, Shawn Heisey wrote:
>
>> I wonder if having an invalid handler contributed to the speed.
>>
>
> Further thought about this:
>
> I can't say whether having an invalid handler name would cause speed
> problems, but based on my limited understanding of the code involved, I
> don't think it would.
>
> I'm guessing that with a shards.qt value that doesn't start with a slash,
> that the request gets sent to /select, with a qt parameter set to the
> value. Solr would most likely ignore any qt value, because the
> handleSelect setting on requestDispatcher in solrconfig.xml has defaulted
> to false for many versions.
>
> Another possibility is that the OS had cached the information in a
> different replica for the full distributed query, and this made that query
> fast, but when the query directed to a specific shard replica was made,
> that data wasn't cached, and so Solr had to read the disk to satisfy the
> query, which is going to REALLY slow it down. I would imagine that if you
> repeated the single-shard query multiple times, especially using the
> different URL that I gave you, the speed discrepancy might disappear.
>
> Thanks,
> Shawn
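
P.S. A quick sanity check on the counts reported above, written as a small Python sketch. The numbers are copied from the responses in this message. Attributing 670564 to shard3 is an assumption: no direct request to that core is shown, and the value is taken from the collection-level response that the log shows being served by de_wiki_all_shard3_replica_n8.

```python
# Counts for the term "8" as reported in the responses in this message.
# The shard3 value is an assumption (see the note above the code).
per_shard = {
    "shard1": 671396,
    "shard2": 670221,
    "shard3": 670564,
    "shard4": 669221,
}

# Count returned by the request with distrib=true.
distributed = 2681402

total = sum(per_shard.values())
print(total, total == distributed)  # prints: 2681402 True
```

The match is consistent with the distrib=true request aggregating all four shards, while the requests without it return the counts of a single core.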