Best guess is that your indexes are too big for your memory.
I think your focus on the number of rows is misleading you;
you'll see why in a moment.
Lucene indexes are essentially accessed randomly; there's
very little locality. Here's an excellent article explaining
how Lucene uses memory:
https://b
Thanks again, Erick, for pointing us in the right direction.
Yes, I am seeing heavy disk I/O while querying. I queried a single
collection. A query for 10 rows can cause 100-150 MB of disk reads on each
node. When querying for 1000 rows, disk reads are in the range of 2-7 GB per
node.
Is this normal? I
Right, you're running into the "laggard" problem: you can't get the overall
result back until every shard has responded. There's an interesting
parameter, "shards.info=true", that will give you some information about
the time taken by the sub-search on each shard.
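To illustrate, here is a rough sketch of how you might inspect that section of a response. The shape of "shards.info" (a map keyed by shard URL, with per-shard "numFound" and "time") follows Solr's documented output, but the host names, counts, and timings below are invented:

```python
# Invented sample response; only the "shards.info" structure mirrors Solr's output.
sample_response = {
    "responseHeader": {"QTime": 6930},
    "shards.info": {
        "http://node1:8983/solr/test_shard1_replica_n1/": {"numFound": 2000000, "time": 210},
        "http://node2:8983/solr/test_shard2_replica_n2/": {"numFound": 2000000, "time": 6800},
    },
}

# The overall QTime is gated by the slowest shard -- the "laggard".
slowest = max(sample_response["shards.info"].items(), key=lambda kv: kv[1]["time"])
print(slowest[0], slowest[1]["time"])
```

Comparing the per-shard "time" values this way quickly tells you whether one shard is dragging the whole request down.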
But given your numbers, I think your roo
Thanks for your reply, Erick. You helped me improve my understanding
of how Solr distributed requests work internally.
Actually, my ultimate goal is to improve search performance in one of our
test environments, where queries are taking up to 60 seconds to execute.
*We want to fetch at least
First of all, asking for that many rows will spend a lot of time
gathering the document fields. Assuming you have stored fields,
each doc requires
1> The aggregator node getting the candidate 10 docs from each shard
2> The aggregator node sorting those 10 docs from each shard into the true
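The two phases above can be sketched in a few lines of Python. This is an illustration of the merge logic, not Solr code; the shard contents and scores are invented:

```python
import heapq

def shard_top_n(shard_docs, n):
    # Phase 1: each shard independently returns its own top-n candidates
    # as (doc_id, score) pairs.
    return heapq.nlargest(n, shard_docs, key=lambda d: d[1])

# Invented per-shard data for illustration.
shard1 = [("a", 0.9), ("b", 0.5), ("c", 0.4)]
shard2 = [("d", 0.8), ("e", 0.7), ("f", 0.1)]
rows = 2

# Phase 2: the aggregator merges the per-shard candidates into the true
# top-n; only these winners then need their stored fields fetched.
candidates = shard_top_n(shard1, rows) + shard_top_n(shard2, rows)
true_top = heapq.nlargest(rows, candidates, key=lambda d: d[1])
print([doc_id for doc_id, _ in true_top])  # -> ['a', 'd']
```

Note that the aggregator has to hold rows-per-shard candidates in memory, which is why very large `rows` values get expensive in a distributed setup.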
I have a Solr cloud setup (Solr 7.4) with a collection "test" having two
shards on two different nodes. There are 4M records equally distributed
across the shards.
If I query the collection like below, it is slow.
http://localhost:8983/solr/test/select?q=*:*&rows=10
QTime: 6930
If I query a