Thanks Erick/Jagdish.
Just to give some background on my queries.
1. All my queries are unique. A query can be: "ipod" and "ipod 8gb" (but
these are unique). These are about 1.2M in total.
So, I assume setting a high queryResultCache, queryResultWindowSize and
queryResultMaxDocsCached won't help.
solrconfig.xml has entries which you can tweak for your use case. One
of them is queryResultWindowSize. You can try a value of 2000 and
see if it helps improve performance. Please make sure you have enough
memory allocated for the queryResultCache.
A combination of sharding and distributi
50M documents, depending on a bunch of things,
may not be unreasonable for a single node, only
testing will tell.
But the question I have is whether you should be
using standard Solr queries for this or building a custom
component that goes at the base Lucene index
and "does the right thing". Or e
Thanks Erick/Peter.
This is an offline process, used by a relevancy engine implemented around
Solr. The engine computes boost scores for related keywords based on
clickstream data.
For example, say the clickstream has: ipad=upc1,upc2,upc3
I query solr with keyword: "ipad" (to get 2000 documents) and then mak
Hello Utkarsh,
This may or may not be relevant for your use-case, but the way we deal with
this scenario is to retrieve the top N documents (5, 10, 20, or 100) at a time
(user selectable). We can then page the results, changing the start
parameter to return the next set. This allows us to 'retrieve' milli
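
The paging approach described above could be sketched roughly as follows. This is only an illustration: the function name `page_urls` and the localhost URL are made up for the example, and it only builds the request URLs (using the standard `q`, `start`, `rows` and `wt` parameters) without sending anything to Solr.

```python
from urllib.parse import urlencode


def page_urls(base_url, query, total, page_size):
    """Yield one select URL per page, advancing `start` by `page_size`.

    `base_url` is a hypothetical Solr select endpoint; the last page is
    shortened so that no more than `total` documents are requested.
    """
    for start in range(0, total, page_size):
        rows = min(page_size, total - start)
        params = {"q": query, "start": start, "rows": rows, "wt": "json"}
        yield base_url + "?" + urlencode(params)


# Hypothetical endpoint and query, for illustration only.
urls = list(
    page_urls("http://localhost:8983/solr/prodinfo/select", "allText:ipad", 2000, 100)
)
```

With these values the generator produces 20 URLs, the first with start=0 and the last with start=1900. Note that deep pages still require each shard to collect start+rows candidates, so paging mainly helps when most callers stop after the first few pages.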
Well, depending on how many docs get served
from the cache the time will vary. But this is
just ugly, if you can avoid this use-case it would
be a Good Thing.
Problem here is that each and every shard must
assemble the list of 2,000 documents (just ID and
sort criteria, usually score).
Then the n
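
To make the cost concrete, here is a rough sketch of merging per-shard candidate lists into one global top-N. It is not Solr's actual code, just an illustration under the assumption that each shard returns (score, id) pairs already sorted by descending score; `merge_top_n` and the sample data are invented for the example.

```python
import heapq
from itertools import islice


def merge_top_n(shard_results, n):
    """Merge per-shard lists of (score, doc_id) into a global top-n.

    Each input list must already be sorted by descending score.
    Negating the score in the key lets heapq.merge, which expects
    ascending order, walk the lists from best to worst.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, n))


# Made-up results from three shards.
shard_a = [(9.1, "a1"), (7.4, "a2"), (1.0, "a3")]
shard_b = [(8.2, "b1"), (7.9, "b2")]
shard_c = [(9.5, "c1"), (0.3, "c2")]

top = merge_top_n([shard_a, shard_b, shard_c], 4)
```

The point of the sketch is that with 3 shards and rows=2000, up to 6,000 candidate entries have to be produced and shipped before 4,000 of them are thrown away.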
Also, I don't see a consistent response time from Solr. I ran ab again and
got this:
ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json"
Benchmarking x.amazonaws.com (be patient)
Completed 100 re
Hello,
I have a use case where I need to retrieve the top 2000 documents matching a
query.
What are the parameters (in the query, solrconfig, schema) I should look at to
improve this?
I have 45M documents in a 3-node SolrCloud 4.3.1 cluster with 3 shards, with
30GB RAM, 8 vCPUs and a 7GB JVM heap.
I have documentCac