Hello,
I am trying to understand how to size the caches for my Solr-powered
application. Some details on the index and application:
Solr Version : 1.3
JDK : 1.5.0_14 32 bit
OS : Solaris 10
App Server : Weblogic 10 MP1
Number of documents : 1 million
Total number of fields : 1000 (750 strings, 225 int/float/double/long, 25
boolean)
Number of fields on which faceting and filtering can be done : 400
Physical size of  index : 600MB
Number of unique values for a field : Ranges from 5 - 1000. Average of 150
-Xms and -Xmx vals for jvm : 3G
Expected number of concurrent users : 15
No sorting planned for now

Now I want to set appropriate values for the caches. Below is my
understanding of each cache, along with some questions. Please correct me
where I am wrong and answer where you can.
filterCache:
As per the Solr wiki, this is used to store an unordered list of the ids of
the documents matching an fq param.
So if a query contains two fq params, it will create two separate entries,
one for each fq param. The value of each entry is the list of ids of all
documents across the index that match the corresponding fq param. Each
entry is independent of any other entry.
Could a minimum size for the filterCache be (total number of fields * avg
number of unique values per field)? Is this correct? I have not enabled
<useFilterForSortedQuery>.
Would the max physical size of the filterCache be (size * avg byte size of
a document id * avg number of docs returned per fq param)?
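
To make the two estimates above concrete, here is a rough Python sketch of
the arithmetic using the numbers from this post. The 4 bytes per document id
(a Java int) and the idea that every document holds exactly one value per
field are my assumptions, not measurements, and the function names are only
for illustration.

```python
# Rough filterCache estimate using the two formulas from this post.
NUM_DOCS = 1_000_000
TOTAL_FIELDS = 1000
FACET_FIELDS = 400
AVG_UNIQUE_VALUES = 150
BYTES_PER_DOC_ID = 4  # ASSUMPTION: ids are stored as 4-byte ints


def filter_cache_entries(num_fields, avg_unique_values):
    """Entries needed if every (field, value) pair is used as an fq."""
    return num_fields * avg_unique_values


def filter_cache_bytes(entries, bytes_per_id, avg_docs_per_fq):
    """size * avg byte size of a document id * avg docs per fq param."""
    return entries * bytes_per_id * avg_docs_per_fq


# ASSUMPTION: each document has exactly one value per field, so an
# average fq on one value matches about NUM_DOCS / AVG_UNIQUE_VALUES docs.
avg_docs_per_fq = NUM_DOCS // AVG_UNIQUE_VALUES

entries_all_fields = filter_cache_entries(TOTAL_FIELDS, AVG_UNIQUE_VALUES)
entries_facet_fields = filter_cache_entries(FACET_FIELDS, AVG_UNIQUE_VALUES)
max_bytes = filter_cache_bytes(
    entries_facet_fields, BYTES_PER_DOC_ID, avg_docs_per_fq
)

print("entries (all fields):  ", entries_all_fields)
print("entries (facet fields):", entries_facet_fields)
print("max bytes (facet fields):", max_bytes)
```

Under these assumptions the 400 facet fields alone need 60,000 entries and
about 1.6 GB, which is roughly half of my 3G heap, so I would like to know
whether the formula or the assumptions are off.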

queryResultCache:
Used to store an ordered list of the ids of the documents that match the
most commonly used searches. So if my query is something like
q=Status:Active&fq=Org:Apache&fq=Version:13, it will create one entry that
contains the list of ids of the documents that match this full query. Is
this correct? How can I size my queryResultCache? Some entries from
solrconfig.xml:
<queryResultWindowSize>50</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
Max physical size of the queryResultCache would be (size * avg byte size of
a document id * avg number of docs per query). Is this correct?
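
Here is a rough Python sketch of that estimate with the two solrconfig.xml
values above. It assumes 4 bytes per document id, a hypothetical cache size
of 512 entries, that a request is rounded up to a whole number of windows,
and that 200 acts as a ceiling on the ids kept per entry. That is how I read
the settings, so please correct me if it is wrong.

```python
QUERY_RESULT_WINDOW_SIZE = 50       # from my solrconfig.xml
QUERY_RESULT_MAX_DOCS_CACHED = 200  # from my solrconfig.xml
BYTES_PER_DOC_ID = 4                # ASSUMPTION: ids are 4-byte ints
CACHE_SIZE = 512                    # HYPOTHETICAL size, not my real setting


def window_docs(start, rows, window_size=QUERY_RESULT_WINDOW_SIZE):
    """Ids collected for a request, rounded up to a multiple of the window.

    ASSUMPTION: a request for rows start..start+rows-1 collects ids from 0
    up to the end of the window that contains the last requested row.
    """
    needed = start + rows
    windows = (needed + window_size - 1) // window_size  # ceiling division
    return windows * window_size


def query_result_cache_bytes(size, bytes_per_id, docs_per_entry):
    """size * avg byte size of a document id * docs per query entry."""
    return size * bytes_per_id * docs_per_entry


# Upper bound: every entry holds the maximum number of cached ids.
upper_bound = query_result_cache_bytes(
    CACHE_SIZE, BYTES_PER_DOC_ID, QUERY_RESULT_MAX_DOCS_CACHED
)

print("ids collected for start=10, rows=10:", window_docs(10, 10))
print("ids collected for start=40, rows=20:", window_docs(40, 20))
print("upper bound in bytes:", upper_bound)
```

If this reading is right, the queryResultCache stays small (about 400 KB for
512 entries, ignoring per-entry overhead) compared with the filterCache.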


documentCache:
Caches the stored documents that are read from the index. Suppose I run two
searches that each return three documents, with one document common to both
result sets. Will this result in 5 entries in the documentCache, one for
each of the 5 unique documents returned by the two queries? Is this
correct? For sizing, the Solr wiki states that "The size for the
documentCache should always be greater than <max_results> *
<max_concurrent_queries>".
Why do we need the max_concurrent_queries parameter here? Is it for the case
where max_results is much smaller than numDocs? In my case, a q=*:* search
is done the first time the index is loaded. So, will setting the
documentCache size to numDocs be correct? Is that the maximum I would ever
need to allocate?
Max physical size of the documentCache would be (size * avg byte size of a
document in the index). Is this correct?
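
A small Python sketch of the documentCache example and formulas above. The
document ids, the max_results value of 50 and the average document size are
hypothetical. In particular, dividing the 600MB index size by 1 million
documents is only a crude stand-in for the real stored-field size.

```python
# Two searches, three documents each, one document in common.
first_search = {"doc1", "doc2", "doc3"}    # hypothetical ids
second_search = {"doc3", "doc4", "doc5"}   # hypothetical ids
unique_docs = first_search | second_search  # set union drops the duplicate

NUM_DOCS = 1_000_000
MAX_CONCURRENT_QUERIES = 15   # expected concurrent users from this post
MAX_RESULTS = 50              # ASSUMPTION: rows returned per query
# ASSUMPTION: index size / numDocs approximates the size of one document.
AVG_DOC_BYTES = 600 * 1024 * 1024 // NUM_DOCS


def document_cache_min_size(max_results, max_concurrent_queries):
    """Lower bound from the wiki: max_results * max_concurrent_queries."""
    return max_results * max_concurrent_queries


def document_cache_bytes(size, avg_doc_bytes):
    """size * avg byte size of a document in the index."""
    return size * avg_doc_bytes


min_size = document_cache_min_size(MAX_RESULTS, MAX_CONCURRENT_QUERIES)

print("unique documents cached:", len(unique_docs))
print("minimum documentCache size:", min_size)
print("bytes at minimum size:", document_cache_bytes(min_size, AVG_DOC_BYTES))
print("bytes at size = numDocs:", document_cache_bytes(NUM_DOCS, AVG_DOC_BYTES))
```

With these numbers the wiki's lower bound is only 750 entries, while a size
of numDocs would hold roughly the whole index in the heap, which is why I am
asking which end of that range makes sense.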

Thank you

-Rahul
