The document set I am talking about is much smaller: around 300K documents.
I have some predefined queries (around 10K in total) that I want to expose as facets on the result set of a given query, returning the top N (say 100) documents for each. My plan is to precompute the DocSet for each of these predefined queries and keep them in memory; then, for every incoming query, intersect the query's result set with each predefined DocSet and extract the top N (after sorting, of course).

Before I go down this path, I want to understand the memory usage of 10K DocSets. http://lucene.apache.org/solr/api/org/apache/solr/search/DocSet.html lists three implementations: BitDocSet (a conventional bitset), HashDocSet (for sparse sets), and DocSlice (not sure what that one is for). Most of my 10K DocSets would fall into the sparse category. A quick back-of-the-envelope: a bitset over 300K documents takes about 37 KB no matter how many bits are set, so 10K BitDocSets would be roughly 370 MB, whereas a HashDocSet holding only a handful of doc ids should be tiny.

I am curious which DocSet implementation is chosen when Solr builds a result DocSet. Does it automatically pick one based on the density of the set? For example, if more than about 1/8th of the bits are set, a BitDocSet might be fine, but for a set with only 10 documents out of a possible 300K, a HashDocSet seems better. Where in the Solr source code can I look to understand more about this? Thanks.
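For concreteness, here is a rough sketch of what I had in mind. The class and its wiring are hypothetical (mine, not Solr's); I'm assuming SolrIndexSearcher.getDocSet, DocSet.intersectionSize, and the filtered form of SolrIndexSearcher.getDocList behave the way the javadocs suggest:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.solr.search.DocList;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.search.SolrIndexSearcher;

    // Hypothetical helper class; name and structure are just for illustration.
    public class PrecomputedFacetSets {

        // predefined query name -> its cached DocSet (rebuilt for each new searcher)
        private final Map<String, DocSet> cache = new HashMap<String, DocSet>();

        // Run once per searcher (e.g. from a newSearcher listener) to precompute
        // the DocSet of every predefined query.
        public void warm(SolrIndexSearcher searcher, Map<String, Query> predefined)
                throws IOException {
            cache.clear();
            for (Map.Entry<String, Query> e : predefined.entrySet()) {
                cache.put(e.getKey(), searcher.getDocSet(e.getValue()));
            }
        }

        // Facet count = size of the intersection between the user query's DocSet
        // and one precomputed DocSet.
        public int count(SolrIndexSearcher searcher, Query userQuery, String facet)
                throws IOException {
            DocSet userDocs = searcher.getDocSet(userQuery);
            return userDocs.intersectionSize(cache.get(facet));
        }

        // Top-N docs of the user query restricted to one precomputed DocSet,
        // passing the precomputed set as a filter so sorting only happens
        // over the intersection.
        public DocList topN(SolrIndexSearcher searcher, Query userQuery, String facet,
                            Sort sort, int n) throws IOException {
            return searcher.getDocList(userQuery, cache.get(facet), sort, 0, n);
        }
    }

The idea is to use intersectionSize() only where I need counts, and the filtered getDocList() for the actual top-100 lists. Whether this is a sensible approach obviously depends on how much memory those 10K cached DocSets end up taking, hence the question.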