On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
> I know that you can use a facet query to get the unique terms for a
> field taking account of any q or fq parameters but for our use case the
> counts are not needed. So is there a more efficient way of finding
> just unique terms for a field?
Short answer: Not at this moment.

If the number of unique terms is large (millions), a fair amount of temporary memory could be saved by keeping track of matched terms with a boolean (one bit per term) instead of the full int per term that standard faceting uses. Reduced memory requirements mean less garbage collection and faster processing due to better cache utilization. So yes, there is a more efficient way.

Guessing from your other posts, you are building a social network and need to query on surnames and similar large fields. The question is, of course, how large the payoff will be and whether it is worth the investment in development hours. I would suggest hacking the current faceting code to use OpenBitSet instead of int[] and doing performance tests on that. PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts seem to be the right places to look in Solr 4.

Regards,
Toke Eskildsen, State and University Library, Denmark
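A minimal sketch of the int[] vs. bitset difference described above, under stated assumptions: it uses java.util.BitSet as a stand-in for Lucene's OpenBitSet, and the per-document term-ordinal arrays (docToOrds), the list of matching documents, and the class and method names are all hypothetical. This is not how Solr's faceting structures are actually laid out, just the core idea.

```java
import java.util.Arrays;
import java.util.BitSet;

public class UniqueTermsSketch {

    // Standard faceting style: one int counter (32 bits) per unique term.
    static int[] countTerms(int[][] docToOrds, int[] matchingDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : matchingDocs) {
            for (int ord : docToOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    // Unique-terms-only variant: one bit per unique term, no counting at all.
    static BitSet markTerms(int[][] docToOrds, int[] matchingDocs, int numTerms) {
        BitSet seen = new BitSet(numTerms);
        for (int doc : matchingDocs) {
            for (int ord : docToOrds[doc]) {
                seen.set(ord);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical toy index: 4 documents, 5 unique terms (ordinals 0..4).
        int[][] docToOrds = { {0, 2}, {2, 3}, {1}, {4, 0} };
        int[] matchingDocs = {0, 1, 3}; // documents that passed q and fq
        int numTerms = 5;

        int[] counts = countTerms(docToOrds, matchingDocs, numTerms);
        BitSet seen = markTerms(docToOrds, matchingDocs, numTerms);

        System.out.println("counts      = " + Arrays.toString(counts));
        // Walk the set bits to emit the unique term ordinals.
        StringBuilder ords = new StringBuilder();
        for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
            ords.append(ord).append(' ');
        }
        System.out.println("unique ords = " + ords.toString().trim());

        // Rough temporary memory for a field with 10 million unique terms.
        long terms = 10_000_000L;
        System.out.println("int[] bytes  ~ " + terms * Integer.BYTES);
        System.out.println("bitset bytes ~ " + terms / 8);
    }
}
```

The counter array costs 32 bits per unique term while the bitset costs one, so for a field with millions of terms the temporary structure shrinks roughly 32-fold (about 40 MB vs. 1.25 MB for 10 million terms in this back-of-the-envelope estimate). Whether that translates into a measurable speedup is exactly what the suggested performance tests would need to show.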