On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
> I know that you can use a facet query to get the unique terms for a
> field taking account of any q or fq parameters but for our use case the
> counts are not needed. So is there a more efficient way of finding 
> just unique terms for a field?

Short answer: Not at this moment.


If the number of unique terms is large (millions), a fair amount of
temporary memory could be saved by tracking each matched term with a
single bit instead of the full int used by standard faceting. Reduced
memory requirements mean less garbage collection and faster processing
due to better cache utilization. So yes, in principle there is a more
efficient way; it just is not implemented.
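
To make the difference concrete, here is a minimal sketch of the two
approaches. It is not Solr code: the class name, the docToOrd array
(document id -> term ordinal) and the matchingDocs array are made up for
illustration, and java.util.BitSet is used only so that the sketch runs
without Lucene on the classpath.

```java
import java.util.BitSet;

// Hypothetical illustration, not taken from the Solr code base.
public class UniqueTermsSketch {

    // Standard-faceting style: one int counter (4 bytes) per term ordinal.
    public static int[] countTerms(int[] docToOrd, int[] matchingDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : matchingDocs) {
            counts[docToOrd[doc]]++;
        }
        return counts;
    }

    // Presence-only variant: one bit per term ordinal, no counts kept.
    public static BitSet markTerms(int[] docToOrd, int[] matchingDocs, int numTerms) {
        BitSet seen = new BitSet(numTerms);
        for (int doc : matchingDocs) {
            seen.set(docToOrd[doc]);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Six documents, each holding exactly one of four terms (ordinals 0..3).
        int[] docToOrd = {0, 2, 2, 1, 3, 2};
        // Documents matching the (assumed) q/fq parameters.
        int[] matchingDocs = {1, 2, 5, 4};

        BitSet seen = markTerms(docToOrd, matchingDocs, 4);
        System.out.println("matched ordinals: " + seen); // matched ordinals: {2, 3}

        // Rough payload sizes for 10 million unique terms.
        long numTerms = 10_000_000L;
        System.out.println("int[] bytes:  " + numTerms * 4); // 40000000
        System.out.println("bitset bytes: " + numTerms / 8); // 1250000
    }
}
```

The payload shrinks by a factor of 32, which is where the reduced
garbage collection and better cache behaviour would come from. Whether
that shows up as a measurable speed-up is exactly what would need
testing.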

Guessing from your other posts, you are building a social network and
need to query on surnames and similar large fields. The question is of
course how large the payoff will be and whether it is worth the investment in
development hours. I would suggest hacking the current faceting code to
use OpenBitSet instead of int[] and doing performance tests on that.
PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts
seem to be the right places to look in Solr 4.

Regards,
Toke Eskildsen, State and University Library, Denmark
