On Tue, 2015-08-25 at 10:40 +0800, Zheng Lin Edwin Yeo wrote:
> Would like to confirm: when I set rows=100, does it mean that it only builds
> the clusters based on the first 100 records that are returned by the search,
> and if I have 1000 records that match the search, all the remaining 900
> records will not be considered for clustering?

That is correct. It is not stated very clearly, but it follows from
reading the comments in the third example at
https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-Configuration
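
To make the effect of rows concrete, here is a minimal sketch of how such a
request could be built. The host, the collection name "mycollection", the
"/clustering" handler path and the helper clustering_url are assumptions for
illustration, not taken from your setup. The point is only that the documents
handed to the clustering component are the ones returned by the search, so
raising rows is the way to let more of the matches take part:

```python
from urllib.parse import urlencode


def clustering_url(base, query, rows):
    """Build a Solr request URL with clustering enabled.

    Hypothetical helper: only the top `rows` hits of the search are
    returned, so only those are available to be clustered.
    """
    params = {
        "q": query,
        "rows": rows,          # number of top hits returned (and clustered)
        "clustering": "true",  # switch the clustering component on
        "wt": "json",
    }
    return f"{base}?{urlencode(params)}"


# Assumed endpoint; adjust host, collection and handler to your installation.
url = clustering_url(
    "http://localhost:8983/solr/mycollection/clustering", "memory", 1000
)
print(url)
```

With rows=1000 all 1000 matches from your example would be considered, at
the price of a heavier request, since more documents have to be fetched and
clustered on the fly.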

> If that is the case, the result of the clustering may not be so accurate, as
> there is a possibility that the first 100 records might have a large amount
> of similarities in the records, while the subsequent 900 records have
> differences that could have an impact on the cluster result.

Such is the nature of on-the-fly clustering. The clustering aims to be
as representative of your search result as possible. Assigning more
weight to the higher-scoring documents (in this case all the weight, as
those beyond the top 100 are not even considered) does this.

If that does not fit your expectations, maybe you need something else?
Plain faceting perhaps? Or maybe enrichment of the documents with some
sort of entity extraction?

- Toke Eskildsen, State and University Library, Denmark

