Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> However, I find that clustering is exceeding slow after I index this 1GB of
> data. It took almost 30 seconds to return the cluster results when I set it
> to cluster the top 1000 records, and still take more than 3 seconds when I
> set it to cluster the top 100 records.

Your clustering uses Carrot2, which fetches the top documents and performs real-time clustering on them - that process is (nearly) independent of index size. The relevant numbers here are top 1000 and top 100, not 1GB. The unknown part is whether it is the fetching of the top 1000 (the Solr part) or the clustering itself (the Carrot part) that is the bottleneck.

- Toke Eskildsen
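One rough way to separate the two parts is to issue the same request twice, once with clustering=true and once with clustering=false, and compare the timings. The sketch below assumes a clustering-enabled handler at /clustering on a core named collection1 (both hypothetical, adjust to your setup) and that the handler honours the clustering parameter as a per-request toggle. The helper names are made up for illustration. Solr caches can skew the second request, so repeat each measurement a few times before trusting the numbers.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: host, core name and handler path are assumptions.
SOLR_URL = "http://localhost:8983/solr/collection1/clustering"


def build_url(base, rows, clustering, query="*:*"):
    """Build a request URL that differs only in the clustering toggle."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        "clustering": "true" if clustering else "false",
    }
    return base + "?" + urlencode(params)


def timed_fetch(url):
    """Return (wall-clock ms, QTime reported by Solr) for one request."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        body = json.load(resp)
    wall_ms = (time.perf_counter() - start) * 1000.0
    return wall_ms, body["responseHeader"]["QTime"]


def estimate_split(ms_with, ms_without):
    """Attribute the extra time of the clustered request to clustering."""
    clustering_ms = max(ms_with - ms_without, 0.0)
    return {"fetch_ms": ms_without, "clustering_ms": clustering_ms}


if __name__ == "__main__":
    for rows in (100, 1000):
        without_ms, _ = timed_fetch(build_url(SOLR_URL, rows, False))
        with_ms, _ = timed_fetch(build_url(SOLR_URL, rows, True))
        print(rows, estimate_split(with_ms, without_ms))
```

If fetch_ms dominates, the time goes into retrieving the documents (the Solr part). If clustering_ms dominates, it is the Carrot part.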