Re: Contrib module for Document Clustering

2016-04-07 Thread Joel Bernstein
My gut instinct is that it's a hard path you're considering. There are the logistics of sharding by document similarity on both the indexing side and the query side. Even if you pull that off, it would be extremely difficult to know whether you're getting good results, and really hard to fix if you're not get

Re: Contrib module for Document Clustering

2016-04-06 Thread davidphilip cherian
Hi Joel, Right now, we are (web) crawling almost 85 million documents, and this could double. The collection is simply divided into shards, so each search runs across all shards. If it were possible for a system to distribute documents into shards based on document similarit
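The idea in this message (placing similar documents on the same shard) can be sketched as nearest-centroid assignment. This is a minimal illustration, not something proposed in the thread: it assumes cluster centroids were computed offline (for example by a separate clustering job), uses plain term-frequency vectors with cosine similarity, and the names `tf_vector`, `cosine` and `assign_shard` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_shard(text: str, centroids: list) -> int:
    """Index of the most similar centroid; ties go to the lowest index."""
    vec = tf_vector(text)
    scores = [cosine(vec, c) for c in centroids]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical centroids, one per shard.
centroids = [tf_vector("solr shard routing index"),
             tf_vector("football match goal score")]
print(assign_shard("how does solr route a document to a shard", centroids))  # 0
print(assign_shard("the final score of the match", centroids))               # 1
```

Only the indexing side is shown here. Searching just the matching shard would also require running the same assignment on each query, which is part of the difficulty raised in the reply at the top of this thread.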

Re: Contrib module for Document Clustering

2016-04-06 Thread Joel Bernstein
I don't know of any contrib or module that does this. Can you describe why you'd want to route documents to shards based on similarity? What advantages would you get by using this approach?

Re: Contrib module for Document Clustering

2016-04-06 Thread davidphilip cherian
Any thoughts?

Contrib module for Document Clustering

2016-04-05 Thread davidphilip cherian
Hi, Is there any contribution (open source contrib module) that routes documents to shards based on a document similarity technique? Or any suggestions for integrating Mahout with Solr for this use case? From what I know, there are currently two document routing strategies, as explained here https://luc
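Assuming the two strategies referred to are Solr's `compositeId` and `implicit` routers (the linked page is truncated here), one hedged way to combine them with offline clustering is to prefix each document id with a cluster label, so that the `compositeId` router sends documents sharing that prefix to the same shard. The helper names and field names below are hypothetical. Hashing co-locates a cluster but does not give each cluster a shard of its own; for an explicit cluster-to-shard mapping, the `implicit` router (where the target shard is named directly) would be the closer fit.

```python
def composite_id(route_key: str, doc_id: str) -> str:
    """Build an id in the 'routeKey!docId' form used by Solr's compositeId router."""
    # A '!' inside the route key would change which part Solr treats as the prefix.
    if "!" in route_key:
        raise ValueError(f"route key must not contain '!': {route_key!r}")
    return f"{route_key}!{doc_id}"


def make_doc(cluster_label: int, url: str, title: str) -> dict:
    """Hypothetical document builder: the cluster label becomes the routing prefix."""
    return {
        "id": composite_id(f"cluster{cluster_label}", url),
        "url": url,      # hypothetical field names; adjust to the actual schema
        "title": title,
    }


doc = make_doc(3, "http://example.com/page", "Example page")
print(doc["id"])  # cluster3!http://example.com/page
```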