Re: Contrib module for Document Clustering

davidphilip cherian Wed, 06 Apr 2016 21:11:19 -0700

Hi Joel,

Right now we are crawling almost 85 million web documents, and this number
could double. The collection is simply split into shards, so every search
runs across all shards.
If the system could distribute documents into shards based on document
similarity, and then at search time analyze the query and search only the
matching shards, it could improve search performance and reduce resource
utilization as well. Let me know your thoughts.

Use case: since this is web-search-style data, some false positives and
false negatives should be acceptable.
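A minimal sketch of the idea, assuming cluster centroids have already been computed offline (for example by a clustering job such as Mahout k-means) and that each centroid owns exactly one shard. The shard names, the plain bag-of-words vectors, and the helpers `tf_vector`, `route_document` and `shards_for_query` are hypothetical and for illustration only; a real system would more likely use TF-IDF or embedding vectors.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_document(text, centroids):
    """Index time: send the document to the shard whose centroid is most similar."""
    vec = tf_vector(text)
    return max(centroids, key=lambda shard: cosine(vec, centroids[shard]))


def shards_for_query(query, centroids, top_n=2):
    """Query time: return the top_n shards most similar to the query."""
    vec = tf_vector(query)
    scores = {shard: cosine(vec, c) for shard, c in centroids.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    if not ranked or scores[ranked[0]] == 0.0:
        # No overlap with any centroid: fall back to searching every shard
        # rather than risking a false negative.
        return ranked
    return ranked[:top_n]


# Hypothetical centroids, one per shard.
centroids = {
    "shard1": tf_vector("solr lucene search index query"),
    "shard2": tf_vector("football match goal league season"),
    "shard3": tf_vector("recipe cooking oven flour sugar"),
}

print(route_document("Tuning the Solr query cache for faster search", centroids))  # shard1
print(shards_for_query("league goal statistics", centroids, top_n=1))  # ['shard2']
```

Searching only the top few shards is where the false positives and false negatives mentioned above come from; the fallback to all shards when the query matches no centroid is one way to limit the damage. With Solr's implicit router, the shard chosen at index time could be supplied per document through the collection's `router.field`, and the shard list could be passed in the `shards` request parameter; both should be checked against the Reference Guide for the Solr version in use.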




On Wed, Apr 6, 2016 at 11:18 PM, Joel Bernstein <joels...@gmail.com> wrote:

> I don't know of any contrib or module that does this. Can you describe why
> you'd want to route documents to shards based on similarity? What
> advantages would you get by using this approach?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Apr 6, 2016 at 1:36 PM, davidphilip cherian <
> davidphilipcher...@gmail.com> wrote:
>
> > Any thoughts?
> >
> >
> > On Tue, Apr 5, 2016 at 9:05 PM, davidphilip cherian <
> > davidphilipcher...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is there any contribution (an open-source contrib module) that routes
> > > documents to shards based on document similarity? Or any suggestions
> > > for integrating Mahout with Solr for this use case?
> > >
> > > From what I know, there are currently two document routing strategies,
> > > as explained here:
> > > https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/.
> > > But is there anything else that I'm missing?
> > >
> > >
> > >
> > >
> > > Thanks.
> > >
> > >
> > >
> >
>
