Any news regarding this? I'm investigating offline clustering in Solr as well (full-index clustering).
Cheers

2012-09-17 20:16 GMT+01:00 Denis Kuzmenok <forward...@ukr.net>:

> Sorry for the late response. To be precise, here is what I want:
>
> * I get documents all the time. Let's assume they are news (it's
>   a rather similar thing).
>
> * Every time I get a new batch of "news", I should add it to the Solr
>   index and get cluster information for those documents, then store
>   this information in the DB (so I know each document's cluster).
>
> * I can't wait for a cluster-definition service/program to launch from
>   time to time; clusters should be defined on the fly.
>
> * I want to be able to get clusters only for some period of time (for
>   example, I want to search for clusters only among documents that
>   were loaded one month ago).
>
> * I will have tens of thousands of new documents every day and an
>   overall base of several million.
>
> I'm reading "Mahout in Action" now. But maybe you can point me to what
> I need.
>
> --- Original message ---
> From: "Chandan Tamrakar" <chandan.tamra...@nepasoft.com>
> To: solr-user@lucene.apache.org
> Date: 4 September 2012, 12:30:56
> Subject: Re: Solr Clustering
>
> Yes, there is a Solr component if you want to cluster Solr documents;
> check the following link:
> http://wiki.apache.org/solr/ClusteringComponent
>
> Carrot2 might be good if you want to cluster a few thousand documents,
> for example when a user searches Solr and you just cluster the search
> results.
>
> Mahout is much more scalable, and you will probably need Hadoop for
> that.
>
> thanks
> chandan
>
> On Tue, Sep 4, 2012 at 2:10 PM, Denis Kuzmenok <forward...@ukr.net> wrote:
>
> > -------- Original Message --------
> > Subject: Solr Clustering
> > From: Denis Kuzmenok <forward...@ukr.net>
> > To: solr-user@lucene.apache.org
> > CC:
> >
> > Hi, all.
> > I know there are Carrot2 and Mahout for clustering. I want to
> > implement the following:
> > I fetch documents and want to group them into clusters when they
> > are added to the index (I want to filter "similar" documents, for
> > example for 1 week). I need these documents quickly, so I can't
> > rely on postponed calculations. Each document should have an
> > assigned cluster id (i.e., group similar documents into clusters
> > and assign each document its cluster id). It's something similar
> > to news aggregators like Google News. I don't need to search for
> > clusters with documents older than 1 week (for example). Each
> > document will have its unique id and be saved into the DB, but
> > Solr will also have a cluster id field.
> > Is it possible to implement this with Solr/Carrot2/Mahout?
>
> --
> Chandan Tamrakar

--
--------------------------
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience - 1794 England