Sorry for the late response. To be precise, here is what I want:

* I receive documents all the time. Let's assume they are news items
(it's a rather similar thing).

* Every time I get a new batch of "news", I should add the documents to
the Solr index and get cluster information for each of them, then store
this information in the DB (so I know each document's cluster).

* I can't wait for a clustering service/program that only runs from
time to time; clusters should be determined on the fly.

* I want to be able to get clusters only for some period of time (for
example, I want to search for clusters only among documents that were
loaded within the last month).

* I will have tens of thousands of new documents every day and an
overall base of several million.
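
To make the requirements above concrete, here is a minimal sketch of
single-pass ("leader") clustering with a time window. It is not Solr,
Carrot2, or Mahout code; the class name, the threshold, and the plain
bag-of-words vectors are all assumptions chosen for illustration only.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def vectorize(text):
    """Plain term counts (assumption: a real system would likely use TF-IDF)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Hypothetical helper: assigns a cluster id to each document as it arrives."""

    def __init__(self, threshold=0.5, window=timedelta(days=30)):
        self.threshold = threshold  # minimum similarity to join a cluster
        self.window = window        # clusters idle for longer than this are dropped
        self.clusters = {}          # cluster id -> (summed term counts, last update)
        self.next_id = 0

    def add(self, text, when):
        # Forget clusters that have not received a document within the window.
        self.clusters = {
            cid: c for cid, c in self.clusters.items()
            if when - c[1] <= self.window
        }
        vec = vectorize(text)
        best_id, best_sim = None, 0.0
        for cid, (centroid, _) in self.clusters.items():
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= self.threshold:
            centroid, _ = self.clusters[best_id]
            centroid.update(vec)  # fold the new document into the cluster
            self.clusters[best_id] = (centroid, when)
            return best_id
        # Nothing similar enough: the document starts a new cluster.
        cid = self.next_id
        self.next_id += 1
        self.clusters[cid] = (vec, when)
        return cid
```

The id returned by `add` is what would be stored in the DB and in the
Solr cluster id field. Comparing every document against every live
cluster is linear in the number of clusters, so at several million
documents this would need an index or a candidate lookup in front of it.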

I'm reading "Mahout in Action" now, but maybe you can point me to what I
need.
--- Original message ---
From: "Chandan Tamrakar" <chandan.tamra...@nepasoft.com>
To: solr-user@lucene.apache.org
Date: 4 September 2012, 12:30:56
Subject: Re: Solr Clustering




Yes, there is a Solr component if you want to cluster Solr documents; check
the following link: http://wiki.apache.org/solr/ClusteringComponent
Carrot2 might be good if you want to cluster a few thousand documents,
for example, when a user searches Solr, just cluster the search results.
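
As a rough sketch of what clustering search results looks like on the
request side, the snippet below builds a query URL with the
ClusteringComponent parameters. The host, the `/clustering` handler path,
and the field names `title`, `content`, and `added` are assumptions and
depend on your solrconfig.xml and schema.

```python
from urllib.parse import urlencode


def clustering_query_url(base_url, query, title_field, snippet_field,
                         date_field, date_range):
    """Build a Solr request URL that asks for clustered search results."""
    params = {
        "q": query,
        # Restrict clustering to a time period via a filter query.
        "fq": f"{date_field}:[{date_range}]",
        "rows": 100,
        "wt": "json",
        "clustering": "true",            # enable the clustering component
        "clustering.results": "true",    # cluster the search results
        "carrot.title": title_field,     # field used as the document title
        "carrot.snippet": snippet_field  # field used as the document body
    }
    # Assumption: the handler is registered as /clustering.
    return f"{base_url}/clustering?{urlencode(params)}"


url = clustering_query_url(
    "http://localhost:8983/solr", "*:*",
    "title", "content", "added", "NOW-1MONTH TO NOW",
)
```

This clusters only the rows returned by the query, which is why it suits
a few thousand documents rather than the whole index.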

Mahout is much more scalable, but you will probably need Hadoop for that.


thanks
chandan

On Tue, Sep 4, 2012 at 2:10 PM, Denis Kuzmenok <forward...@ukr.net> wrote:

>
>
> -------- Original Message --------
> Subject: Solr Clustering
> From: Denis Kuzmenok <forward...@ukr.net>
> To: solr-user@lucene.apache.org
> CC:
>
> Hi, all.
> I know there are Carrot2 and Mahout for clustering. I want to implement
> the following:
> I fetch documents and want to group them into clusters when they are added
> to the index (for example, I want to filter "similar" documents for 1 week).
> I need these documents quickly, so I can't rely on postponed
> calculations. Each document should have an assigned cluster id (that is,
> group similar documents into clusters and assign each document its cluster
> id). It's something similar to news aggregators like Google News. I don't
> need to search for clusters with documents older than 1 week (for example).
> Each document will have its unique id and be saved in the DB, but Solr will
> also have a cluster id field.
> Is it possible to implement this with Solr/Carrot2/Mahout?




-- 
Chandan Tamrakar