Hi,

On Tue, Aug 11, 2009 at 22:19, Mark Bennett <mbenn...@ideaeng.com> wrote:
> Carrot2 has several pluggable algorithms to choose from, though I have no
> evidence that they're "better" than Lucene's. Where TF/IDF is sort of a
> one step algebraic calculation, some clustering algorithms use iterative
> approaches, etc.

I'm not sure I completely follow how you'd like to use Carrot2 for scoring. Would you cluster the whole index? Carrot2 was designed for post-retrieval clustering and is optimized to cluster small sets of documents (up to ~1000) in real time. All processing is performed in memory, which limits Carrot2's applicability to really large sets of documents.

S.