Thanks Brian, I think the "smart" approaches you refer to might be outside the scope of my current project. The documents I am indexing already have manually generated keyword data; moving forward, I'd like these keywords to be generated automatically by selecting them from a pre-defined list of keywords (i.e. the "simple" approach).
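To make the "simple" approach concrete, here is a rough sketch in plain Python (not Solr configuration). It tags a document with every entry from a pre-defined keyword list whose words occur in the text, preferring the longest phrase at each position. The function names and the sample keyword list are hypothetical, made up purely for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text, keyword_list):
    """Return the keywords from keyword_list that occur in text.

    Matching is done on token sequences, longest phrase first, and the
    result is ordered by first appearance in the text.
    """
    # Map each keyword's token tuple back to its canonical spelling.
    phrases = {tuple(tokenize(k)): k for k in keyword_list}
    max_len = max((len(p) for p in phrases), default=0)

    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                keyword = phrases[candidate]
                if keyword not in found:
                    found.append(keyword)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no keyword starts here
    return found


if __name__ == "__main__":
    # Hypothetical keyword list and document text.
    keywords = ["machine learning", "information retrieval", "learning"]
    doc = "Machine learning methods for information retrieval."
    print(extract_keywords(doc, keywords))
```

With only a few hundred phrases, a dictionary lookup per position like this should be cheap; the longest-match rule is just one possible design choice for handling overlapping keywords.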
The data is fairly clean and domain-specific so I don't expect there will be more than several hundred of these phrase terms to deal with, which is why I was exploring the SynonymFilterFactory option.

Pieter

On 20/09/2007, Brian Whitman <[EMAIL PROTECTED]> wrote:
>
> On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
>
> > I'm currently looking at methods of term extraction and automatic
> > keyword generation from indexed documents.
>
> We do it manually (not in solr, but we put the results in solr.) We
> do it the usual way - chunk (into n-grams, named entities & noun
> phrases) and count (tf & df). It works well enough. There is a bevy
> of literature on the topic if you want to get "smart" -- but be
> warned smart and fast are likely not very good friends.
>
> A lot depends on the provenance of your data -- is it clean text that
> uses a lot of domain specific terms? Is it webtext?
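
For comparison, the "chunk and count" approach described in the quoted message could be sketched roughly as below. This is only an illustration under simplifying assumptions: chunking is limited to word n-grams (no named entities or noun phrases), and the tf * log(N / df) weighting is one common choice, not necessarily what Brian uses. All names are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, max_n=2):
    """Yield all word n-grams of length 1..max_n from the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def score_terms(docs, max_n=2):
    """Return one {term: score} dict per document.

    tf is the count of the term within the document, df is the number
    of documents containing the term, and the score is
    tf * log(N / df) where N is the number of documents.
    """
    term_freqs = [Counter(ngrams(d, max_n)) for d in docs]

    doc_freq = Counter()
    for tf in term_freqs:
        doc_freq.update(tf.keys())  # each document counts once per term

    n_docs = len(docs)
    return [
        {t: count * math.log(n_docs / doc_freq[t]) for t, count in tf.items()}
        for tf in term_freqs
    ]


if __name__ == "__main__":
    # Hypothetical two-document corpus.
    corpus = ["solr index solr", "lucene index"]
    for scores in score_terms(corpus):
        print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```

Terms that appear in every document score zero under this weighting, which is the intended effect: they say nothing distinctive about any one document.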