Re: Term extraction

Pieter Berkel Wed, 19 Sep 2007 19:39:54 -0700

Thanks Brian, I think the "smart" approaches you refer to might be outside
the scope of my current project.  The documents I am indexing already have
manually generated keyword data. Moving forward, I'd like these keywords
to be generated automatically by selecting them from a pre-defined list of
keywords (i.e. the "simple" approach).


The data is fairly clean and domain-specific so I don't expect there will be
more than several hundred of these phrase terms to deal with, which is why I
was exploring the SynonymFilterFactory option.
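
To make the "simple" approach concrete, here is a rough standalone sketch in Python of matching documents against a pre-defined keyword list. It does not use SynonymFilterFactory or any Solr API; the function name, the sample keyword list and the sample text are all made up for illustration, and the normalization (lowercase, split on non-alphanumerics) is just an assumption about what "fairly clean" data would need.

```python
import re


def extract_keywords(text, keyword_list):
    """Return the keywords from keyword_list that occur in text, in list order."""
    # Lowercase and reduce the text to space-separated alphanumeric tokens,
    # padded with spaces so only whole-token phrase matches succeed.
    normalized = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    found = []
    for keyword in keyword_list:
        tokens = re.findall(r"[a-z0-9]+", keyword.lower())
        if not tokens:
            continue  # skip empty or punctuation-only entries
        needle = " " + " ".join(tokens) + " "
        if needle in normalized:
            found.append(keyword)
    return found


# Hypothetical keyword list and document, for illustration only.
KEYWORDS = ["heart attack", "blood pressure", "stroke"]
doc = "High blood-pressure raises the risk of heart attack. Brush strokes vary."
print(extract_keywords(doc, KEYWORDS))
```

With only a few hundred phrases, a linear scan like this is probably fast enough; "strokes" does not match "stroke" here because matching is on whole tokens, so stemming would have to be added separately if it is wanted.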

Pieter



On 20/09/2007, Brian Whitman <[EMAIL PROTECTED]> wrote:
>
> On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
>
> > I'm currently looking at methods of term extraction and automatic
> > keyword
> > generation from indexed documents.
>
> We do it manually (not in solr, but we put the results in solr.) We
> do it the usual way - chunk (into n-grams, named entities & noun
> phrases) and count (tf & df). It works well enough. There is a bevy
> of literature on the topic if you want to get "smart" -- but be
> warned smart and fast are likely not very good friends.
>
> A lot depends on the provenance of your data -- is it clean text that
> uses a lot of domain specific terms? Is it webtext?
>
>
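
For comparison, the chunk-and-count approach described in the quoted message could look roughly like the sketch below. This is only a guess at the general shape, not Brian's actual code: it chunks into word n-grams only (no named entities or noun phrases), tokenizes on whitespace, and the function names and sample documents are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token phrases from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_terms(docs, max_n=2):
    """Count term frequency (tf) and document frequency (df) of n-grams."""
    tf = Counter()  # total occurrences across all documents
    df = Counter()  # number of documents containing the term
    for doc in docs:
        tokens = doc.lower().split()
        terms = []
        for n in range(1, max_n + 1):
            terms.extend(ngrams(tokens, n))
        tf.update(terms)
        df.update(set(terms))  # each document counts at most once per term
    return tf, df


# Hypothetical documents, for illustration only.
docs = ["solr term extraction", "term extraction from solr documents"]
tf, df = count_terms(docs)
print(tf["term extraction"], df["term extraction"], df["documents"])
```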
