Re: Term extraction

Brian Whitman Wed, 19 Sep 2007 19:15:03 -0700

On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:

I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents.

We do it ourselves, outside of Solr (though we put the results in Solr). We do it the usual way: chunk (into n-grams, named entities and noun phrases) and count (term frequency and document frequency, i.e. tf and df). It works well enough. There is a bevy of literature on the topic if you want to get "smart", but be warned: smart and fast are likely not very good friends.
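The chunk-and-count approach described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the poster's actual pipeline: the helper names (`tokenize`, `ngrams`, `count_terms`, `top_keywords`) are hypothetical, the tokenizer is a naive lowercase regex, and only n-gram chunking is shown (named entities and noun phrases would need a tagger or chunker, which this sketch leaves out). Ranking candidates by tf * log(N/df) is one common weighting chosen here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase runs of letters and digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n=3):
    # Chunk step: every contiguous sequence of 1..max_n tokens is a candidate term.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def count_terms(docs, max_n=3):
    # Count step: per-document term frequency (tf) and
    # corpus-wide document frequency (df).
    tfs = []
    df = Counter()
    for doc in docs:
        tf = Counter(ngrams(tokenize(doc), max_n))
        tfs.append(tf)
        df.update(tf.keys())  # each term counts at most once per document
    return tfs, df


def top_keywords(docs, k=5, max_n=3):
    # Rank each document's candidates by tf * log(N / df) (assumed weighting);
    # ties are broken alphabetically so the output is deterministic.
    tfs, df = count_terms(docs, max_n)
    n_docs = len(docs)
    results = []
    for tf in tfs:
        scored = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([t for t, _ in ranked[:k]])
    return results


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = [
        "solr indexes documents and solr serves queries",
        "term extraction finds keywords in documents",
        "keyword generation uses term extraction",
    ]
    for doc, keywords in zip(docs, top_keywords(docs, k=3)):
        print(doc, "->", keywords)
```

Terms that occur in every document get an idf of zero and sink to the bottom, which is the usual reason for counting df alongside tf. On real text you would typically also drop stopword-only n-grams and set a minimum df before ranking.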

A lot depends on the provenance of your data: is it clean text that uses a lot of domain-specific terms? Is it web text?
