On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:

I'm currently looking at methods of term extraction and automatic keyword
generation from indexed documents.

We do it manually (not in Solr, though we put the results into Solr). We do it the usual way: chunk (into n-grams, named entities, and noun phrases) and count (term frequency and document frequency, tf & df). It works well enough. There is a bevy of literature on the topic if you want to get "smart", but be warned: smart and fast are likely not very good friends.
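
For anyone who wants to see the shape of "chunk and count", here is a minimal Python sketch. It is not our actual code, only an illustration under some assumptions: chunking is limited to word n-grams (named entities and noun phrases would need an NLP toolkit), the tokenizer is a naive regex, and the tf * log(N / df) weighting is just one common way to combine the two counts. The names `chunk`, `ngrams`, and `top_terms` are made up for this example.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n_max=2):
    """Yield every contiguous n-gram of length 1..n_max as a space-joined string."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def chunk(text, n_max=2):
    """Chunk step: lowercase, split on non-alphanumerics, emit n-grams.

    Assumption: n-grams only. Named entities and noun phrases are left out.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return list(ngrams(tokens, n_max))


def top_terms(docs, doc_index, k=5, n_max=2):
    """Count step: rank the terms of one document by tf * log(N / df)."""
    chunked = [chunk(d, n_max) for d in docs]

    # df: number of documents in which each term appears at least once.
    df = Counter()
    for terms in chunked:
        df.update(set(terms))

    # tf: raw occurrences of each term in the chosen document.
    tf = Counter(chunked[doc_index])

    n_docs = len(docs)
    scores = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Calling `top_terms(docs, 0)` on a list of document strings returns candidate keywords for the first document. Terms that show up in every document score zero, which is a crude stand-in for a stopword list.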

A lot depends on the provenance of your data: is it clean text that uses a lot of domain-specific terms? Is it web text?
