On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
I'm currently looking at methods of term extraction and automatic
keyword generation from indexed documents.
We do it manually (not in Solr itself, but we put the results into
Solr). We do it the usual way: chunk (into n-grams, named entities,
and noun phrases) and count (tf and df). It works well enough. There
is a bevy of literature on the topic if you want to get "smart" -- but
be warned that smart and fast are rarely good friends.
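
For illustration only, here is a minimal Python sketch of the
chunk-and-count idea. It is not the pipeline described above: it
chunks into word n-grams only (named entities and noun phrases would
need an NLP toolkit), and it assumes a plain tf * log(N/df) score. The
names ngrams and top_terms are made up for this example.

```python
import math
import re
from collections import Counter


def ngrams(text, max_n=2):
    """Lowercase the text, split it into word tokens, and yield every
    n-gram of length 1..max_n as a space-joined string."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def top_terms(docs, max_n=2, k=5):
    """Return, for each document, its k highest-scoring candidate terms.

    tf = occurrences of the term in the document
    df = number of documents that contain the term
    score = tf * log(N / df)   (assumed weighting, one of many variants)
    """
    tfs = [Counter(ngrams(doc, max_n)) for doc in docs]

    # Count each term once per document to get document frequency.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())

    n_docs = len(docs)
    results = []
    for tf in tfs:
        scored = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:k]])
    return results
```

Terms that occur in every document score zero under this weighting, so
stopword-like n-grams drop out without a separate stopword list. The
per-document term lists could then be written to a multi-valued field
at index time.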
A lot depends on the provenance of your data -- is it clean text that
uses a lot of domain-specific terms? Is it web text?