Term extraction

Pieter Berkel Wed, 19 Sep 2007 18:58:49 -0700

I'm currently looking at methods of term extraction and automatic keyword
generation from indexed documents.  I've been experimenting with
MoreLikeThis and values returned by the "mlt.interestingTerms" parameter and
so far this approach has worked well.  However, I'd like to be able to
analyze documents more intelligently to recognize phrase keywords such as
"open source", "Microsoft Office", "Bill Gates" rather than splitting each
word into separate tokens (the field is never used in search queries so
matching is not an issue).  I've been looking at SynonymFilterFactory as a
possible solution to this problem but haven't been able to work out the
specifics of how to configure it for phrase mappings.
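Outside Solr, the goal of keeping phrases such as "open source" intact can be sketched as a greedy longest-match pass over a token stream. This is a minimal Python sketch under stated assumptions, not SynonymFilterFactory's behaviour or configuration. The phrase list `PHRASES` and the helpers `tokenize`, `collapse_phrases` and `top_keywords` are hypothetical. The whitespace tokenizer also stands in for a real analyzer.

```python
from collections import Counter

# Hypothetical curated phrase vocabulary (an assumption of this sketch;
# in practice it might be loaded from a file).
PHRASES = {"open source", "microsoft office", "bill gates"}


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def collapse_phrases(tokens, phrases):
    """Greedy longest-match: merge known multi-word phrases into one token."""
    max_len = max((len(p.split()) for p in phrases), default=1)
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, down to two-word phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrases:
                out.append(candidate)
                i += n
                matched = True
                break
        if not matched:
            # No phrase starts here, so keep the single word as-is.
            out.append(tokens[i])
            i += 1
    return out


def top_keywords(text, phrases, k=5):
    """Rank tokens and collapsed phrases by raw frequency."""
    return Counter(collapse_phrases(tokenize(text), phrases)).most_common(k)
```

For example, `collapse_phrases(tokenize("Bill Gates talks open source"), PHRASES)` yields `["bill gates", "talks", "open source"]`. Because the match is greedy and left-to-right, overlapping phrases resolve to the longest one starting earliest.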


Has anybody else dealt with this problem before, or is anyone able to offer
any insights into achieving the desired results?

Thanks in advance,
Pieter
