Walter: Thanks for the feedback.
On 2/19/07, Walter Underwood <[EMAIL PROTECTED]> wrote:
Lucene/Solr does this automatically. That is how a tf.idf engine works: it boosts rare words. Do you have examples of problems, or are you worrying about something that might happen?
Actually, my use case is the following. Let's say, hypothetically, you have a field with 100 sentence-long titles. If you read those titles, you can pretty much group them into 5 subject areas. A hypothetical example (the total number of titles is 125; 25 of them cannot be grouped):

  22 titles are about: How good is Person X
  14 titles are about: How bad is Product Y
  10 titles are about: London weather
  36 titles are about: How cool is the movie Z
  18 titles are about: The next big MS virus

What I am trying to achieve is to weed out "London weather" as a group, because it is not interesting in my use case. Let's say it is noise, not signal. So I thought I could use some "common words" to do that.

Furthermore, I was thinking that with such common words I could also boost certain fields, i.e. if Person X is a known person, for example a "prime minister" or a "movie star", then having a certain word attached to another known word would mean the title is important.

Maybe I defined my problem wrongly. I hope the above gives you an overview.

Regards
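
P.S. To make the idea concrete, here is a rough sketch in plain Python of what I mean. It is not Solr or Lucene code. The noise word list, the `min_hits` threshold, and the function names `tokenize`, `is_noise`, and `idf` are all made up for illustration, and the sample titles are hypothetical. The `idf` function only illustrates the "rare words score higher" point from the quoted reply; it is a simplified formula, not the exact one Lucene uses.

```python
import math
import re
from collections import Counter

# Hypothetical noise vocabulary: words assumed to mark a "London weather" title.
NOISE_TERMS = {"london", "weather", "rain", "forecast"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_noise(title, noise_terms=NOISE_TERMS, min_hits=2):
    """Return True when at least min_hits distinct noise terms occur in the title."""
    return len(set(tokenize(title)) & noise_terms) >= min_hits


def idf(titles):
    """Simplified inverse document frequency: terms in fewer titles score higher."""
    n = len(titles)
    doc_freq = Counter(term for title in titles for term in set(tokenize(title)))
    return {term: math.log(n / count) + 1.0 for term, count in doc_freq.items()}


# Hypothetical sample titles, one per subject group.
titles = [
    "How good is Person X",
    "How bad is Product Y",
    "London weather turns to rain",
    "How cool is the movie Z",
    "The next big MS virus",
]

# Keep only the titles that are not classified as noise.
signal = [title for title in titles if not is_noise(title)]
weights = idf(titles)
```

With a filter like this, the "London weather" titles would be dropped before (or after) searching, and a similar word list could drive the boosting of known names. Whether this is better done inside the engine or outside it is exactly what I am unsure about.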