Re: common words not stop words?? how to ??

rubdabadub Mon, 19 Feb 2007 11:28:44 -0800

Walter:

Thanks for the feedback.


On 2/19/07, Walter Underwood <[EMAIL PROTECTED]> wrote:

Lucene/Solr does this automatically. That is how a tf.idf
engine works, it boosts rare words.

Do you have examples of problems or are you worrying about
something that might happen?
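
To illustrate the point about rare words, here is a minimal sketch of the textbook inverse document frequency, idf = log(N / df). The titles and the function name `idf_scores` are invented for illustration, and Lucene's own scoring formula differs in detail, so treat this as a sketch of the idea and not as what Solr computes.

```python
import math
from collections import Counter

# Hypothetical titles, invented for illustration only.
titles = [
    "how good is person x",
    "how bad is product y",
    "how cool is the movie z",
    "london weather is bad",
]

def idf_scores(docs):
    """Textbook idf: log(N / df), where df is the number of docs containing the term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term at most once per document.
        df.update(set(doc.split()))
    return {term: math.log(n / count) for term, count in df.items()}

scores = idf_scores(titles)
```

A word that appears in every title ("is") gets an idf of zero, while a word that appears in only one title ("london") gets the highest weight.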


Actually, my use case is the following. Let's say, hypothetically, you
have a field containing sentence-long titles. If you read those titles,
you can group about 100 of them into 5 subject areas. A hypothetical
example (the total number of titles is 125; 25 of them cannot be
grouped):

22 titles are about: How good is Person X
14 titles are about: How bad is Product Y
10 titles are about: London weather
36 titles are about: How cool is the movie Z
18 titles are about: The next big MS virus

What I am trying to achieve is this:

I would like to weed out "London weather" as a group because it is not
interesting in my use case. Let's say it is noise, not signal. So I
thought I could use a list of "common words" for that. Furthermore, I was
thinking that with such common words I could boost certain fields, i.e.
if Person X is a known person, for example a "prime minister" or a
"movie star", then a certain word attached to another known word means
the title is important. Maybe I defined my problem wrongly. I hope the
above gives you an overview.
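
Roughly, what I have in mind could be sketched like this. It is only an assumption of how it might look: the word lists, the boost values, and the function `score_title` are all hypothetical, and none of it is an existing Lucene or Solr feature.

```python
# Hypothetical word lists and boost values, not part of Lucene or Solr.
NOISE_TERMS = {"london", "weather"}
BOOST_TERMS = {"prime minister": 2.0, "movie star": 1.5}

def score_title(title, base_score=1.0):
    """Return None for noise titles, else base_score times any matching boosts."""
    text = title.lower()
    words = set(text.split())
    # Drop the title if it contains any noise word.
    if words & NOISE_TERMS:
        return None
    score = base_score
    # Multiply in the boost for each known phrase found in the title.
    for phrase, boost in BOOST_TERMS.items():
        if phrase in text:
            score *= boost
    return score
```

With these lists, a title such as "London weather today" would be dropped, and "How good is the Prime Minister" would score twice as high as an unboosted title.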

Regards
