On 2-Nov-07, at 10:03 AM, Haishan Chen wrote:





> Date: Fri, 2 Nov 2007 07:32:30 -0700
> Subject: Re: Phrase Query Performance Question
> From: [EMAIL PROTECTED]
> To: solr- [EMAIL PROTECTED]
>
> He means "extremely frequent" and I agree. --wunder


So it means that a PHRASE (a combination of terms, excluding stopwords) that appears in 5% to 10% of an index should NOT be that frequent? I guess I get the idea.

Phrases should be rarer than individual keywords. 5-10% is moderately high even for a _single_ keyword, let alone the conjunction of two keywords, and even more so for the _exact phrase_ of two keywords (non-stopwords in all of this discussion).

As I mentioned, the 'natural' rate of 'auto'+'repair' on a corpus hundreds of times bigger than yours (web documents) is 0.1%, and the rate of the phrase 'auto repair' is 0.025%.
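
If it helps to see the distinction concretely, here is a rough Python sketch that measures the two rates on a toy corpus. Everything in it is an assumption for illustration: the five sample documents are made up, the helper names (tokenize, contains_phrase, match_rates) are hypothetical, and the whitespace tokenizer is far simpler than a real Solr/Lucene analyzer. It only shows why the exact-phrase rate can never exceed the conjunction rate.

```python
def tokenize(text):
    # Lowercase whitespace tokenization; a real analyzer does much more.
    return text.lower().split()


def contains_phrase(tokens, phrase):
    # True if the phrase terms occur adjacently and in order.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def match_rates(docs, terms):
    """Return (conjunction_rate, phrase_rate) as fractions of all docs."""
    terms = [t.lower() for t in terms]
    tokenized = [tokenize(d) for d in docs]
    # Conjunction: every term appears somewhere in the document.
    both = sum(1 for toks in tokenized if all(t in toks for t in terms))
    # Exact phrase: the terms appear next to each other, in order.
    phrase = sum(1 for toks in tokenized if contains_phrase(toks, terms))
    total = len(tokenized)
    return both / total, phrase / total


# Made-up toy corpus, for illustration only.
docs = [
    "auto repair shop downtown",
    "repair your auto at home",
    "auto insurance quotes",
    "bike repair tips",
    "weather forecast today",
]

conj, phr = match_rates(docs, ["auto", "repair"])
print(f"conjunction: {conj:.0%}, exact phrase: {phr:.0%}")
```

Every document that matches the phrase also matches the conjunction, so the phrase rate is bounded above by the conjunction rate, which in turn is bounded by the rate of the rarer single term.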

It still feels to me that you are trying to do something unique with your phrase queries. Unfortunately, you still haven't said what you are trying to do in general terms, which makes it very difficult for people to help you.

-Mike
