On 2-Nov-07, at 10:03 AM, Haishan Chen wrote:
> > Date: Fri, 2 Nov 2007 07:32:30 -0700
> > Subject: Re: Phrase Query Performance Question
> > From: [EMAIL PROTECTED]
> > To: solr-[EMAIL PROTECTED]
> >
> > He means "extremely frequent" and I agree. --wunder
>
> Then it means a PHRASE (a combination of terms, excluding stopwords)
> that appears in 5% to 10% of an index should NOT be that frequent? I
> guess I get the idea.
Phrases should be rarer than individual keywords. 5-10% is
moderately high even for a _single_ keyword, let alone the
conjunction of two keywords, let alone the _exact phrase_ of two
keywords (non stopwords in all of this discussion).
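To make the ordering concrete, here is a minimal sketch that counts document frequencies on a tiny made-up corpus. The corpus, the `doc_freqs` helper and its whitespace tokenizer are hypothetical (no Lucene/Solr analysis, no stopword handling). The point is only the invariant: documents containing the exact phrase are a subset of those containing both terms, which are a subset of those containing either term alone.

```python
import re

def doc_freqs(docs, a, b):
    """Hypothetical helper: count docs containing term a, term b,
    both terms anywhere, and the exact phrase "a b"."""
    df_a = df_b = df_and = df_phrase = 0
    for doc in docs:
        # naive tokenizer: lowercase alphabetic runs (not a real Lucene analyzer)
        tokens = re.findall(r"[a-z]+", doc.lower())
        has_a, has_b = a in tokens, b in tokens
        # exact phrase: a immediately followed by b
        has_phrase = any(x == a and y == b for x, y in zip(tokens, tokens[1:]))
        df_a += int(has_a)
        df_b += int(has_b)
        df_and += int(has_a and has_b)
        df_phrase += int(has_phrase)
    return df_a, df_b, df_and, df_phrase

# made-up example documents
docs = [
    "auto repair shop downtown",
    "repair your auto at home",
    "auto insurance quotes",
    "bicycle repair guide",
    "cheap auto repair near me",
    "home repair tips",
]

# phrase <= conjunction <= min(single-term frequencies)
print(doc_freqs(docs, "auto", "repair"))
```

Whatever corpus you feed it, the phrase count can never exceed the conjunction count, and the conjunction count can never exceed the rarer single term's count.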
As I mentioned, the 'natural' rate of 'auto'+'repair' on a corpus
hundreds of times bigger than yours (web documents) is 0.1%, and the
rate of the phrase 'auto repair' is 0.025%.
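A quick back-of-the-envelope comparison using only the rates quoted above. It treats the rates as fractions of each corpus, so it is a rough comparison across two different corpora, not a measurement of anyone's index.

```python
# Rates quoted above, as fractions of the (web) corpus
conjunction_rate = 0.001   # docs with both 'auto' and 'repair' (0.1%)
phrase_rate = 0.00025      # docs with the exact phrase 'auto repair' (0.025%)

# Share of co-occurring documents where the two terms are actually adjacent
adjacency = phrase_rate / conjunction_rate

# How many times the web phrase rate a phrase hitting 5% or 10% of an index is
multiples = [observed / phrase_rate for observed in (0.05, 0.10)]

print(f"adjacent in {adjacency:.0%} of co-occurring docs")
for observed, m in zip((0.05, 0.10), multiples):
    print(f"{observed:.0%} of the index is about {m:.0f}x the web phrase rate")
```

Only about a quarter of the documents containing both words contain the phrase, and a phrase matching 5-10% of an index is a few hundred times more common than 'auto repair' is on the web.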
It still feels to me that you are trying to do something unusual with
your phrase queries. Unfortunately, you still haven't said what you
are trying to do in general terms, which makes it very difficult for
people to help you.
-Mike