Thank you!  That makes sense.
 
--Casey

>>> Mike Klaas <[EMAIL PROTECTED]> 6/7/2007 2:35 PM >>>
On 7-Jun-07, at 1:41 PM, Casey Durfee wrote:

> It appears that if your search terms include stopwords and you use  
> the DisMax request handler, you get no results whereas the same  
> search with the standard request handler does give you results.  Is  
> this a bug or by design?

There is a subtlety with stopwords and dismax.  Imagine a search  
"what's in python", using a typical analyzer with stopwords for  
fields such as title, inlinks, rawText, but a more restrictive  
analyzer for fields such as url, that have no stopwords.
For the above search, the following weight function

title^1.2 inlinks^1.4 rawText^1.0
produces the following parsed query string

+(
   (
    (rawText:what | inlinks:what^1.4 | title:what^1.2)~0.01
    (rawText:python | inlinks:python^1.4 | title:python^1.2)~0.01
   )~2
  )
  (rawText:"what python"~5 | inlinks:"what python"~5^1.4 | title:"what python"~5^1.2)~0.01
while the same query with a weight function of

title^1.2 inlinks^1.4 rawText^1.0 url^1.0
produces this query string

+(
   (
    (rawText:what | url:what | inlinks:what^1.4 | title:what^1.2)~0.01
    (url:in)~0.01
    (rawText:python | url:python | inlinks:python^1.4 | title:python^1.2)~0.01
   )~3
  )
  (rawText:"what python"~5 | url:"what in python"~5 | inlinks:"what python"~5^1.4 | title:"what python"~5^1.2)~0.01
Note that the latter includes the clause (url:in)~0.01 on its own,
because 'in' is removed as a stopword from every field except url.
This interacts poorly with a high mm (minimum number of clauses that
must match) setting in dismax, as it effectively requires 'in' to
appear in the url field, which was probably not the intent of the query.
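
The effect can be reproduced with a small toy model. This is only a
sketch of the behaviour described above, not Solr's implementation: the
stopword list, the FIELDS table, the sample document, and the helper
names build_clauses and matches are all assumptions made up for
illustration. Each field contributes to a term's clause only if its
analyzer keeps that term, and a document matches when at least mm
clauses match.

```python
# Toy model of dismax clause construction; not Solr's actual code.
STOPWORDS = {"in", "the", "a", "of"}  # assumed stopword list

# Hypothetical analyzers: True means the field removes stopwords.
FIELDS = {"title": True, "inlinks": True, "rawText": True, "url": False}


def build_clauses(terms, fields):
    """Return one (term, fields) clause per term that survives analysis."""
    clauses = []
    for term in terms:
        surviving = [f for f in fields
                     if not (FIELDS[f] and term in STOPWORDS)]
        if surviving:  # a term dropped by every field yields no clause
            clauses.append((term, surviving))
    return clauses


def matches(doc, clauses, mm):
    """True if at least mm clauses find their term in one of their fields."""
    hits = sum(
        any(term in doc.get(field, set()) for field in flds)
        for term, flds in clauses
    )
    return hits >= mm


# Indexed tokens per field for one made-up document.
doc = {
    "title": {"what", "python"},
    "rawText": {"what", "python", "language"},
    "url": {"python", "org"},
}
terms = ["what", "in", "python"]

without_url = build_clauses(terms, ["title", "inlinks", "rawText"])
with_url = build_clauses(terms, ["title", "inlinks", "rawText", "url"])

print(len(without_url), matches(doc, without_url, mm=len(without_url)))
print(len(with_url), matches(doc, with_url, mm=len(with_url)))
```

Without url in the weight function, 'in' disappears entirely and the
remaining two clauses can both match. With url included, 'in' becomes a
third clause that only the url field can satisfy, so requiring all
clauses rejects the document even though it is clearly relevant.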

-Mike
