: Operationally, I was thinking a tokenizer could use the stop-word list
: (or an optional-word list) to mark tokens as optional rather than
: removing them from the token stream. DisMaxOptional would then
: generate appropriate queries with the non-optionals as the core and
: then permute the optionals around those as optional clauses. I say
: this with no deep understanding of how DisMax does its thing, of
: course, so feel free to call me naive.
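
The proposal quoted above interacts with dismax's mm param, so here is a rough plain-Java sketch of both pieces side by side. It is an illustration under stated assumptions, not Solr code. "DisMaxOptional" is only the hypothetical parser named in the thread. The class and method names (DisMaxOptionalSketch, requiredClauses, markNonStopwordsRequired) are made up here. The mm handling is a simplified reading that covers only a single integer or percentage value, not conditional specs like "2<-25%". It also assumes each query term becomes one optional clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: neither this class nor its methods exist in Solr.
public class DisMaxOptionalSketch {

    // Simplified reading of a single "mm" value: an integer ("2", "-1") or a
    // percentage ("75%", "-25%"). A negative value states how many of the
    // optional clauses may be missing. Conditional specs such as "2<-25%"
    // are deliberately not handled.
    static int requiredClauses(int optionalCount, String mm) {
        String spec = mm.trim();
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            // Java integer division truncates toward zero.
            int calc = (optionalCount * percent) / 100;
            result = percent < 0 ? optionalCount + calc : calc;
        } else {
            int calc = Integer.parseInt(spec);
            result = calc < 0 ? optionalCount + calc : calc;
        }
        // Clamp into [0, optionalCount].
        return Math.max(0, Math.min(optionalCount, result));
    }

    // The rewrite discussed in the thread: stop words stay optional, every
    // other term gets a "+" so it becomes required. Terms that already carry
    // "+" or "-" are passed through unchanged.
    static String markNonStopwordsRequired(String query, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // splitting an empty query yields one empty string
            }
            boolean hasOperator = term.startsWith("+") || term.startsWith("-");
            if (hasOperator || stopwords.contains(term.toLowerCase())) {
                out.add(term);
            } else {
                out.add("+" + term);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Plain dismax: "on mice and men" gives 4 optional clauses (one per
        // term, assuming some qf field keeps stop words); mm picks how many must match.
        System.out.println(requiredClauses(4, "100%")); // 4
        System.out.println(requiredClauses(4, "-25%")); // 3
        System.out.println(requiredClauses(4, "1"));    // 1

        // DisMaxOptional-style rewrite: "mice" and "men" become mandatory.
        Set<String> stopwords = Set.of("on", "and", "of", "the");
        System.out.println(markNonStopwordsRequired("on mice and men", stopwords));
        // on +mice and +men
    }
}
```

With mm=1, plain dismax still matches a document that contains only "men". This would happen, for example, if "mice" were misspelled in the query. After the rewrite, that document no longer matches whatever mm is set to. That is the trade-off around the meaning of mm discussed in the reply.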

you're not naive ... the problem is just that *all* of the clauses are
already optional (unless the term had a "+" or "-" in front of it).
that's where the mm param comes in: it decides how many of those optional
clauses should be mandatory.

it sounds like what you want is for a new DisMaxOptional parser to look at
this...

    on mice and men

and, because it knows "on" and "and" are stop words, treat it the same as
if the current DisMax had parsed this...

    on +mice and +men

which is another interesting idea, but it changes the meaning of "mm"
significantly: dismax with a low mm would no longer be tolerant of
misspelled (or missing) words unless they were stop words.

my gut tells me that changing dismax so that multiple qf params result in
multiple dismax queries would address your problem more directly.

: I think I've so internalized list advice *not* to generate multiple
: queries that that didn't readily occur to me. :-) One problem I
: suppose is that query might return some results but not the desired
: one (perhaps there is a title On Men and Mice) and so I don't get to
: the second query ("mice men" once stopped) that would get me Of Mice
: and Men. But an improvement in cases where no results come back from
: an overspecified query, I'd agree.

...which is why multiple dismax queries as clauses in the main query would
be good ... the results from each would be blended together.

: The other thought I've had is to just do some query analysis up front
: prior to submission -- if the query is all stops, send it to a
: ...
: to boost up exact matches. I hate the analysis step which would
: probably duplicate the tokenization done by solr, but might be worth
: it. There'd still be some problematic queries, but this may be as
: close as it'll get.

you could probably skip the external analysis by swapping the order of
your queries and looking at the debugging output when hitting the "second"
query ...
if your stopworded fields don't appear in the parsed query structure, then
it's all stopwords, so you do need your "first" query.

-Hoss