: Operationally, I was thinking a tokenizer could use the stop-word list
: (or an optional-word list) to mark tokens as optional rather than
: removing them from the token stream.  DisMaxOptional would then
: generate appropriate queries with the non-optionals as the core and
: then permute the optionals around those as optional clauses.  I say
: this with no deep understanding of how DisMax does its thing, of
: course, so feel free to call me naive.

you're not naive ... the problem is just that *all* of the clauses are 
already optional (unless the term had a "+" or "-" in front of it). 
that's where the mm param comes in: it decides how many of those optional 
clauses must match.
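
to make the role of mm concrete, here is a rough sketch of how a simple mm value turns into "how many optional clauses must match". it is an illustration of my understanding only, not the actual DisMax code: the function name is made up, and it handles just plain integers and percentages (no conditional expressions).

```python
def min_should_match(optional_clauses, mm):
    """Return how many optional clauses must match for a simple mm value.

    Hypothetical helper: handles only plain integers ("2", "-1") and
    percentages ("75%", "-25%"), not conditional mm expressions.
    """
    mm = mm.strip()
    if mm.endswith("%"):
        # percentages are truncated toward zero before use
        calc = int(optional_clauses * int(mm[:-1]) / 100)
        negative = mm.startswith("-")
    else:
        calc = int(mm)
        negative = calc < 0
    # a negative value says how many clauses are allowed to be missing
    required = optional_clauses + calc if negative else calc
    # never require fewer than 0 or more than the clauses that exist
    return max(0, min(optional_clauses, required))
```

so for the four optional clauses in "on mice and men", an mm of "75%" or "-1" would both require three of them to match.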

it sounds like what you want is for a new DisMaxOptional parser to look at 
this...

    on mice and men

and because it knows "on" and "and" are stop words, treat it the same as 
if the current DisMax parsed this...

    on +mice and +men

which is another interesting idea, but it changes the meaning of "mm" 
significantly, in that dismax with a low mm would no longer be tolerant of 
misspelled (or missing) words unless they were stop words.
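
the rewrite described above could be sketched roughly like this. it is only an illustration of the idea: there is no DisMaxOptional parser, the function name and the stop-word list are made up, and a real version would use the field's analyzer rather than splitting on whitespace.

```python
def mark_non_stopwords_required(query, stop_words):
    """Prefix every non-stop-word term with "+" so it becomes mandatory.

    Hypothetical helper; terms that already carry "+" or "-" are left alone.
    """
    marked = []
    for term in query.split():
        if term.lower() in stop_words or term[0] in "+-":
            marked.append(term)          # stop word or already marked: keep as-is
        else:
            marked.append("+" + term)    # everything else becomes required
    return " ".join(marked)


STOP_WORDS = {"on", "and", "of", "the"}  # hypothetical list, for illustration only
print(mark_non_stopwords_required("on mice and men", STOP_WORDS))
# prints: on +mice and +men
```

note that once "mice" and "men" are mandatory, a low mm can only forgive missing stop words, which is the change in meaning mentioned above.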

my gut tells me changing dismax so that having multiple qf params result 
in multiple dismax queries would address your problem more directly.

: I think I've so internalized list advice *not* to generate multiple
: queries that that didn't readily occur to me.  :-)   One problem I
: suppose is that query might return some results but not the desired
: one (perhaps there is a title On Men and Mice) and so I don't get to
: the second query ("mice men" once stopped) that would get me Of Mice
: and Men.  But an improvement in cases where no results come back from
: an overspecified query, I'd agree.

...which is why multiple dismax queries as clauses in the main query 
would be good ... the results from each would be blended together.
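
a toy sketch of what "blended together" means here, assuming each sub-query contributes its score to any document it matches and the contributions are simply added. real Lucene scoring has more factors than this, and the document ids and scores below are invented purely for illustration.

```python
def blend(*result_sets):
    """Sum per-document scores across several result sets, best first.

    Each result set is a dict of {doc_id: score}. Hypothetical helper that
    only mimics optional clauses adding up; it is not Lucene's scoring.
    """
    combined = {}
    for results in result_sets:
        for doc_id, score in results.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# invented scores: the unstopped query only finds one title,
# the stopped query ("mice men") finds both
unstopped = {"on-men-and-mice": 1.0}
stopped = {"on-men-and-mice": 0.5, "of-mice-and-men": 0.5}
ranked = blend(unstopped, stopped)
```

the point is that "of-mice-and-men" still shows up in `ranked` even though the first query never matched it, so you never have to decide whether to fall through to a second request.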

: The other thought I've had is to just do some query analysis up front
: prior to submission -- if the query is all stops, send it to a
        ...
: to boost up exact matches.  I hate the analysis step which would
: probably duplicate the tokenization done by solr, but might be worth
: it.  There'd still be some problematic queries, but this may be as
: close as it'll get.

you could probably skip the external analysis by swapping the order of 
your queries and looking at the debugging output when hitting the "second" 
query ... if your stopworded fields don't appear in the parsed query 
structure, then it's all stopwords, so you do need your "first" query.
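
a minimal sketch of that check, assuming you have already pulled the parsed query string out of the debugging output yourself. the function name, the field names, and the sample strings are all hypothetical, and the samples are not real Solr output.

```python
import re


def only_stopwords(parsed_query, stopped_fields):
    """Return True if none of the stop-worded fields appear in the parsed query.

    Hypothetical helper: looks for "field:" as a whole word, so a field
    called "title" is not confused with one called "title_exact".
    """
    for field in stopped_fields:
        if re.search(r"\b" + re.escape(field) + r":", parsed_query):
            return False   # at least one term survived the stop filter
    return True


# made-up parsed query strings, for illustration only
print(only_stopwords('title_exact:"on and"', ["title"]))
print(only_stopwords('title:mice title:men title_exact:"mice men"', ["title"]))
```

if the function returns True for the "second" query, every term was stopped out and the "first" query is the one to send.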


-Hoss
