> sure, but what logic would you suggest be used to decide when to make them
> optional?  :)

Operationally, I was thinking a tokenizer could use the stop-word list
(or an optional-word list) to mark tokens as optional rather than
removing them from the token stream.  DisMaxOptional would then
generate appropriate queries with the non-optionals as the core and
then permute the optionals around those as optional clauses.  I say
this with no deep understanding of how DisMax does its thing, of
course, so feel free to call me naive.

As to what words to put in the optionals list, the function words
(articles and prepositions) seem to be the ones that folks either omit
or confuse, so they'd be good candidates.
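
To make the marking step concrete, here is a rough sketch in plain
Python rather than a real Solr tokenizer. The OPTIONAL_WORDS list and
both function names are made up for illustration, and the output uses
ordinary Lucene query syntax ('+' for required, bare for optional) as a
stand-in for whatever a DisMaxOptional would actually build.

```python
# Hypothetical optional-word list; a real one would come from the
# stop-word file already configured for the field.
OPTIONAL_WORDS = {"a", "an", "the", "of", "and", "in", "on", "to"}


def mark_optional(query):
    """Return (token, is_optional) pairs instead of dropping stop words."""
    return [(tok, tok in OPTIONAL_WORDS) for tok in query.lower().split()]


def to_lucene_query(marked):
    """Prefix required tokens with '+'; leave optional tokens bare."""
    return " ".join(tok if optional else "+" + tok for tok, optional in marked)


print(to_lucene_query(mark_optional("Of Mice and Men")))
# prints: of +mice and +men
```

The non-optionals form the required core, and the function words can
only add to the score of documents that happen to contain them.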

> start by hitting Solr using a qf with fields that contain stop words.  if
> you get 0 hits, then query with a qf that contains all fields that don't
> have stop words in them, (but you can leave them in pf).

I think I've so internalized the list's advice *not* to generate
multiple queries that this didn't readily occur to me.  :-)   One
problem, I suppose, is that the first query might return some results
but not the desired one (perhaps there is a title On Men and Mice), so
I never get to the second query ("mice men" once stopped) that would
get me Of Mice and Men.  But it is an improvement in cases where no
results come back from an overspecified query, I'd agree.
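
As I understand the suggestion, the control flow would look something
like the sketch below. The search callable stands in for whatever
client code actually talks to Solr, and the two qf values are
hypothetical field names, not anything from a real schema.

```python
def search_with_fallback(query, search, qf_with_stops, qf_without_stops):
    """Try the stop-word-preserving fields first; retry only on zero hits.

    `search` is a hypothetical callable taking (query, qf) and returning
    a list of hits; it is not part of any Solr client library.
    """
    hits = search(query, qf_with_stops)
    if hits:
        # Any result at all suppresses the second query, which is the
        # "On Men and Mice" problem described above.
        return hits
    return search(query, qf_without_stops)


def fake_search(query, qf):
    """Stand-in backend: only the stopped field has a match."""
    return ["Of Mice and Men"] if qf == "title_stopped" else []


print(search_with_fallback("of mice men", fake_search,
                           "title_exact", "title_stopped"))
# prints: ['Of Mice and Men']
```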

The other thought I've had is to do some query analysis up front,
prior to submission: if the query is all stop words, send it to a
separate handler whose qf fields don't do stop-word removal;
otherwise, if any non-stop word exists, send it to a handler whose qf
fields do remove stops, and rely on the pf component to boost exact
matches.  I hate the analysis step, which would probably duplicate the
tokenization done by Solr, but it might be worth it.  There'd still be
some problematic queries, but this may be as close as it'll get.
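
The up-front check could be as small as the sketch below. The stop
list and both handler names are placeholders of mine, and the
whitespace split is only a crude approximation of what Solr's own
analyzer would do, which is exactly the duplication I dislike.

```python
# Hypothetical stop list; in practice it should mirror the stop-word
# file used by the Solr field type so the two stay in agreement.
STOP_WORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to"}


def choose_handler(query):
    """Pick a request handler based on whether the query is all stops."""
    tokens = query.lower().split()
    if tokens and all(tok in STOP_WORDS for tok in tokens):
        # Nothing would survive stop-word removal, so keep the stops.
        return "/select_keepstops"
    # At least one content word: search stopped fields, let pf boost.
    return "/select_dropstops"


print(choose_handler("To Be or Not to Be"))
# prints: /select_keepstops
print(choose_handler("Of Mice and Men"))
# prints: /select_dropstops
```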

Thanks for the suggestions, Hoss!

Ron
