: We are using mm=70% in solrconfig.xml
: We are using qf=title description
: We are not doing phrase query in "q"
: 
: In case of a multi-word search text, mostly the end results are the junk
: ones. Because the words, mentioned in search text, are written in different
: fields and in different contexts.
: For example searching for "water proof" (without double quotes) brings a
: record where title = "rose water" and description = "... no proof of
: contamination ..."

Did you consider using "pf"?  Just specifying something 
like "pf=title^100 description^100" should help shove records like the 
example you gave to the bottom of the result set, relative to records that 
actually contain the phrase "water proof" in a single field.

It won't *remove* these results, just promote other results, so it's not 
really comparable to what you are doing, but I still strongly suggest you 
consider it (it can even be complementary to what you are doing now, by 
ensuring that the top N you pick from the first results are really the 
"top" N).
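
To make the "pf" suggestion concrete, here is a minimal sketch of the request parameters. The mm, qf and pf values come from this thread; defType=dismax and the helper name are assumptions for illustration, and your handler may already set some of these in solrconfig.xml.

```python
from urllib.parse import urlencode


def dismax_params(user_query, phrase_boost=100):
    """Build dismax request parameters with a phrase boost on both fields."""
    return {
        "defType": "dismax",  # assumption: the handler uses the dismax parser
        "q": user_query,
        "qf": "title description",
        "mm": "70%",
        # pf adds score to documents where the whole query appears as a
        # phrase inside a single field; it does not remove anything.
        "pf": f"title^{phrase_boost} description^{phrase_boost}",
    }


query_string = urlencode(dismax_params("water proof"))
```

The resulting query_string can be appended to whatever select URL you already use.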

:    - We are firing first query to get top "n" results. We assume that first
:    "n" results are mostly good results. "n" is dynamic within a predefined
:    minimum and maximum value.
:    - We are calculating frequency of category ids in these top results. We
:    are not using facets because that gives count for all, relevant or
:    irrelevant, results.
:    - Based on category frequencies within top matching results we are
:    trying to find a few most frequent categories by simple calculation. Now we
:    are very confident that these categories are the ones which best suit to
:    our query.

FWIW: I've done this before in a custom hierarchical faceting component 
(to adjust the order used in displaying category drill-down options) 
and I found it worked very well, but the key is picking a good N.  If I 
remember correctly, I went with a percentage of the total result size, 
capped at a fixed constant (which I also used as my docList window size, 
so getting those N docs was essentially free unless the user started 
drilling down deep in pagination).  But I also recall seeing a 
paper somewhere that talked about a similar idea and had an 
equation for finding a "cliff" in scores to identify where the "good" 
matches ended (the math confused me, but I think it was about looking at 
the delta in scores between successive documents compared to the delta of 
the last X docs? ... does this sound familiar to anybody else?)
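
A rough sketch of both ways of picking N, under stated assumptions. choose_n follows the "percentage of the total, capped at a constant" idea (the minimum comes from the quoted description). find_score_cliff is only my guess at a delta-based heuristic, not the equation from the paper I half remember; the window and factor values are arbitrary.

```python
def choose_n(total_hits, fraction=0.1, min_n=10, max_n=100):
    """N as a fraction of the result size, clamped to [min_n, max_n]
    and never larger than the number of hits."""
    n = int(total_hits * fraction)
    return min(total_hits, max(min_n, min(n, max_n)))


def find_score_cliff(scores, window=5, factor=3.0):
    """Return the index of the first doc after a score "cliff".

    scores must be sorted in descending order. A cliff is a drop between
    successive docs that is at least `factor` times the average drop over
    the previous `window` gaps. Returns len(scores) if no cliff is found.
    """
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    for i in range(window, len(drops)):
        avg = sum(drops[i - window:i]) / window
        if avg > 0 and drops[i] >= factor * avg:
            return i + 1
    return len(scores)
```

One option is to take the smaller of the two values as N, so a cliff inside the window shrinks it but never grows it past the cap.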

:    - Finally we are firing a second query with top categories, calculated
:    above, in filter query (fq).

a) Word of caution: when programmatically adding filters like this, make sure 
you give your users some visual feedback that it's happening, and some way 
to override the filter.  There is nothing more frustrating than having a 
search UI assume it knows what you want, and giving you no way to say 
"no really, I wanted what I asked for".  A classic annoying-as-hell 
example was Yahoo's yellow pages search ~10 years ago.  If you typed in 
something that was the name of a "category" it would give you a listing of 
all businesses in that category in the city you specified, making it 
completely impossible to find a (furniture) store named "The Magazine" in 
Berkeley -- because your search would automatically be filtered to the 
category "Books & Magazines" with no way to break out.

b) Instead of filtering, you might want to consider just adding boost 
queries on the top categories.  It won't remove results, so if that's 
really what you want, never mind, but it should have roughly the same 
effect on the first few pages of results, and people can still drill 
down to find those other documents if they wish.
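
To show the difference between (b) and the fq approach, here is a small sketch of the second request built both ways. The category_id field, the boost value and the function names are hypothetical, defType=dismax is assumed, and the category ids are assumed to be simple tokens that need no query escaping.

```python
from urllib.parse import urlencode


def base_params(user_query):
    # assumption: same dismax setup as the first query
    return [("defType", "dismax"), ("q", user_query),
            ("qf", "title description")]


def boost_params(user_query, category_ids, boost=10):
    """Second query using boost queries: nothing is removed."""
    params = base_params(user_query)
    # One bq per category: matching docs score higher, others still match.
    params += [("bq", f"category_id:{cat}^{boost}") for cat in category_ids]
    return params


def filter_params(user_query, category_ids):
    """Second query using a filter: docs outside the categories are dropped."""
    params = base_params(user_query)
    params.append(("fq", "category_id:(" + " OR ".join(category_ids) + ")"))
    return params


boosted = urlencode(boost_params("water proof", ["12", "7"]))
filtered = urlencode(filter_params("water proof", ["12", "7"]))
```

With the boost version the UI doesn't need an "undo" control for the categories, since the user can always page past the boosted documents.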

: Does it require writing a plugin if I want to move above logic into Solr?
: Which component do I need to modify - QueryComponent?
: 
: Or is there any better or even equivalent method in Solr of doing this or
: similar thing?

You could subclass QueryComponent and use your subclass in place of 
QueryComponent, or you might consider just adding a new component in front 
of QueryComponent that does your initial query, looks at the results, and 
then modifies the filters and lets QueryComponent do its normal work.

I'm not sure which one would be easier.

-Hoss
