Re: Prefix-Search with Stopwords - no results?

Chris Hostetter Fri, 28 May 2010 13:06:34 -0700

: Searching on the* (assuming the is a stopword) will search on
: (them OR theory OR thespian) assuming those three words are in
: your index. It will NOT search on the. So I think you're OK, or are
: you seeing anomalous results?


I think the missing pieces of the puzzle here are:

1) wildcard and prefix queries aren't analyzed, so "the*" (or "für*") 
doesn't get analyzed, and the system has no way of spotting that it's a 
stopword that should be removed from the query -- nor should it in general, 
since the fact that "the" is a stopword doesn't mean "the*" is an invalid 
query.  I could very conceivably be trying to find words like "thespian".

2) by using the "AND" operator you are forcing both clauses to match...

: >  (für*) AND (solr OR solr*)

...so that query will only turn up results if a document containing both a 
word that starts with "solr" and a word that starts with "für" exists in 
your index.

: > The problem now is (as I think) that there is no "für*" anymore in the
: > indexed data, because it was stopword filtered, too. If now someone

the _word_ "für" doesn't exist in your index because it's a stopword, but 
there may be other words in your index starting with the prefix "für" -- 
and if those words appear in documents that also contain words starting 
with "solr" then you will actually get matches.
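To make that concrete, here is a rough sketch in plain Python of a toy inverted index. It is not how Lucene/Solr is implemented; the stopword list, the sample documents, and the helper names (analyze, build_index, prefix_query) are all made up for illustration. It only shows the behaviour described above: stopwords are dropped at index time, the prefix is expanded against indexed terms without analysis, and AND intersects the two clauses.

```python
import re
from collections import defaultdict

# Assumed stopword list, purely for this illustration.
STOPWORDS = {"the", "für"}

def analyze(text):
    """Index-time analysis: lowercase, tokenize, drop stopwords."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]

def build_index(docs):
    """Map each surviving term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def prefix_query(index, prefix):
    """Prefix queries are not analyzed: match the raw prefix against indexed terms."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits

# Hypothetical documents.
docs = {
    1: "Solr für Einsteiger",
    2: "Der Fürst nutzt Solr",
    3: "Fürsorge und Suche",
}
index = build_index(docs)

fuer = prefix_query(index, "für")    # docs with a term starting with "für"
solr = prefix_query(index, "solr")   # docs with a term starting with "solr"
both = fuer & solr                   # (für*) AND (solr*)
```

In this toy data the term "für" itself never reaches the index, document 1 (which only contained the stopword) drops out of the AND, and document 2 still matches because "fürst" starts with the same prefix.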

: > So, what should I do? Is there a better filter combination that I could
: > try? Or am I doing something wrong conceptually? The only solution that I
: > have found working is to not use stopword filtering at all.


I would suggest that instead of your existing approach of taking "word1 
word2 word3 ..." and converting it to "(word1 OR word1*) AND (word2 OR 
word2*) ..." in the client, you consider using multiple 
fields -- one "text" defined as you have it now, and one "text_prefix" 
defined similarly but with an additional EdgeNGramTokenFilter used when 
indexing to generate "prefix" tokens.  Then search those fields using 
dismax...

q=word1 word2 word3 & qf=text text_prefix & mm=100% & tie=0
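As a rough illustration of what the "text_prefix" field would hold, the Python sketch below generates edge n-grams for a token and URL-encodes the parameters shown above. The edge_ngrams function and its min_gram/max_gram arguments are stand-ins invented for this example, not the Solr filter's actual interface, and selecting the dismax handler is left out because it depends on your setup.

```python
from urllib.parse import urlencode

def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading substrings of token, from min_gram to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

# Index-time "prefix" tokens for one word: every leading substring is indexed,
# so a plain term query for "sol" matches without needing a wildcard.
prefix_tokens = edge_ngrams("solr")

# The request parameters from the example above, URL-encoded.
params = {
    "q": "word1 word2 word3",
    "qf": "text text_prefix",
    "mm": "100%",
    "tie": "0",
}
query_string = urlencode(params)
```

Because the prefixes are produced at index time, the user's words go through normal analysis at query time, so a stopword is simply dropped from the "text" clause and no client-side rewriting into wildcard queries is needed.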



-Hoss
