: Searching on the* (assuming the is a stopword) will search on
: (them OR theory OR thespian) assuming those three words are in
: your index. It will NOT search on the. So I think you're OK, or are
: you seeing anomalous results?

I think the missing pieces to the puzzle here are:

1) wildcard and prefix queries aren't analyzed, so "the*" (or "für*")
doesn't get analyzed, and the system has no way of spotting that it's a
stopword that should be removed from the query -- nor should it in
general, since the fact that "the" is a stopword doesn't mean "the*" is
an invalid query. I could very conceivably be trying to find words like
"thespian".

2) by using the "AND" operator you are forcing both clauses to match...

: > (für*) AND (solr OR solr*)

...so that query will only turn up results if your index contains a
document with both a word that starts with "solr" and a word that
starts with "für".

: > The problem now is (as I think) that there is no "für*" anymore in the
: > indexed data, because it was stopword filtered, too. If now someone

the *word* "für" doesn't exist in your index because it's a stopword, but
there may be other words in your index starting with the prefix "für" --
and if those words appear in documents that also contain words starting
with "solr" then you will actually get matches.

: > So, what should I do? Is there a better filter combination that I could
: > try? Or am I doing something wrong conceptually? The only solution that I
: > have found working is to not use stopword filtering at all.

I would suggest that instead of your existing approach of taking
"word1 word2 word3 ..." and converting it to
"(word1 OR word1*) AND (word2 OR word2*) ..." in the client, you
consider using multiple fields -- one "text" defined as you have it now,
and one "text_prefix" defined similarly but with an additional
EdgeNGramTokenFilter used when indexing to generate "prefix" tokens. Then
search those fields using dismax...

   q=word1 word2 word3 & qf=text text_prefix & mm=100% & tie=0

-Hoss
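
To illustrate why the two-field approach sidesteps the stopword problem, here is a rough toy model in Python. It is not Solr code: the stopword set, the `analyze`, `index_doc` and `matches` helpers, and the gram-size limits are all assumptions made up for the sketch. The point it tries to show is that the query goes through normal analysis (so "für" is dropped as a stopword), while prefix matching comes from edge n-grams generated at index time rather than from unanalyzed wildcard clauses.

```python
# Toy model only: hypothetical helpers, not Solr's actual analysis chain.
STOPWORDS = {"für", "the"}  # assumed stopword list


def analyze(text, edge_ngrams=False, min_gram=1, max_gram=15):
    """Lowercase, split on whitespace, drop stopwords.

    With edge_ngrams=True, also expand each token into its leading
    prefixes, roughly what an edge n-gram filter does at index time.
    """
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    if not edge_ngrams:
        return set(tokens)
    grams = set()
    for token in tokens:
        for n in range(min_gram, min(len(token), max_gram) + 1):
            grams.add(token[:n])
    return grams


def index_doc(text):
    """Index the same text into a plain field and a prefix field."""
    return {
        "text": analyze(text),
        "text_prefix": analyze(text, edge_ngrams=True),
    }


def matches(doc, query):
    """Mimic mm=100%: every analyzed query term must hit some field."""
    terms = analyze(query)  # query side: stopwords removed, no n-grams
    return all(
        any(term in doc[field] for field in ("text", "text_prefix"))
        for term in terms
    )


doc = index_doc("Solr für Einsteiger")
print(matches(doc, "für sol"))  # stopword dropped, "sol" hits text_prefix
print(matches(doc, "für luc"))  # no token starts with "luc"
```

In this sketch the stopword never reaches the matching step, so the client no longer needs to build "(word OR word*)" clauses by hand.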
