Prefix-Search with Stopwords - no results?

Gert Brinkmann Fri, 28 May 2010 08:26:15 -0700


Hello,

I am having some problems with Solr 1.4. I am indexing and querying data using the following fieldType:

    <fieldType name="text_de_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_de_de.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LengthFilterFactory" min="2" max="200"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_de_de.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_de_de.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LengthFilterFactory" min="2" max="200"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

The application that uses Solr prepares the search string by filtering out potentially dangerous characters, such as brackets and wildcards, that might otherwise lead to invalid query syntax.
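A minimal sketch of such a sanitizing step, assuming the application simply replaces Lucene/Solr query-syntax characters with spaces. The exact character set and the name `sanitize` are hypothetical; the post only mentions "brackets and wildcards, etc.".

```python
import re

# Assumed set of query-syntax characters to strip (hypothetical; the real
# application's list is not given in the post).
_SPECIAL = re.compile(r'[+\-&|!(){}\[\]^"~*?:\\/]')

def sanitize(user_input: str) -> str:
    """Replace query-syntax characters with spaces, then collapse whitespace."""
    cleaned = _SPECIAL.sub(" ", user_input)
    return " ".join(cleaned.split())

print(sanitize("für (solr*)"))  # für solr
```

Replacing with a space rather than deleting keeps words like "a:b" from being glued into a single token.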

All words are searched for both as a normal word and as a prefix. E.g. "für solr" is converted by the application to

  (für OR für*) AND (solr OR solr*)
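The rewriting described above can be sketched like this, assuming the input has already been sanitized and is split on whitespace. `build_prefix_query` is a hypothetical name, not the application's actual code.

```python
def build_prefix_query(search: str) -> str:
    """Turn each word into (word OR word*) and AND the groups together."""
    clauses = [f"({w} OR {w}*)" for w in search.split()]
    return " AND ".join(clauses)

print(build_prefix_query("für solr"))
# (für OR für*) AND (solr OR solr*)
```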

This works fine for normal words. But if a word is a stopword, like "für" in this example, Solr's stopword filter reduces the query to something like this:

  (für*) AND (solr OR solr*)

The problem now is (as I think) that no indexed term matches "für*" anymore, because "für" was removed by the stopword filter at index time, too. Since the groups are ANDed, the whole query then fails. So if someone copies and pastes a sentence containing a stopword from an indexed document, that document will not be found by Solr.
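The effect can be reproduced with a deliberately simplified model. Assumptions: the stopword list is a hypothetical excerpt (the real one is stopwords_de_de.txt), word delimiting and Snowball stemming are left out, and the rule that the plain term is stopword-filtered while the prefix term is not is taken from the filtered query shown above, not from Solr's source.

```python
# Hypothetical stopword excerpt; the real list lives in stopwords_de_de.txt.
STOPWORDS = {"für", "und", "die", "der"}

def index_terms(text: str) -> set[str]:
    """Rough stand-in for the index analyzer: lowercase, split on whitespace,
    drop stopwords and tokens shorter than 2 chars (no stemming here)."""
    tokens = text.lower().split()
    return {t for t in tokens if t not in STOPWORDS and len(t) >= 2}

def group_matches(word: str, terms: set[str]) -> bool:
    """Evaluate one (word OR word*) group as observed in the post:
    the plain term is stopword-filtered, the prefix term is kept."""
    w = word.lower()
    plain_hit = w not in STOPWORDS and w in terms
    prefix_hit = any(t.startswith(w) for t in terms)
    return plain_hit or prefix_hit

def matches(query: str, text: str) -> bool:
    """All groups are ANDed, so a single unmatched group sinks the query."""
    terms = index_terms(text)
    return all(group_matches(w, terms) for w in query.split())

doc = "Ein Plugin für Solr"
print(matches("Plugin Solr", doc))      # True
print(matches("Plugin für Solr", doc))  # False: no indexed term starts with "für"
```

Pasting the document's own text back as a query fails here for the same reason: "für" exists in the source text but not among the indexed terms.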

As far as I understand, enablePositionIncrements="true" only matters for phrase queries, not for my "word OR word*" queries.

So, what should I do? Is there a better filter combination that I could try? Or am I doing something wrong conceptually? The only working solution I have found is to not use stopword filtering at all.


Greetings,
Gert
