strange results with query and hyphened words

Markus.Rietzler Fri, 28 May 2010 07:55:17 -0700

i am wondering why a search term with a hyphen doesn't match as expected.

my search term is "profi-auskunft". in WordDelimiterFilterFactory i have
catenateWords="1", so my understanding is that profi-auskunft would also
be searched as profiauskunft. when i use the analysis panel in the solr
admin i see that profi-auskunft matches a term "profiauskunft".


the analysis shows

Query Analyzer
WhitespaceTokenizerFactory 
        profi-auskunft
SynonymFilterFactory 
        profi-auskunft
StopFilterFactory 
        profi-auskunft

WordDelimiterFilterFactory 

term position      1        2
term text          profi    auskunft
                            profiauskunft
term type          word     word
                            word
source start,end   0,5      6,14
                            0,15

LowerCaseFilterFactory 
SnowballPorterFilterFactory 

why are auskunft and profiauskunft in one column (both at position 2)?
how do they get searched?

when i search for "profiauskunft" i get 230 hits, but when i search for
"profi-auskunft" i get fewer hits. when i call the search with
debugQuery=on i see

body:"profi (auskunft profiauskunft)"

what does this query mean? is it profi AND (auskunft OR profiauskunft)?
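
one possible reading, sketched here as a small python simulation (this is
not solr code, and the function and variable names are made up for
illustration): the quotes make it a phrase query, so "profi" has to sit at
one position and either "auskunft" or "profiauskunft" at the next one.
the sketch assumes the index analyzer produces the same token positions as
the analysis output above and it ignores lowercasing and stemming. if that
reading is right, a document containing only the single word
"profiauskunft" would not match, which might explain the lower hit count.

```python
# Hypothetical simulation, not Solr code: models a phrase query in which
# each position accepts any one of several alternative terms.

def phrase_matches(doc_tokens, phrase):
    """doc_tokens: list of (position, term) pairs for one document.
    phrase: list of sets of terms, one set per consecutive position."""
    by_pos = {}
    for pos, term in doc_tokens:
        by_pos.setdefault(pos, set()).add(term)
    # Try every token position in the document as the start of the phrase.
    for start in by_pos:
        if all(by_pos.get(start + i, set()) & alternatives
               for i, alternatives in enumerate(phrase)):
            return True
    return False

# Query tokens as shown in the analysis output: profi at position 1,
# auskunft and profiauskunft stacked at position 2.
query = [{"profi"}, {"auskunft", "profiauskunft"}]

# Assumed tokens for a document that contained "profi-auskunft".
hyphenated_doc = [(1, "profi"), (2, "auskunft"), (2, "profiauskunft")]

# Assumed tokens for a document that contained only "profiauskunft".
single_word_doc = [(1, "profiauskunft")]

print(phrase_matches(hyphenated_doc, query))   # True
print(phrase_matches(single_word_doc, query))  # False
```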




<fieldType name="text_de" class="solr.TextField"
positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- sg324: words that are split by "-" followed by additional
whitespace are joined back together. -->
        <filter class="solr.HyphenatedWordsFilterFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory"
synonyms="index_synonyms_de.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="de/stopwords_de.txt"
                enablePositionIncrements="true"
                />
        <!-- sg324 -->
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
language="German" protected="de/protwords_de.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
synonyms="de/synonyms_de.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="de/stopwords_de.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
language="German" protected="de/protwords_de.txt"/>
      </analyzer>
</fieldType>
