I am not sure whether this helps me. I see the point that there will be problems,
but the default config for index is:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

and for query:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0"/>

With these settings I don't find "profiauskunft" when searching for
"profi-auskunft" (analyse0.jpg).

If I use catenateWords="1" in the query analyzer:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0"/>

analysis.jsp says that there is a match (analyse1.jpg). But in our live search,
"profi-auskunft" won't match "profiauskunft"; it only finds "profi-auskunft".

Could anyone please clarify the output of analysis.jsp for me? Why is there a
highlight in analysis.jsp but no match when doing a search, even from the
admin panel?

When I have

    profi    auskunft
             profiauskunft

does this mean "profi (auskunft profiauskunft)" will match the word "profi"
followed by "auskunft" or "profiauskunft"? Is this OR the same as the one I
configure with defaultOperator in the solrQueryParser tag? The "OR" only
applies to the query part, right? What does it mean in the index part?

> -----Original Message-----
> From: Sascha Szott [mailto:sz...@zib.de]
> Sent: Sunday, 30 May 2010 19:01
> To: solr-user@lucene.apache.org
> Subject: Re: strange results with query and hyphened words
>
> Hi Markus,
>
> I was facing the same problem a few days ago and found an explanation in
> the mail archive that clarifies my question regarding the usage of
> Solr's WordDelimiterFilterFactory:
>
> http://markmail.org/message/qoby6kneedtwd42h
>
> Best,
> Sascha
>
> markus.rietz...@rzf.fin-nrw.de wrote:
> > I am wondering why a search term with a hyphen doesn't match.
> >
> > My search term is "profi-auskunft".
> > In WordDelimiterFilterFactory I have catenateWords, so my understanding
> > is that profi-auskunft would search for profiauskunft. When I use the
> > analysis panel in Solr admin I see that profi-auskunft matches a term
> > "profiauskunft".
> >
> > The analysis shows:
> >
> > Query Analyzer
> > WhitespaceTokenizerFactory
> >     profi-auskunft
> > SynonymFilterFactory
> >     profi-auskunft
> > StopFilterFactory
> >     profi-auskunft
> >
> > WordDelimiterFilterFactory
> >
> > term position       1        2
> > term text           profi    auskunft
> >                              profiauskunft
> > term type           word     word
> >                              word
> > source start,end    0,5      6,14
> >                              0,15
> >
> > LowerCaseFilterFactory
> > SnowballPorterFilterFactory
> >
> > Why are auskunft and profiauskunft in one column? How do they get
> > searched?
> >
> > When I search for "profiauskunft" I get 230 hits; when I now search for
> > "profi-auskunft" I get fewer hits. When I call the search with
> > debugQuery=on I see
> >
> > body:"profi (auskunft profiauskunft)"
> >
> > What does this query mean? profi and "auskunft or profiauskunft"?
> >
> > <fieldType name="text_de" class="solr.TextField"
> >            positionIncrementGap="100">
> >   <analyzer type="index">
> >     <charFilter class="solr.HTMLStripCharFilterFactory" />
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <!-- sg324: words that are split by a '-' followed by whitespace
> >          are joined back together. -->
> >     <filter class="solr.HyphenatedWordsFilterFactory"/>
> >     <!-- in this example, we will only use synonyms at query time
> >     <filter class="solr.SynonymFilterFactory"
> >             synonyms="index_synonyms_de.txt" ignoreCase="true" expand="false"/>
> >     -->
> >     <!-- Case insensitive stop word removal.
> >          add enablePositionIncrements=true in both the index and query
> >          analyzers to leave a 'gap' for more accurate phrase queries.
> >     -->
> >     <filter class="solr.StopFilterFactory"
> >             ignoreCase="true"
> >             words="de/stopwords_de.txt"
> >             enablePositionIncrements="true"
> >     />
> >     <!-- sg324 -->
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory"
> >             language="German" protected="de/protwords_de.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.SynonymFilterFactory"
> >             synonyms="de/synonyms_de.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.StopFilterFactory"
> >             ignoreCase="true"
> >             words="de/stopwords_de.txt"
> >             enablePositionIncrements="true"
> >     />
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory"
> >             language="German" protected="de/protwords_de.txt"/>
> >   </analyzer>
> > </fieldType>
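
To make the question about body:"profi (auskunft profiauskunft)" concrete, here is a small Python sketch of how I currently read it. It is only a toy model, not Solr or Lucene code. It assumes that terms sharing a position are alternatives, that a phrase needs consecutive positions, and that the highlight in analysis.jsp only checks whether a single term text occurs on both sides. The token lists are written by hand from the quoted analysis output, lowercasing and stemming are ignored, and the names `matches_phrase` and `shares_any_term` are made up for this example.

```python
from typing import Dict, List, Set

# Hand-written token streams (assumption, not produced by Solr):
# one set of terms per position, as in the quoted analysis table.
QUERY: List[Set[str]] = [{"profi"}, {"auskunft", "profiauskunft"}]
DOC_HYPHENATED: List[Set[str]] = [{"profi"}, {"auskunft", "profiauskunft"}]
DOC_JOINED: List[Set[str]] = [{"profiauskunft"}]


def index_positions(tokens: List[Set[str]]) -> Dict[str, Set[int]]:
    """Build a term -> positions map for one field value."""
    positions: Dict[str, Set[int]] = {}
    for pos, terms in enumerate(tokens):
        for term in terms:
            positions.setdefault(term, set()).add(pos)
    return positions


def matches_phrase(doc: List[Set[str]], query: List[Set[str]]) -> bool:
    """Phrase match: each query position needs one of its alternatives
    at the corresponding consecutive position in the document."""
    index = index_positions(doc)
    # Candidate start positions: wherever any first-position term occurs.
    starts: Set[int] = set()
    for term in query[0]:
        starts |= index.get(term, set())
    for start in starts:
        if all(
            any(start + offset in index.get(term, set()) for term in alternatives)
            for offset, alternatives in enumerate(query)
        ):
            return True
    return False


def shares_any_term(doc: List[Set[str]], query: List[Set[str]]) -> bool:
    """Term-level overlap, ignoring positions (assumed highlight rule)."""
    doc_terms = set().union(*doc)
    query_terms = set().union(*query)
    return bool(doc_terms & query_terms)
```

Under these assumptions, `matches_phrase(DOC_JOINED, QUERY)` is false because the document has no "profi" term to start the phrase, while `shares_any_term(DOC_JOINED, QUERY)` is true because "profiauskunft" occurs on both sides. That would be consistent with a highlight in analysis.jsp and no hit in the search, but I would be glad if someone could confirm whether this reading is right.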