Re: strange results with query and hyphened words

Markus.Rietzler Mon, 31 May 2010 03:01:57 -0700

I am not sure whether this helps me.

I see the point that there will be problems.

But the default config for the index analyzer is:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" 
generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" 
splitOnCaseChange="1"/>

and for query:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" 
generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0"/> 


With these settings I don't find "profiauskunft" when searching for
"profi-auskunft" (analyse0.jpg).

If I set catenateWords="1" in the query analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" 
generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0"/>

analysis.jsp says that there is a match (analyse1.jpg).
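
To make the difference between the two query settings concrete, here is a minimal Python sketch of the token streams I would expect. It is not Solr code: word_delimiter is a hypothetical stand-in that only splits on "-" and only models generateWordParts and catenateWords. Lowercasing and stemming are ignored. The positions in the first case follow the analysis output quoted further down; the positions in the other cases are my assumption.

```python
def word_delimiter(token, generate_word_parts, catenate_words):
    """Hypothetical, heavily simplified stand-in for WordDelimiterFilterFactory.

    Returns a list of (position, text) pairs for a single input token.
    Only the hyphen is treated as a delimiter.
    """
    parts = [p for p in token.split("-") if p]
    if len(parts) < 2:
        # Nothing to split: the token passes through unchanged.
        return [(1, token)]

    out = []
    if generate_word_parts:
        # One token per part, at consecutive positions.
        out.extend((i + 1, part) for i, part in enumerate(parts))
    if catenate_words:
        # With the parts present, the catenated form shares the position of
        # the last part (as in the quoted analysis output). Without them it
        # is assumed to be the only token, at position 1.
        position = len(parts) if generate_word_parts else 1
        out.append((position, "".join(parts)))
    # With both flags off this sketch emits nothing; that is an assumption,
    # not verified behaviour of the real filter.
    return out


# Both flags on, as in the quoted field type.
both = word_delimiter("profi-auskunft", True, True)

# Query analyzer with only catenateWords="1".
catenated_only = word_delimiter("profi-auskunft", False, True)
```

Under these assumptions the second setting turns the query into the single term "profiauskunft", which is what analysis.jsp highlights.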


But in our live search, "profi-auskunft" won't match "profiauskunft"; it only finds
"profi-auskunft".
Could anyone please clarify the output of analysis.jsp for me?
Why is there a highlight in analysis.jsp but no match when doing a search,
even from the admin panel?


When I have

profi  auskunft
       profiauskunft

does this mean that "profi (auskunft profiauskunft)" will match the word "profi"
followed by "auskunft" or "profiauskunft"?
Is this OR the same one that I configure with defaultOperator in the
solrQueryParser tag?
The "OR" only applies to the query side, right? What does it mean on the
index side?
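
One possible reading of "profi (auskunft profiauskunft)" is a phrase query in which the second position accepts either of two terms. The Python sketch below checks that reading against two hand-written token lists. It is only an illustration under that assumption: phrase_matches is a hypothetical helper, not a Solr or Lucene API, and the document token positions are taken from the analysis output quoted below, with stemming and lowercasing ignored.

```python
def phrase_matches(query_slots, doc_tokens):
    """Return True if the document contains the phrase.

    query_slots: list of sets; slot i holds the alternative terms accepted
                 at the i-th consecutive position of the phrase.
    doc_tokens:  list of (position, text) pairs from the index analyzer.
    """
    # Group the document's terms by position; several terms may share one.
    terms_at = {}
    for position, text in doc_tokens:
        terms_at.setdefault(position, set()).add(text)

    # Try every position as the start of the phrase.
    for start in terms_at:
        if all(slot & terms_at.get(start + i, set())
               for i, slot in enumerate(query_slots)):
            return True
    return False


# "profi" at the first position, then "auskunft" OR "profiauskunft".
query = [{"profi"}, {"auskunft", "profiauskunft"}]

# A document that contained "profi-auskunft" (parts plus catenated form).
doc_hyphenated = [(1, "profi"), (2, "auskunft"), (2, "profiauskunft")]

# A document that contained only the single word "profiauskunft".
doc_single_word = [(1, "profiauskunft")]

hit_hyphenated = phrase_matches(query, doc_hyphenated)    # True
hit_single_word = phrase_matches(query, doc_single_word)  # False
```

If this reading is right, the alternatives inside the parentheses are tied to one token position, and a document with only "profiauskunft" fails because it has no "profi" token in front. That would fit the smaller hit count for "profi-auskunft".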


> -----Original Message-----
> From: Sascha Szott [mailto:sz...@zib.de] 
> Sent: Sunday, 30 May 2010 19:01
> To: solr-user@lucene.apache.org
> Subject: Re: strange results with query and hyphened words
> 
> Hi Markus,
> 
> I was facing the same problem a few days ago and found an explanation in
> the mail archive that clarifies my question regarding the usage of
> Solr's WordDelimiterFilterFactory:
> 
> http://markmail.org/message/qoby6kneedtwd42h
> 
> Best,
> Sascha
> 
> markus.rietz...@rzf.fin-nrw.de wrote:
> > I am wondering why a search term with a hyphen doesn't match.
> >
> > My search term is "profi-auskunft". In WordDelimiterFilterFactory I have
> > catenateWords, so my understanding is that profi-auskunft would search
> > for profiauskunft. When I use the analysis panel in the Solr admin I see
> > that profi-auskunft matches a term "profiauskunft".
> >
> > The analysis shows
> >
> > Query Analyzer
> > WhitespaceTokenizerFactory
> >     profi-auskunft
> > SynonymFilterFactory
> >     profi-auskunft
> > StopFilterFactory
> >     profi-auskunft
> >
> > WordDelimiterFilterFactory
> >
> > term position       1       2
> > term text           profi   auskunft
> >                             profiauskunft
> > term type           word    word
> >                             word
> > source start,end    0,5     6,14
> >                             0,15
> >
> > LowerCaseFilterFactory
> > SnowballPorterFilterFactory
> >
> > Why are auskunft and profiauskunft in one column? How do they get
> > searched?
> >
> > When I search for "profiauskunft" I get 230 hits; when I now search for
> > "profi-auskunft" I get fewer hits. When I call the search with
> > debugQuery=on I see
> >
> > body:"profi (auskunft profiauskunft)"
> >
> > What does this query mean? profi and "auskunft or profiauskunft"?
> >
> >
> >
> >
> > <fieldType name="text_de" class="solr.TextField"
> > positionIncrementGap="100">
> >        <analyzer type="index">
> >          <charFilter class="solr.HTMLStripCharFilterFactory" />
> >          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >          <!-- sg324: words that are split by a hyphen followed by
> > whitespace are joined back together. -->
> >          <filter class="solr.HyphenatedWordsFilterFactory"/>
> >          <!-- in this example, we will only use synonyms at query time
> >          <filter class="solr.SynonymFilterFactory"
> > synonyms="index_synonyms_de.txt" ignoreCase="true" expand="false"/>
> >          -->
> >          <!-- Case insensitive stop word removal.
> >            add enablePositionIncrements=true in both the index and query
> >            analyzers to leave a 'gap' for more accurate phrase queries.
> >          -->
> >          <filter class="solr.StopFilterFactory"
> >                  ignoreCase="true"
> >                  words="de/stopwords_de.txt"
> >                  enablePositionIncrements="true"
> >                  />
> >          <!-- sg324 -->
> >          <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >          <filter class="solr.LowerCaseFilterFactory"/>
> >          <filter class="solr.SnowballPorterFilterFactory"
> > language="German" protected="de/protwords_de.txt"/>
> >        </analyzer>
> >        <analyzer type="query">
> >          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >          <filter class="solr.SynonymFilterFactory"
> > synonyms="de/synonyms_de.txt" ignoreCase="true" expand="true"/>
> >          <filter class="solr.StopFilterFactory"
> >                  ignoreCase="true"
> >                  words="de/stopwords_de.txt"
> >                  enablePositionIncrements="true"
> >                  />
> >          <filter class="solr.WordDelimiterFilterFactory"
> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >          <filter class="solr.LowerCaseFilterFactory"/>
> >          <filter class="solr.SnowballPorterFilterFactory"
> > language="German" protected="de/protwords_de.txt"/>
> >        </analyzer>
> > </fieldType>
> >
> >
> 
>
