term position question from analyzer stack for WordDelimiterFilterFactory

Robert Petersen Thu, 21 Apr 2011 17:07:28 -0700

If I don't put preserveOriginal=1 in my WordDelimiterFilterFactory settings,
I cannot get a match between AppleTV on the indexing side and appletv on the
search side. Without that setting, the all-lowercase version of AppleTV (the
catenated token appletv) occurs only at term position 2, because of the
catenateWords=1 or catenateAll=1 settings. This surprises me: how does term
position affect searching? Here is my analysis with preserveOriginal=1, which
makes the lowercase token occur at both term position 1 and term position 2:


Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position   1
term text       AppleTV
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.SynonymFilterFactory {synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
term position   1
term text       AppleTV
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position   1
term text       AppleTV
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position     1          2
term text         AppleTV    TV
                  Apple      AppleTV
term type         word       word
                  word       word
source start,end  0,7        5,7
                  0,5        0,7
payload
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position     1          2
term text         appletv    tv
                  apple      appletv
term type         word       word
                  word       word
source start,end  0,7        5,7
                  0,5        0,7
payload
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position     1          2
term text         appletv    tv
                  apple      appletv
term type         word       word
                  word       word
source start,end  0,7        5,7
                  0,5        0,7
payload
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position     1          2
term text         appletv    tv
                  apple      appletv
term type         word       word
                  word       word
source start,end  0,7        5,7
                  0,5        0,7
payload

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position   1
term text       appletv
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.SynonymFilterFactory {synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
term position   1
term text       appletv
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
term position   1
term text       appletv
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
term position   1
term text       appletv
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position   1
term text       appletv
term type       word
source start,end        0,7
payload         
com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position   1
term text       appletv
term type       word
source start,end        0,7
payload         
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position   1
term text       appletv
term type       word
source start,end        0,7
payload
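To make the position question concrete, here is a toy model of the index-side token stream above. It is a hypothetical simplification, not Lucene's WordDelimiterFilter: it only splits on lowercase-to-uppercase transitions, only models preserveOriginal and catenateWords, and assumes (as the analysis output shows) that the catenated token lands at the position of the last subword. The helper names (word_delimiter, index_positions, term_match, phrase_match) are invented for illustration.

```python
"""Toy model of the index-side token stream for "AppleTV" shown above.

Hypothetical simplification, NOT Lucene's WordDelimiterFilter: it only
splits on lowercase->uppercase transitions and only models
preserveOriginal and catenateWords.
"""
from collections import defaultdict


def subword_spans(text):
    """Return (start, end) offsets, splitting only at lower->upper changes."""
    spans, start = [], 0
    for i in range(1, len(text)):
        if text[i - 1].islower() and text[i].isupper():
            spans.append((start, i))
            start = i
    spans.append((start, len(text)))
    return spans


def word_delimiter(text, preserve_original, catenate_words):
    """Return (position, term, start, end) tokens for one input token."""
    spans = subword_spans(text)
    if len(spans) == 1:
        # Nothing to split (e.g. the query "appletv"): one token at position 1.
        return [(1, text, 0, len(text))]
    tokens = []
    if preserve_original:
        # The original token shares position 1 with the first subword.
        tokens.append((1, text, 0, len(text)))
    for pos, (start, end) in enumerate(spans, start=1):
        tokens.append((pos, text[start:end], start, end))
    if catenate_words:
        # Assumption taken from the analysis output above: the catenated
        # token is placed at the position of the last subword.
        tokens.append((len(spans), text, 0, len(text)))
    return tokens


def lowercase(tokens):
    """Lowercase each term, keeping positions and offsets unchanged."""
    return [(pos, term.lower(), start, end) for pos, term, start, end in tokens]


def index_positions(tokens):
    """Map each term to the set of positions it occupies."""
    postings = defaultdict(set)
    for pos, term, _start, _end in tokens:
        postings[term].add(pos)
    return postings


def term_match(postings, term):
    # A single-term lookup only checks whether the term occurs at all.
    return term in postings


def phrase_match(postings, terms):
    # A phrase lookup also requires the terms at consecutive positions.
    return any(
        all(first + i in postings.get(term, ()) for i, term in enumerate(terms))
        for first in postings.get(terms[0], ())
    )


if __name__ == "__main__":
    for preserve in (True, False):
        postings = index_positions(lowercase(word_delimiter("AppleTV", preserve, True)))
        print(f"preserveOriginal={int(preserve)}:",
              {term: sorted(pos) for term, pos in postings.items()})
        print("  term appletv:", term_match(postings, "appletv"))
        print("  phrase appletv tv:", phrase_match(postings, ["appletv", "tv"]))
```

In this toy, preserveOriginal=1 records appletv at positions 1 and 2, while preserveOriginal=0 records it only at position 2. term_match ignores positions altogether and phrase_match depends on them, which mirrors the general distinction in Lucene between plain term queries, which do not consult positions, and phrase or proximity queries, which do.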
