I've got a problem with parentheses that's driving me crazy.

I'm using a recent nightly build of Solr 1.4.

My index includes these four docs:

doc #1 has title: "saints & sinners"

doc #2 has title: "(saints and sinners)"

doc #3 has title: "( saints & sinners )"

doc #4 has title: "(saints & sinners)"
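
When I try any of these searches:

  title:saints & sinners
  title:"saints & sinners"
  title:saints and sinners

Only docs #1-3 are found, but shouldn't doc #4 match too?

The analysis page shows the tokenizer and filters producing the same terms
(saint, sinner) for doc #4's title and for the query, so I'd expect a match.
One difference I can see is the term positions: at index time the terms are
at positions 1 and 3, while at query time they are at positions 1 and 2.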
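To show why a position gap would matter for the quoted query title:"saints & sinners", here's a rough Python sketch of a slop-0 phrase check. It's a simplification, not Lucene's actual PhraseQuery code. The names INDEX_DOC4, QUERY and exact_phrase_match are made up for illustration, and the positions are copied from the analysis output below (doc #4 indexes saint/sinner at positions 1 and 3; the query produces positions 1 and 2).

```python
from typing import Dict, List, Tuple

# Hypothetical data copied from the analysis output (after stemming):
# term -> positions in doc #4's title field.
INDEX_DOC4: Dict[str, List[int]] = {"saint": [1], "sinner": [3]}

# Query terms with their positions as produced by the query analyzer.
QUERY: List[Tuple[str, int]] = [("saint", 1), ("sinner", 2)]


def exact_phrase_match(doc: Dict[str, List[int]],
                       query: List[Tuple[str, int]]) -> bool:
    """Return True if every query term occurs in the document at the same
    relative offset it has in the query (a simplified slop-0 phrase match)."""
    first_term, first_pos = query[0]
    for start in doc.get(first_term, []):
        # Shift each query position so the first term lines up with `start`.
        if all(start + (pos - first_pos) in doc.get(term, [])
               for term, pos in query):
            return True
    return False


if __name__ == "__main__":
    print(exact_phrase_match(INDEX_DOC4, QUERY))  # False: sinner is at 3, not 2
    # With consecutive index positions the same check succeeds.
    print(exact_phrase_match({"saint": [1], "sinner": [2]}, QUERY))  # True
```

The first call fails only because "sinner" is expected at position 2 but was indexed at position 3; the terms themselves are identical.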

I'm guessing this might be a bug in WordDelimiterFilterFactory?

I've worked around it by using a PatternReplaceFilterFactory to strip off
the parentheses:

<filter class="solr.PatternReplaceFilterFactory" pattern="[()]"
replacement="" replace="all"/>
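For reference, here's a small Python sketch of what that pattern does to doc #4's whitespace tokens. It assumes the filter sits directly after WhitespaceTokenizerFactory and before WordDelimiterFilterFactory (my config above doesn't show the placement). It only models the string replacement with Python's re module, which behaves the same as Java's replaceAll for this pattern. It does not model how Solr handles a token that becomes empty, such as the bare "(" in doc #3. strip_parens is a hypothetical helper name.

```python
import re
from typing import List

# Same pattern as the PatternReplaceFilterFactory config: any "(" or ")".
PAREN_PATTERN = re.compile(r"[()]")


def strip_parens(tokens: List[str]) -> List[str]:
    """Remove every parenthesis from each token, like replace="all".
    Assumption: applied to whitespace-tokenizer output, one token at a time."""
    return [PAREN_PATTERN.sub("", tok) for tok in tokens]


if __name__ == "__main__":
    doc4_tokens = "(saints & sinners)".split()  # ['(saints', '&', 'sinners)']
    print(strip_parens(doc4_tokens))  # ['saints', '&', 'sinners']
```

After stripping, doc #4's tokens look like the query-side tokenizer output (saints, &, sinners), which would explain why the workaround helps.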

Any ideas?

Thanks, Dean

Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position     1         2         3
  term text         (saints   &         sinners)
  term type         word      word      word
  source start,end  0,7       8,9       10,18

org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=1,
    catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1}
  term position     1         3
  term text         saints    sinners
  term type         word      word
  source start,end  1,7       10,17

org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position     1         3
  term text         saints    sinners
  term type         word      word
  source start,end  1,7       10,17

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  term position     1         3
  term text         saints    sinners
  term type         word      word
  source start,end  1,7       10,17

org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}
  term position     1         3
  term text         saint     sinner
  term type         word      word
  source start,end  1,7       10,17

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  term position     1         3
  term text         saint     sinner
  term type         word      word
  source start,end  1,7       10,17

Query Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position     1         2         3
  term text         saints    &         sinners
  term type         word      word      word
  source start,end  0,6       7,8       9,16

org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=1,
    catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1}
  term position     1         2
  term text         saints    sinners
  term type         word      word
  source start,end  0,6       9,16

org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position     1         2
  term text         saints    sinners
  term type         word      word
  source start,end  0,6       9,16

org.apache.solr.analysis.SynonymFilterFactory {expand=true, ignoreCase=true,
    synonyms=synonyms.txt}
  term position     1         2
  term text         saints    sinners
  term type         word      word
  source start,end  0,6       9,16

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  term position     1         2
  term text         saints    sinners
  term type         word      word
  source start,end  0,6       9,16

org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}
  term position     1         2
  term text         saint     sinner
  term type         word      word
  source start,end  0,6       9,16

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  term position     1         2
  term text         saint     sinner
  term type         word      word
  source start,end  0,6       9,16

