On Fri, Apr 3, 2009 at 9:56 AM, Dean Missikowski (Consultant), CLSA < dean.missikow...@clsa.com> wrote:
> I've got a problem that's driving me crazy with parentheses.
>
> I'm using a recent nightly Solr 1.4
>
> My index includes these four docs.
>
> doc #1 has title: "saints & sinners"
> doc #2 has title: "(saints and sinners)"
> doc #3 has title: "( saints & sinners )"
> doc #4 has title: "(saints & sinners)"
>
> when I try any of these searches:
>
> title:saints & sinners
> title:"saints & sinners"
> title:saints and sinners
>
> Only docs #1-3 are found, but doc #4 should match too?
>
> The analyzer shows that the tokenizer and filters should find a match.
>
> I'm guessing this might be a bug with WordDelimiterFilterFactory?
>

I just tried indexing "(saints & sinners)" into a field and searching for
"saints & sinners", and I got a match.

The type definition in schema.xml that I used for testing was:

<fieldtype name="title" class="solr.TextField" multiValued="true"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt" splitOnNumerics="0"
            splitOnCaseChange="0" generateWordParts="1"
            generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt" splitOnNumerics="0"
            splitOnCaseChange="0" generateWordParts="1"
            generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldtype>

-- 
Regards,
Shalin Shekhar Mangar.
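
For readers who want to see why all four titles would be expected to match under a field type like the one above, here is a rough Python approximation of the index-time chain. It is only a sketch, not Solr's implementation: the `analyze` function and the `STOP_WORDS` set are hypothetical names invented for illustration, the stop list is an assumed stand-in for whatever StopFilterFactory loads, and the protected-words file and Porter stemming are left out entirely.

```python
import re

# Assumed stand-in for the stop list; the real StopFilterFactory list may differ.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def analyze(text):
    """Very rough imitation of: whitespace tokenizer -> lowercase ->
    word-delimiter split (word parts only) -> stop-word removal.
    Stemming and protected words are deliberately omitted."""
    tokens = []
    for raw in text.split():  # whitespace tokenization
        # Keep runs of letters/digits and drop delimiters such as "(", ")" and "&".
        # Letter/digit runs stay together, loosely mirroring splitOnNumerics="0".
        for part in re.findall(r"[a-z0-9]+", raw.lower()):
            if part not in STOP_WORDS:
                tokens.append(part)
    return tokens


if __name__ == "__main__":
    titles = [
        "saints & sinners",
        "(saints and sinners)",
        "( saints & sinners )",
        "(saints & sinners)",
    ]
    for title in titles:
        print(repr(title), "->", analyze(title))
```

Under these assumptions every title reduces to the same two tokens, which is consistent with the match reported above. It does not explain why doc #4 fails in the original setup; comparing that setup's field type against the one shown here would be a reasonable next step.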