On Fri, Apr 3, 2009 at 9:56 AM, Dean Missikowski (Consultant), CLSA <
dean.missikow...@clsa.com> wrote:

> I've got a problem that's driving me crazy with parentheses.
>
> I'm using a recent nightly Solr 1.4
>
> My index includes these four docs.
>
> doc #1 has title: "saints & sinners"
>
> doc #2 has title: "(saints and sinners)"
>
> doc #3 has title: "( saints & sinners )"
>
> doc #4 has title: "(saints & sinners)"
>
>
> when I try any of these searches:
>
>  title:saints & sinners
>
>  title:"saints & sinners"
>
>  title:saints and sinners
>
>
>
> Only docs #1-3 are found. Shouldn't doc #4 match too?
>
>
>
> The analyzer shows that the tokenizer and filters should find a match.
>
> I'm guessing this might be a bug with WordDelimiterFilterFactory?
>
>
I just indexed "(saints & sinners)" into a field, searched for
"saints & sinners", and got a match.

The type definition in schema.xml that I used for testing was:

<fieldtype name="title" class="solr.TextField" multiValued="true"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt" splitOnNumerics="0" splitOnCaseChange="0"
            generateWordParts="1" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt" splitOnNumerics="0" splitOnCaseChange="0"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldtype>
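
As a rough illustration of why the parentheses should not matter with a
chain like this, here is a minimal Python sketch that approximates the
analysis steps. It is not Solr code: the analyze() function, the small
stop-word set, and the regular expression standing in for
WordDelimiterFilterFactory are all assumptions made for illustration, and
stemming is left out entirely.

```python
import re

# Assumed stop words for illustration only; the real default set is larger.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def analyze(text):
    """Approximate the chain: whitespace tokenizer, lowercase,
    split on non-alphanumeric delimiters, then drop stop words."""
    tokens = []
    for raw in text.split():                 # whitespace tokenization
        lowered = raw.lower()                # lowercase filter
        # Crude stand-in for word-delimiter splitting: keep only runs of
        # letters and digits, so "(saints" yields "saints" and "&" yields nothing.
        parts = re.findall(r"[a-z0-9]+", lowered)
        tokens.extend(p for p in parts if p not in STOP_WORDS)
    return tokens


titles = [
    "saints & sinners",
    "(saints and sinners)",
    "( saints & sinners )",
    "(saints & sinners)",
]

for title in titles:
    print(repr(title), "->", analyze(title))
```

Under these assumptions all four titles reduce to the same two tokens, so
if doc #4 is not matching for you, the difference is likely somewhere else:
for example in how the query string is parsed before it reaches the
analyzer, or in a field type that differs from the one above.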

-- 
Regards,
Shalin Shekhar Mangar.