Hi,

I have a question regarding phrase search in combination with a WordDelimiterGraphFilter (Solr 8.4.1).

Whenever I search with a phrase that combines delimited and non-delimited tokens, I don't get any matches.

This is the configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>

<field name="text" type="text" indexed="true" stored="true" omitTermFreqAndPositions="false"/>


Example document:

{
  "id": "1",
  "text": "mr. i.n.i.t. firstsirname secondsirname"
}

Queries and results:

Query:
"mr. i.n.i.t. firstsirname"
-----
No result

Query:
"mr. i.n.i.t."
-----
Result

Query:
"mr. i n i t"
-----
Result

Query:
"mr. init"
-----
Result

Query:
"mr init"
-----
Result

Query:
"i.n.i.t. firstsirname"
-----
No result

Query:
"init firstsirname"
-----
No result

Query:
"i.n.i.t. firstsirname secondsirname"
-----
No result

Query:
"init firstsirname secondsirname"
-----
No result


I don't quite understand why this happens. Looking at the analyzer output, I can't see why phrases work when they contain only delimited or only non-delimited tokens, yet fail as soon as delimited and non-delimited tokens are mixed.
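
For anyone who wants to compare the index-time and query-time token streams themselves, here is a minimal sketch that builds a request URL for Solr's field analysis handler (/analysis/field). The host, port, and core name ("mycore") are placeholders I made up for illustration, and analysis_url is just a hypothetical helper, not part of my setup:

```python
from urllib.parse import urlencode

# Assumptions for illustration only: Solr on localhost:8983, core "mycore".
SOLR_BASE = "http://localhost:8983/solr/mycore"


def analysis_url(field_type, index_value, query_value):
    """Build a URL for Solr's field analysis handler (/analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": index_value,  # text run through the index analyzer
        "analysis.query": query_value,       # text run through the query analyzer
        "analysis.showmatch": "true",
        "wt": "json",
    }
    return SOLR_BASE + "/analysis/field?" + urlencode(params)


if __name__ == "__main__":
    # One of the failing phrase queries from above, against the example document.
    print(analysis_url(
        "text",
        "mr. i.n.i.t. firstsirname secondsirname",
        "i.n.i.t. firstsirname",
    ))
```

Opening the printed URL should list the tokens and their positions for both analyzer chains, which is the same information the Analysis screen in the admin UI shows.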

Could someone explain this? And is there a way to make it work?

Best regards,

Jeroen

