Re: autoGeneratePhraseQueries does not work?

Alexandros Paramythis Thu, 25 Jul 2019 02:13:09 -0700

Hi Akihiro,

This behavior is caused by the NGramTokenizerFactory, which is used at both index time and query time in your configuration.


In short:

* During indexing, this tokenizer generates the token 'aaa' (among other trigrams) from the string 'aaabbbccc' (because minGramSize="3"), and that token is indexed
* At query time, this tokenizer generates the token 'aaa' from the string "aaaa" (again, because minGramSize="3")

Therefore, the matching should indeed occur exactly as you are seeing it happen.
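
As a rough illustration of the overlap, here is a minimal sketch in plain Python. It only approximates the gram generation: char_ngrams is a hypothetical helper, not Lucene's tokenizer, and it ignores the char filters, token filters and token positions in the schema.

```python
def char_ngrams(text, min_gram=3, max_gram=3):
    """Return every character n-gram of `text` whose length is in [min_gram, max_gram].

    Hypothetical helper for illustration only; it does not model token
    positions or any of the char/token filters from the schema.
    """
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


# Grams produced from the indexed value and from the query string.
indexed = char_ngrams("aaabbbccc")
queried = char_ngrams("aaaa")

# Grams that appear on both sides.
shared = sorted(set(indexed) & set(queried))

print(indexed)
print(queried)
print(shared)
```

The gram 'aaa' shows up on both sides, which is the overlap described above.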

You may find the second example in the N-Gram Tokenizer section of the reference guide useful:
https://lucene.apache.org/solr/guide/7_7/tokenizers.html#n-gram-tokenizer


Hope this helps,

Alex


On 25/07/2019 08:24, Akihiro Ito wrote:

Hi,

I am using Solr 7.7.1 in SolrCloud mode.

I’m getting a document I shouldn’t when searching with a TextField.
It looks like autoGeneratePhraseQueries is not working as it should,
but I have no idea what is causing it.

The schema definition I use is as follows.

  <fieldType name="trigram_type" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <charFilter class="solr.ICUNormalizer2CharFilterFactory" name="nfkc"/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="$" replacement="**"/>
      <tokenizer class="solr.NGramTokenizerFactory" maxGramSize="3" minGramSize="3"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^\s])\s[^\s]" replacement="$1  " replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^\s.*$" replacement="" replace="all"/>
      <filter class="solr.LengthFilterFactory" min="3" max="3"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ICUTransformFilterFactory" id="Hiragana-Katakana"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.ICUNormalizer2CharFilterFactory" name="nfkc"/>
      <tokenizer class="solr.NGramTokenizerFactory" maxGramSize="3" minGramSize="3"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ICUTransformFilterFactory" id="Hiragana-Katakana"/>
    </analyzer>
  </fieldType>


The following sample document is in Solr.

docs: [
{
  syo_id: "1237",
  trigram: "aaabbbccc",
  _version_: 1639992506850476000,
  timestamp: "2019-07-25T01:38:52.894Z"
}
]

If I execute the following query, it matches the above document:

q=trigram:aaaa&fq=syo_id:1237&debugQuery=on


Thanks,
Akihiro.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
