Hi, I have a problem and I don't know if it's something that I am doing
wrong or if it's maybe a bug. I want to query a field with shingles; the
field and type definitions are these:

<field name="text_shingles" type="text_en_shingles" indexed="true" stored="false"/>

<fieldType name="text_en_shingles" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"/>
  </analyzer>
</fieldType>


I'm using Solr 7.2.1.

I just wanted to have different min and max shingle sizes to test how it
works, but if the query is long, Solr gives timeouts, high CPU and OOM.

The query I'm using is this:

http://localhost:8983/solr/ntnx/select?debugQuery=on&q={!edismax%20%20qf=%22text_shingles%22%20}%22%20word1%20word2%20word3%20word4%20word5%20word6%20word7

The parsed query grows like this with just a few words; when I use a query
with a lot of words, it fails.

2 words:
"parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
+text_shingles:word2) text_shingles:word1 word2)))",

3 words:
"parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
+text_shingles:word2 +text_shingles:word3) (+text_shingles:word1
+text_shingles:word2 word3) (+text_shingles:word1 word2
+text_shingles:word3) text_shingles:word1 word2 word3)))",

4 words:
"parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
+text_shingles:word2 +text_shingles:word3 +text_shingles:word4)
(+text_shingles:word1 +text_shingles:word2 +text_shingles:word3 word4)
(+text_shingles:word1 +text_shingles:word2 word3 +text_shingles:word4)
(+text_shingles:word1 +text_shingles:word2 word3 word4)
(+text_shingles:word1 word2 +text_shingles:word3 +text_shingles:word4)
(+text_shingles:word1 word2 +text_shingles:word3 word4)
(+text_shingles:word1 word2 word3 +text_shingles:word4))))",

5 words:
"parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
+text_shingles:word2 +text_shingles:word3 +text_shingles:word4
+text_shingles:word5) (+text_shingles:word1 +text_shingles:word2
+text_shingles:word3 +text_shingles:word4 word5) (+text_shingles:word1
+text_shingles:word2 +text_shingles:word3 word4 +text_shingles:word5)
(+text_shingles:word1 +text_shingles:word2 +text_shingles:word3 word4
word5) (+text_shingles:word1 +text_shingles:word2 word3
+text_shingles:word4 +text_shingles:word5) (+text_shingles:word1
+text_shingles:word2 word3 +text_shingles:word4 word5)
(+text_shingles:word1 +text_shingles:word2 word3 word4
+text_shingles:word5) (+text_shingles:word1 word2 +text_shingles:word3
+text_shingles:word4 +text_shingles:word5) (+text_shingles:word1 word2
+text_shingles:word3 +text_shingles:word4 word5) (+text_shingles:word1
word2 +text_shingles:word3 word4 +text_shingles:word5)
(+text_shingles:word1 word2 +text_shingles:word3 word4 word5)
(+text_shingles:word1 word2 word3 +text_shingles:word4
+text_shingles:word5) (+text_shingles:word1 word2 word3
+text_shingles:word4 word5))))",
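
Counting the clauses in the output above gives 2, 4, 7 and 13 for 2, 3, 4
and 5 words. That matches the number of ways to split the query into
consecutive tokens of 1, 2 or 3 words (single words plus the 2- and 3-word
shingles). The Python sketch below is only my own illustration of that
counting, not Solr code: the names `segmentations` and
`count_segmentations` are made up for this example, and the assumption
that every token covers 1 to 3 words is inferred from the debug output,
not from the Solr source.

```python
# Assumption (inferred from the debug output, not from Solr's source):
# each token in a clause covers 1, 2 or 3 consecutive query words.
TOKEN_SIZES = (1, 2, 3)


def segmentations(words):
    """Yield every split of `words` into consecutive tokens of allowed sizes."""
    if not words:
        yield []
        return
    for size in TOKEN_SIZES:
        if size <= len(words):
            head = " ".join(words[:size])
            for rest in segmentations(words[size:]):
                yield [head] + rest


def count_segmentations(n):
    """Count the splits for n words without enumerating them."""
    counts = [1]  # exactly one way to split zero words
    for i in range(1, n + 1):
        counts.append(sum(counts[i - s] for s in TOKEN_SIZES if s <= i))
    return counts[n]


if __name__ == "__main__":
    # Print the clause count for 2..7 words.
    for n in range(2, 8):
        print(n, count_segmentations(n))
    # List the individual splits for a 3-word query.
    for path in segmentations(["word1", "word2", "word3"]):
        print(path)
```

If the clauses really follow this pattern, a 7-word query would produce 44
clauses and a 20-word query 121415, which would be consistent with the
timeouts and OOM I see on long queries.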


So something bad is happening. Is it because I'm doing something wrong, or
is it maybe a bug that I should report on the team's issue tracker?
