This is doing exactly what it should. It'd be a little clearer if you
used a tokenSeparator other than the default space. Then this line:

text_shingles:word1 word2 word3+text_shingles:word4 word5

would look more like this:
text_shingles:word1_word2_word3+text_shingles:word4_word5

It's building a query from all of the 1, 2 and 3 grams. You're getting
the single tokens because outputUnigrams defaults to "true".

So of course, as the number of terms in the query grows, the number of
clauses in the parsed query grows non-linearly.
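
To put rough numbers on that growth, here is a minimal Python sketch. It
assumes (from reading the debug output quoted below, not from checking the
Solr source) that each clause corresponds to one way of splitting the query
into consecutive runs of 1 to 3 words. The function name
count_segmentations is made up for illustration.

```python
def count_segmentations(n_tokens, min_size=1, max_size=3):
    """Count the ways to split n_tokens into consecutive runs whose
    lengths lie between min_size and max_size (inclusive).

    Assumption: one parsed-query clause per such split, which matches
    the 2-, 3-, 4- and 5-word examples in the quoted message.
    """
    ways = [0] * (n_tokens + 1)
    ways[0] = 1  # the empty prefix can be split in exactly one way
    for n in range(1, n_tokens + 1):
        for size in range(min_size, max_size + 1):
            if size <= n:
                # the last run has `size` tokens; the rest is a shorter prefix
                ways[n] += ways[n - size]
    return ways[n_tokens]


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, count_segmentations(n))
    # prints 2 2, 3 4, 4 7, 5 13 (one pair per line)
```

Under that assumption each count is the sum of the previous three, so the
clause count roughly doubles with every extra word, which would explain
the timeouts and OOM on long queries.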

Best,
Erick

On Thu, Jul 26, 2018 at 12:44 PM, Jokin C <joki...@jokincuadrado.com> wrote:
> Hi, I have a problem and I don't know if it's something that I am doing
> wrong or if it's maybe a bug. I want to query a field with shingles; the
> field and type definitions are these:
>
> <field name="text_shingles" type="text_en_shingles" indexed="true"
> stored="false"/>
>
> <fieldType name="text_en_shingles" class="solr.TextField"
> positionIncrementGap="100">
>     <analyzer >
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> maxShingleSize="3" />
>     </analyzer>
>   </fieldType>
>
>
> I'm using Solr  7.2.1.
>
> I just wanted to have different min and max shingle sizes to test how it
> works, but if the query is long Solr gives timeouts, high CPU and OOM.
>
> The query I'm using is this:
>
> http://localhost:8983/solr/ntnx/select?debugQuery=on&q={!edismax%20%20qf=%22text_shingles%22%20}%22%20word1%20word2%20word3%20word4%20word5%20word6%20word7
>
> and the parsed query grows like this with just 4 words; when I use a query
> with a lot of words it fails.
>
> 2 words:
> "parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
> +text_shingles:word2) text_shingles:word1 word2)))",
>
> 3 words:
> "parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
> +text_shingles:word2 +text_shingles:word3) (+text_shingles:word1
> +text_shingles:word2 word3) (+text_shingles:word1 word2
> +text_shingles:word3) text_shingles:word1 word2 word3)))",
>
> 4 words:
> "parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
> +text_shingles:word2 +text_shingles:word3 +text_shingles:word4)
> (+text_shingles:word1 +text_shingles:word2 +text_shingles:word3 word4)
> (+text_shingles:word1 +text_shingles:word2 word3 +text_shingles:word4)
> (+text_shingles:word1 +text_shingles:word2 word3 word4)
> (+text_shingles:word1 word2 +text_shingles:word3 +text_shingles:word4)
> (+text_shingles:word1 word2 +text_shingles:word3 word4)
> (+text_shingles:word1 word2 word3 +text_shingles:word4))))",
>
> 5 words:
> "parsedquery":"+DisjunctionMaxQuery((((+text_shingles:word1
> +text_shingles:word2 +text_shingles:word3 +text_shingles:word4
> +text_shingles:word5) (+text_shingles:word1 +text_shingles:word2
> +text_shingles:word3 +text_shingles:word4 word5) (+text_shingles:word1
> +text_shingles:word2 +text_shingles:word3 word4 +text_shingles:word5)
> (+text_shingles:word1 +text_shingles:word2 +text_shingles:word3 word4
> word5) (+text_shingles:word1 +text_shingles:word2 word3
> +text_shingles:word4 +text_shingles:word5) (+text_shingles:word1
> +text_shingles:word2 word3 +text_shingles:word4 word5)
> (+text_shingles:word1 +text_shingles:word2 word3 word4
> +text_shingles:word5) (+text_shingles:word1 word2 +text_shingles:word3
> +text_shingles:word4 +text_shingles:word5) (+text_shingles:word1 word2
> +text_shingles:word3 +text_shingles:word4 word5) (+text_shingles:word1
> word2 +text_shingles:word3 word4 +text_shingles:word5)
> (+text_shingles:word1 word2 +text_shingles:word3 word4 word5)
> (+text_shingles:word1 word2 word3 +text_shingles:word4
> +text_shingles:word5) (+text_shingles:word1 word2 word3
> +text_shingles:word4 word5))))",
>
>
> So, something bad is happening. Is it because I'm doing something wrong, or
> is it maybe a bug that I should report on the team issue tracker?
