Re: [CAUTION] Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-22 Thread Shawn Heisey
On 3/22/2019 2:02 AM, Hubert-Price, Neil wrote: One other question: is there a system-level configuration that can change the default for the sow= parameter? Can it be flipped to have the default set to true? Any parameter can be put into the query handler definition. In defaults, inva

Re: [CAUTION] Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-22 Thread Hubert-Price, Neil
One other question: is there a system-level configuration that can change the default for the sow= parameter? Can it be flipped to have the default set to true? Many Thanks, Neil On 22/03/2019, 08:36, "Hubert-Price, Neil" wrote: Thanks Erick, that makes sense. However it do

Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-22 Thread Hubert-Price, Neil
Thanks Erick, that makes sense. However it does lead me to another conclusion: in Solr prior to 6.0, or with sow=true on Solr 6.0+ that would mean that the ShingleFilter is totally ineffective within query analysers. It would be logically equivalent to not having the ShingleFilter configur
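The reasoning in this message can be illustrated with a toy model. This is only a sketch, not Lucene's ShingleFilter: the helper names `shingles` and `analyze_query` are hypothetical, and it assumes shingles of up to two words emitted alongside the single words. When the query is split on whitespace before analysis, each chunk contains one word, so no multi-word shingle can ever form.

```python
def shingles(tokens, max_size=2):
    """Toy stand-in for a shingle filter: emit single words plus
    word n-grams of up to max_size words, in token order."""
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def analyze_query(query, sow):
    """Simulate query analysis with and without split-on-whitespace.

    sow=True:  the query is split first, and each word is analyzed alone.
    sow=False: the whole query string reaches the analysis chain at once.
    """
    chunks = query.split() if sow else [query]
    result = []
    for chunk in chunks:
        # The "tokenizer" here is a plain whitespace split (an assumption).
        result.extend(shingles(chunk.split()))
    return result


if __name__ == "__main__":
    print(analyze_query("red leather sofa", sow=True))
    print(analyze_query("red leather sofa", sow=False))
```

In this model the sow=True output is the same as having no shingle step at all, which matches the conclusion drawn above; the sow=False output grows with the number of adjacent word pairs.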

Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Erick Erickson
sow was introduced in Solr 6, so it’s just ignored in 4x. bq. Surely the tokenizer splits on white space anyway, or it wouldn't work? I didn’t work on that code, so I don’t have the details off the top of my head, but I’ll take a stab at it as far as my understanding goes. The result is in your

Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Hubert-Price, Neil
Hi Erick, I've run a series of tests using debug=true, the same original query, and variations around sow=true/sow=false/not set. See links below for .txt files containing the output. I have removed any genuine document content and replaced it with .. because I don't have the customer's p
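A series of tests like the one described here could be set up along the following lines. This is a minimal sketch: the host, port, core name ("products") and query text are placeholders rather than details from the thread, and the helper `debug_query_urls` is hypothetical. It only builds the three request URLs (sow=true, sow=false, and sow not set) with debug=true; it does not contact a server.

```python
from urllib.parse import urlencode


def debug_query_urls(base_url, query):
    """Build one select URL per sow variation, each with debug=true."""
    urls = {}
    for label, sow in (("sow=true", "true"), ("sow=false", "false"), ("not set", None)):
        params = {"q": query, "debug": "true"}
        if sow is not None:
            # Only add the parameter when a value is being tested explicitly.
            params["sow"] = sow
        urls[label] = base_url + "?" + urlencode(params)
    return urls


if __name__ == "__main__":
    # Placeholder endpoint; substitute the real host and core.
    base = "http://localhost:8983/solr/products/select"
    for label, url in debug_query_urls(base, "red leather sofa").items():
        print(label, "->", url)
```

Comparing the parsed query section of the debug output across the three responses shows whether the "not set" case behaves like sow=true or sow=false on a given version.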

Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Erick Erickson
Neil: Yeah, the attachment-stripping catches everyone the first time, we’re so used to just adding anything we want to an e-mail… I don’t know enough about the query parsing to answer off the top of my head. I do know one thing: “Split on Whitespace” has changed from true to fa

Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Hubert-Price, Neil
Hello Erick, This is the first time I've had reason to use the mailing list, so I wasn't aware of the behaviour around attachments. See below for links to the images that I originally sent as attachments; both are screenshots from within Eclipse MAT looking at a SOLR heap dump. LargeQueryStructu

Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-20 Thread Erick Erickson
The Apache mail server aggressively strips attachments, so yours didn’t come through. People often provide links to images stored somewhere else. As to why this is behaving this way, I’m pretty clueless. A _complete_ shot in the dark is that the query parsing changed its default for split on white

Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-20 Thread Hubert-Price, Neil
Hello All, We have a recently upgraded system that went from Solr 4.6 to Solr 7.1 (used as part of an ecommerce application). In the upgraded version we are seeing frequent issues with very high Solr memory usage for certain types of query, but the older 4.6 version does not produce the same r