One other question: is there a system-level configuration that can change the default for the sow= parameter? Can the default be flipped to true?
Many Thanks,
Neil

On 22/03/2019, 08:36, "Hubert-Price, Neil" <neil.hubert-pr...@sap.com> wrote:

Thanks Erick, that makes sense.

However it does lead me to another conclusion: in Solr prior to 6.0, or with sow=true on Solr 6.0+ .... that would mean that the ShingleFilter is totally ineffective within query analysers. It would be logically equivalent to not having the ShingleFilter configured at all. The point of the ShingleFilter as I understand it is to create combinations/permutations, but there are none possible surely if it receives only one pre-split token at a time.

Going back to my original configuration, I think to achieve the same result as in Solr 4.6 - I would need to remove ShingleFilterFactory from the query analyser config for that field type?

Many Thanks,
Neil

Sent from my iPhone

> On 22 Mar 2019, at 02:38, Erick Erickson <erickerick...@gmail.com> wrote:
>
> sow was introduced in Solr 6, so it’s just ignored in 4x.
>
> bq. Surely the tokenizer splits on white space anyway, or it wouldn't work?
>
> I didn’t work on that code, so I don’t have the details off the top of my head, but I’ll take a stab at it as far as my understanding goes. The result is in your parsed queries.
>
> Note that in the better-behaved case, you have a bunch of individual tokens ORd together like:
> productdetails_tokens_en:9611444530
> productdetails_tokens_en:9611444530
>
> and that’s all. IOW, the query parser has split them into individual tokens that are fed one at a time into the analysis chain.
>
> In the bad case you have a bunch of single tokens as well, but then what look like multiple tokens, but are not:
> +productdetails_tokens_en:9611444500
> +productdetails_tokens_en:9612194002 9612194002 9612194002)
>
> which is where the explosion is coming from. It’s deceptive, because when shingling, this is a single token “9612194002 9612194002 9612194002” for all it looks like something that’d be split by whitespace.
>
> If you take a look at your admin UI>>your_core>>schema and select your productdetails_tokens_en from the drop down and then “load terms” you’ll see. If you want to experiment, you can add a tokenSeparator character other than a space to the shinglefilter that’ll make it clearer. Then the clause above that looks like multiple, whitespace-separated tokens would look like what it really is, a single token:
>
> +productdetails_tokens_en:9612194002_9612194002_9612194002)
>
> Best,
> Erick
>
>> On Mar 21, 2019, at 3:10 PM, Hubert-Price, Neil <neil.hubert-pr...@sap.com> wrote:
>>
>> Surely the tokenizer splits on white space anyway, or it wouldn't work?
>
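
To make the point in this thread concrete, here is a rough Python sketch of why shingling has nothing to combine when the query parser hands the analysis chain one pre-split term at a time, and why the whole-string case produces multi-word shingles that are each a single token. This is a toy model only, not Lucene's ShingleFilter: the function names, the shingle sizes (2 to 3), and the sample query are assumptions made for illustration. The "_" separator mirrors the tokenSeparator experiment suggested above.

```python
def shingles(tokens, min_size=2, max_size=3, separator=" ", output_unigrams=True):
    """Toy word-shingle generator (a simplification, not Lucene's ShingleFilter)."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        # Emit every shingle of min_size..max_size words starting at this position.
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out


def analyze_split_on_whitespace(query, **kwargs):
    """Models the sow=true style: split first, then analyse each term alone."""
    result = []
    for term in query.split():
        # The shingler only ever sees a one-token stream here.
        result.extend(shingles([term], **kwargs))
    return result


def analyze_whole_string(query, **kwargs):
    """Models the sow=false style: the whole query text reaches the shingler."""
    return shingles(query.split(), **kwargs)


query = "9611444500 9612194002 9612194002"  # hypothetical sample input

per_term = analyze_split_on_whitespace(query, separator="_")
whole = analyze_whole_string(query, separator="_")

print(per_term)  # unigrams only, no shingles
print(whole)     # unigrams plus multi-word shingles, each one a single token
```

In this model the per-term path yields only the three original terms, which matches the observation that the shingle filter is effectively a no-op there, while the whole-string path also emits tokens such as 9612194002_9612194002.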