sow (split on whitespace) was introduced in Solr 6, so it's just ignored in 4.x.

bq. Surely the tokenizer splits on white space anyway, or it wouldn't work?
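Since sow can be confusing, here's a hedged sketch of the two modes (the host, core name, and terms are placeholders borrowed from this thread, not your actual setup):

    # sow=true (the default before Solr 7): the query parser splits the input
    # on whitespace BEFORE analysis, so each term goes through the analysis
    # chain alone and shingles can never form across terms.
    http://localhost:8983/solr/your_core/select?q=9611444530%209612194002&df=productdetails_tokens_en&sow=true

    # sow=false: the whole string is handed to the analysis chain as one unit,
    # so a shingle filter can emit multi-term tokens such as
    # "9611444530 9612194002".
    http://localhost:8983/solr/your_core/select?q=9611444530%209612194002&df=productdetails_tokens_en&sow=false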
I didn't work on that code, so I don't have the details off the top of my head, but I'll take a stab at it as far as my understanding goes.

The evidence is in your parsed queries. Note that in the better-behaved case you have a bunch of individual tokens OR'd together, like:

    productdetails_tokens_en:9611444530 productdetails_tokens_en:9611444530

and that's all. IOW, the query parser has split the input into individual tokens that are fed one at a time into the analysis chain.

In the bad case you have a bunch of single tokens as well, but then what looks like multiple tokens, but is not:

    +productdetails_tokens_en:9611444500 +productdetails_tokens_en:9612194002 9612194002 9612194002)

which is where the explosion is coming from. It's deceptive, because when shingling this is a single token, "9612194002 9612194002 9612194002", for all it looks like something that would be split on whitespace.

If you take a look at the admin UI >> your_core >> schema, select productdetails_tokens_en from the drop-down, and then click "load terms", you'll see them.

If you want to experiment, you can give the ShingleFilter a tokenSeparator character other than a space, which will make this clearer (a minimal config sketch follows below the quoted message). Then the clause above that looks like multiple whitespace-separated tokens would look like what it really is, a single token:

    +productdetails_tokens_en:9612194002_9612194002_9612194002)

Best,
Erick

> On Mar 21, 2019, at 3:10 PM, Hubert-Price, Neil <neil.hubert-pr...@sap.com> wrote:
>
> Surely the tokenizer splits on white space anyway, or it wouldn't work?
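P.S. For the tokenSeparator experiment mentioned above, here's a minimal sketch of what the field type might look like. The analysis chain here is an assumption for illustration; your real productdetails_tokens_en chain is doubtless different, and only the ShingleFilterFactory line matters:

    <fieldType name="text_shingles_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- tokenSeparator="_" joins the shingle parts with an underscore
             instead of a space, so each shingle shows up unambiguously as a
             single token, e.g. 9612194002_9612194002 rather than
             "9612194002 9612194002" -->
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" tokenSeparator="_"/>
      </analyzer>
    </fieldType>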