Aaaaah, I completely forgot that the question mark is a single-character wildcard! Yes, yes, the word lengths are due to stemming.
Can't believe I didn't think of it, but thanks for clearing up the fog!

Markus

-----Original message-----
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 20th June 2018 16:15
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: 7.2.1 looking for ??????
>
> You're confusing query parsing with analysis-chain processing.
> Before the query gets to the WDFF, the query _parser_ decides it's a
> wildcard query, so it never gets passed through WDFF. If you escaped
> all the question marks, then you'd get what you expect.
>
> If it weren't so, imagine what would happen if you passed the input
> through WDFF before figuring out it was a wildcard: you couldn't
> search with wildcards. That said, I'd expect your query to match every
> 6-letter term, but that's not happening. Perhaps stemming is giving you
> those longer-than-6-letter matches? In which case I can't explain the
> 5-letter match (Kinea).
>
> Best,
> Erick
>
> On Wed, Jun 20, 2018 at 5:22 AM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Hello,
> >
> > On the monitoring I spotted a query that took over twenty seconds
> > instead of the usual 200 ms. It turned out to be someone looking for
> > question marks. I couldn't believe that would be a costly query; they
> > should be removed by WordDelimiterFilter, and that is the case: the query
> > analyzes to nothing, yet it yields results nonetheless, highlights keywords,
> > and even gets a score, but that is before dumbing the query down to:
> >
> > ?select?q=content_nl:%3F%3F%3F%3F%3F%3F&hl=true&hl.fl=content_nl&debug=true
> >
> > So, what is going on here? Why does it get results, slightly fewer than
> > there are documents with content_nl? Why are some terms highlighted? For
> > example:
> >
> > <em>laatste</em> 10 <em>minuten</em> neemt Rodenburg <em>steeds</em> de
> > <em>voorsprong</em>, maar elke keer <em>scoort</em> <em>Kinea</em> weer de
> > <em>gelijkmaker</em>
> >
> > I have no idea why this happens.
> >
> > Many thanks,
> > Markus
>
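The mechanism Erick describes (the parser sees `?` as a single-character wildcard before any analysis runs, and escaping turns it back into a literal) can be modelled roughly as below. This is a minimal Python sketch, not Solr or Lucene code: the helpers `escape_query_chars`, `wildcard_to_regex` and `matching_terms` are hypothetical, the special-character set is an assumption meant to resemble what SolrJ's `ClientUtils.escapeQueryChars` escapes, and the matching ignores the analysis chain entirely (no WordDelimiterFilter, no stemming). The indexed terms used are made up for illustration.

```python
import re

# Query-syntax characters, assumed to roughly mirror the set that SolrJ's
# ClientUtils.escapeQueryChars escapes (whitespace is escaped as well).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text: str) -> str:
    """Backslash-escape query-syntax characters so the parser treats them literally."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def wildcard_to_regex(query: str) -> "re.Pattern[str]":
    """Simplified model of wildcard parsing: '?' matches exactly one character,
    '*' matches any run of characters, and a backslash makes the next char literal."""
    parts = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            parts.append(re.escape(query[i + 1]))  # escaped: literal character
            i += 2
            continue
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def matching_terms(query: str, terms: list) -> list:
    """Return the (hypothetical) indexed terms the modelled query would match."""
    pattern = wildcard_to_regex(query)
    return [t for t in terms if pattern.fullmatch(t)]


if __name__ == "__main__":
    # Hypothetical indexed terms, invented for this example.
    terms = ["stream", "match", "scored", "matches"]
    print(matching_terms("??????", terms))           # ['stream', 'scored']
    escaped = escape_query_chars("??????")
    print(escaped)                                   # \?\?\?\?\?\?
    print(matching_terms(escaped, terms))            # []
```

Unescaped, `??????` matches every six-character indexed term; escaped, it is just a literal string that no term equals. Because real matching happens against indexed terms, which in this thread are stems, a six-character stem could surface as a longer highlighted word in the stored text, which fits the stemming explanation above.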