You're confusing query parsing with analysis chain processing.
Before the query ever reaches WordDelimiterFilterFactory (WDFF), the
query _parser_ decides it's a wildcard query, so it never gets passed
through WDFF. If you escaped all the question marks, you'd get what
you expect.
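
As a rough illustration of the escaping, here is a minimal Python sketch. The
helper names (escape_wildcards, build_params) are made up for this example and
are not part of Solr or any client library; the field and parameters are
simply the ones from the query quoted below. It only assumes that a
backslash in front of ? or * makes the query parser treat the character
as a literal.

```python
from urllib.parse import urlencode


def escape_wildcards(text: str) -> str:
    """Backslash-escape ? and * so they are not parsed as wildcards.

    Hypothetical helper for illustration; it deliberately handles only the
    two wildcard characters, not every query-syntax special character.
    """
    return "".join("\\" + ch if ch in "?*" else ch for ch in text)


def build_params(field: str, user_text: str) -> str:
    """Build a URL-encoded parameter string for a select request."""
    query = f"{field}:{escape_wildcards(user_text)}"
    return urlencode({"q": query, "hl": "true", "hl.fl": field, "debug": "true"})


if __name__ == "__main__":
    # The escaped question marks are sent as literal characters, so they go
    # through the field's analysis chain instead of becoming a wildcard query.
    print(build_params("content_nl", "??????"))
```

With the question marks escaped, the text should be analyzed like any other
term, which for this field presumably means it analyzes to nothing.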

If it weren't so, imagine what would happen if you passed the input
through WDFF before figuring out it was a wildcard: you couldn't
search with wildcards at all. That said, I'd expect your query to match
every 6-letter term, but that's not what is happening. Perhaps stemming
is giving you those longer-than-6-letter matches? In which case I can't
explain the 5-letter match (Kinea).
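
To make the length argument concrete, here is a small Python sketch that
checks the highlighted words from the quoted snippet against a six-character
pattern. It uses fnmatch from the standard library only as a stand-in, since
its ? also matches exactly one character. It compares the surface forms, not
the terms actually in the index (those may differ because of stemming or
other filters), so treat it as an illustration rather than a reproduction of
what Solr does.

```python
from fnmatch import fnmatchcase

# Highlighted words copied from the snippet in the quoted message.
HIGHLIGHTED = [
    "laatste", "minuten", "steeds", "voorsprong",
    "scoort", "Kinea", "gelijkmaker",
]
PATTERN = "??????"  # six single-character wildcards


def matches_literally(term: str, pattern: str) -> bool:
    """True if the lowercased surface form matches the wildcard pattern."""
    return fnmatchcase(term.lower(), pattern)


if __name__ == "__main__":
    for term in HIGHLIGHTED:
        print(f"{term:12} len={len(term):2} match={matches_literally(term, PATTERN)}")
```

Only the six-letter surface forms match here, so the other highlights would
have to come from whatever the indexed terms look like after analysis.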

Best,
Erick

On Wed, Jun 20, 2018 at 5:22 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> Hello,
>
> In our monitoring I spotted a query that took over twenty seconds
> instead of the usual 200 ms. It turned out to be someone looking for
> question marks. I couldn't believe that would be a costly query; they should
> be removed by WordDelimiterFilter, and that is the case. The query analyzes
> to nothing, but yields results nonetheless, highlights keywords, and even has a
> score, but that is before dumbing the query down to:
>
> /select?q=content_nl:%3F%3F%3F%3F%3F%3F&hl=true&hl.fl=content_nl&debug=true
>
> So, what is going on here? Why does it get results, slightly fewer than there
> are documents with content_nl? Why are some terms highlighted? For example:
>
> <em>laatste</em> 10 <em>minuten</em> neemt Rodenburg <em>steeds</em> de 
> <em>voorsprong</em>, maar elke keer <em>scoort</em> <em>Kinea</em> weer de 
> <em>gelijkmaker</em>
>
> I have no idea why this happens.
>
> Many thanks,
> Markus
