Aaaaah, I completely forgot that the question mark is a single-character wildcard!

Yes, yes, the word lengths are due to stemming. 
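
For the archives, here is a minimal sketch of the escaping Erick suggests, in Python. It assumes the standard query parser's backslash escaping; the helper name escape_wildcards and the parameter dict are made up for illustration and only mirror the request from my original mail.

```python
from urllib.parse import urlencode


def escape_wildcards(text: str) -> str:
    """Backslash-escape \\, ? and * so the query parser sees literal characters."""
    return "".join("\\" + ch if ch in "\\?*" else ch for ch in text)


# Hypothetical request parameters, modelled on the query quoted in this thread.
params = {
    "q": "content_nl:" + escape_wildcards("??????"),
    "hl": "true",
    "hl.fl": "content_nl",
    "debug": "true",
}

# urlencode percent-encodes the backslashes and question marks for the URL.
query_string = urlencode(params)
print(query_string)
```

With the question marks escaped, the parser should no longer build a wildcard query, so the term goes through the analysis chain, where WordDelimiterFilter removes it.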

Can't believe I didn't think of it, but thanks for clearing up the fog!
Markus

-----Original message-----
> From:Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 20th June 2018 16:15
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: 7.2.1 looking for ??????
> 
> You're confusing query parsing with the analysis chain processing.
> Before the query gets to the WDFF, the query _parser_ decides it's a
> wildcard query so it never gets passed through WDFF. If you escaped
> all the question marks, then you'd get what you expect.
> 
> If it weren't so, imagine what would happen if you passed the input
> through WDFF before figuring out it was a wildcard. You couldn't
> search with wildcards. That said, I'd expect your query to match every
> 6-letter term but that's not happening, perhaps stemming is giving you
> those longer-than-6-letter matches? In which case I can't explain the
> 5-letter match..... (Kinea)..
> 
> Best,
> Erick
> 
> On Wed, Jun 20, 2018 at 5:22 AM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Hello,
> >
> > In our monitoring I spotted a query that took over twenty seconds
> > instead of the usual 200 ms. It turned out to be someone looking for
> > question marks. I couldn't believe that would be a costly query: they
> > should be removed by WordDelimiterFilter, and that is the case. The query
> > analyzes to nothing, but yields results nonetheless, with highlighted
> > keywords and even a score, but that was before dumbing the query down to:
> >
> > ?select?q=content_nl:%3F%3F%3F%3F%3F%3F&hl=true&hl.fl=content_nl&debug=true
> >
> > So, what is going on here? Why does it get results, slightly fewer than
> > there are documents with content_nl? Why are some terms highlighted? For
> > example:
> >
> > <em>laatste</em> 10 <em>minuten</em> neemt Rodenburg <em>steeds</em> de 
> > <em>voorsprong</em>, maar elke keer <em>scoort</em> <em>Kinea</em> weer de 
> > <em>gelijkmaker</em>
> >
> > I have no idea why this happens.
> >
> > Many thanks,
> > Markus
> 
