There's every chance that I'm missing something at the Solr level, but at the Lucene level it _looks_ like ComplexPhraseQueryParser is still not applying analysis to multiterms.

When I call this on 7.0.0:

    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
    return qp.parse(qString);

where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is:

"the* quick~"
the* quick~
the quick

I get this:

"the* quick~"
name:the* name:quick~2
name:thE name:qUIck

[1] https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, October 5, 2017 8:02 AM
To: solr-user@lucene.apache.org
Subject: RE: Complexphrase treats wildcards differently than other query parsers

What version of Solr are you using?

I thought this had been fixed fairly recently, but I can't quickly find the JIRA. Let me take a look.

Best,
Tim

This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and [2], which handles analysis of multiterms even in phrases.

[1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
[2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
Sent: Thursday, October 5, 2017 6:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:

> Hi Bjarke,
> You are right - I jumped into wrong/old conclusion as the simplest
> answer to your question.

No problem :-)

> I guess looking at the code could give you an answer.

This is what I would like to avoid out of fear that my head would explode ;-)

> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> >
> > Well, according to
> > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> > multiterm means
> >
> > wildcard
> > range
> > prefix
> >
> > so it is that way i'm using the word. That same article explains how
> > analysis will be performed with wildcards if the analyzers are
> > multi-term aware.
> > Furthermore, both lucene and dismax do the correct analysis, so I
> > don't think you are right in your statement about the majority of
> > QPs skipping analysis for wildcards.
> >
> > So I'm still confused as to why complexphrase does things differently.
> >
> > Thanks,
> > /Bjarke
> >
> > 2017-10-05 10:16 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
> >
> >> Hi Bjarke,
> >> It is not multiterm that is causing query parser to skip analysis
> >> chain but wildcard. The majority of query parsers do not analyse
> >> query string if there are wildcards.
> >>
> >> HTH
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> >>>
> >>> Hi list,
> >>>
> >>> I'm trying to search for the term funktionsnedsättning* In my
> >>> analyzer chain I use a MappingCharFilterFactory to change ä to a.
> >>> So I would expect that funktionsnedsättning* would translate to
> >>> funktionsnedsattning*.
> >>>
> >>> If I use e.g.
> >>> the lucene query parser, this is indeed what happens:
> >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives me
> >>> "rawquerystring":"funktionsnedsättning*",
> >>> "querystring":"funktionsnedsättning*",
> >>> "parsedquery":"content_ol:funktionsnedsattning*"
> >>> and 15 documents returned.
> >>>
> >>> Trying the same with complexphrase gives me:
> >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning* gives me
> >>> "rawquerystring":"funktionsnedsättning*",
> >>> "querystring":"funktionsnedsättning*",
> >>> "parsedquery":"content_ol:funktionsnedsättning*"
> >>> and 0 documents. Notice how ä has not been changed to a.
> >>>
> >>> How can this be? Is complexphrase somehow skipping the analysis
> >>> chain for multiterms, even though components and in particular
> >>> MappingCharFilterFactory are Multi-term aware
> >>>
> >>> Are there any configuration gotchas that I'm not aware of?
> >>>
> >>> Thanks for the help,
> >>> Bjarke Buur Mortensen
> >>> Senior Software Engineer, Eluence A/S
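
For readers skimming the thread, the behaviour the original question expects from multi-term aware analysis can be illustrated with a small standalone sketch. This is not Solr or Lucene code and does not reflect how either project implements it: the class name `WildcardNormalizationSketch` and its methods are hypothetical, and the single a-umlaut rule is only a stand-in for the MappingCharFilterFactory mapping mentioned in the question. It shows the idea of normalizing the literal characters of a wildcard term while leaving the wildcard operators untouched.

```java
public class WildcardNormalizationSketch {

    // Hypothetical stand-in for the char mapping from the question:
    // a-umlaut (U+00E4) is mapped to plain 'a', everything else is unchanged.
    static char mapChar(char c) {
        return c == '\u00e4' ? 'a' : c;
    }

    // Normalizes a wildcard term: the wildcard operators '*' and '?' are
    // copied through as-is, every other character goes through mapChar.
    public static String normalizeWildcardTerm(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            boolean isWildcard = (c == '*' || c == '?');
            out.append(isWildcard ? c : mapChar(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The term from the original question, with the umlaut written as an escape.
        System.out.println(normalizeWildcardTerm("funktionsneds\u00e4ttning*"));
        // prints: funktionsnedsattning*
    }
}
```

In the thread's terms, the lucene parser's parsedquery matches the output of this sketch, while the complexphrase parsedquery corresponds to skipping the mapping step entirely.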