Thanks Tim, that might be what I'm experiencing. I'm actually quite certain of it :-)
Do you remember any reason that multi-term analysis is not happening in ComplexPhraseQueryParser? I'm on 6.6.1, so the latest on the 6.x branch.

2017-10-05 14:34 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> There's every chance that I'm missing something at the Solr level, but it
> _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not
> applying analysis to multiterms.
>
> When I call this on 7.0.0:
>     QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
>     return qp.parse(qString);
>
> where the analyzer is a mock "uppercase vowel" analyzer[1] and the
> qString is:
>
> "the* quick~" the* quick~ the quick
>
> I get this:
> "the* quick~" name:the* name:quick~2 name:thE name:qUIck
>
> [1] https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Thursday, October 5, 2017 8:02 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Complexphrase treats wildcards differently than other query parsers
>
> What version of Solr are you using?
>
> I thought this had been fixed fairly recently, but I can't quickly find
> the JIRA. Let me take a look.
>
> Best,
>
> Tim
>
> This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1]
> and [2], which handles analysis of multiterms even in phrases.
>
> [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
> [2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1
>
> -----Original Message-----
> From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
> Sent: Thursday, October 5, 2017 6:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other query parsers
>
> 2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
>
> > Hi Bjarke,
> > You are right - I jumped into wrong/old conclusion as the simplest
> > answer to your question.
> >
> No problem :-)
>
> > I guess looking at the code could give you an answer.
> >
> This is what I would like to avoid out of fear that my head would explode
> ;-)
>
> > Thanks,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> > >
> > > Well, according to
> > > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> > > multiterm means
> > >
> > > wildcard
> > > range
> > > prefix
> > >
> > > so it is that way i'm using the word. That same article explains how
> > > analysis will be performed with wildcards if the analyzers are
> > > multi-term aware.
> > > Furthermore, both lucene and dismax do the correct analysis, so I
> > > don't think you are right in your statement about the majority of
> > > QPs skipping analysis for wildcards.
> > >
> > > So I'm still confused as to why complexphrase does things differently.
> > >
> > > Thanks,
> > > /Bjarke
> > >
> > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
> > >
> > >> Hi Bjarke,
> > >> It is not multiterm that is causing query parser to skip analysis
> > >> chain but wildcard.
> > >> The majority of query parsers do not analyse query string if
> > >> there are wildcards.
> > >>
> > >> HTH
> > >> Emir
> > >> --
> > >> Monitoring - Log Management - Alerting - Anomaly Detection
> > >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> > >>
> > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> > >>>
> > >>> Hi list,
> > >>>
> > >>> I'm trying to search for the term funktionsnedsättning*
> > >>> In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> > >>> So I would expect that funktionsnedsättning* would translate to
> > >>> funktionsnedsattning*.
> > >>>
> > >>> If I use e.g. the lucene query parser, this is indeed what happens:
> > >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning* gives me
> > >>> "rawquerystring":"funktionsnedsättning*",
> > >>> "querystring":"funktionsnedsättning*",
> > >>> "parsedquery":"content_ol:funktionsnedsattning*"
> > >>> and 15 documents returned.
> > >>>
> > >>> Trying the same with complexphrase gives me:
> > >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning* gives me
> > >>> "rawquerystring":"funktionsnedsättning*",
> > >>> "querystring":"funktionsnedsättning*",
> > >>> "parsedquery":"content_ol:funktionsnedsättning*"
> > >>> and 0 documents. Notice how ä has not been changed to a.
> > >>>
> > >>> How can this be? Is complexphrase somehow skipping the analysis chain for
> > >>> multiterms, even though components and in particular
> > >>> MappingCharFilterFactory are Multi-term aware
> > >>>
> > >>> Are there any configuration gotchas that I'm not aware of?
> > >>>
> > >>> Thanks for the help,
> > >>> Bjarke Buur Mortensen
> > >>> Senior Software Engineer, Eluence A/S
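
For illustration, the expectation in the original question can be sketched in plain Java. This is not Lucene or Solr code: the class name WildcardNormalizationSketch and its normalize method are hypothetical, and the single a-umlaut to a rule is an assumption standing in for the MappingCharFilterFactory mapping described in the thread. The sketch only shows the idea behind multi-term aware analysis, namely applying character-level normalization to a wildcard term while leaving the * and ? operators untouched.

```java
/**
 * Hypothetical illustration only; this is NOT how Lucene or Solr
 * implement multi-term analysis. It mimics the effect the thread expects:
 * character mapping applied to a wildcard term, wildcards left intact.
 */
final class WildcardNormalizationSketch {

    private WildcardNormalizationSketch() {
        // static helper, not meant to be instantiated
    }

    /** Assumed mapping rule: a-umlaut (U+00E4) becomes plain 'a'. */
    private static char mapChar(char c) {
        return c == '\u00e4' ? 'a' : c;
    }

    /** Normalizes the literal characters of a term and keeps '*' and '?' as-is. */
    static String normalize(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                out.append(c);           // wildcard operator: pass through
            } else {
                out.append(mapChar(c));  // literal character: apply the mapping
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The query term from the original message, with the umlaut escaped.
        String raw = "funktionsneds\u00e4ttning*";
        System.out.println(normalize(raw)); // prints funktionsnedsattning*
    }
}
```

If a parser applied this kind of normalization to wildcard terms, funktionsnedsättning* would be rewritten to funktionsnedsattning*, which is the parsedquery the lucene parser reports in the original message.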