There's every chance that I'm missing something at the Solr level, but at
the Lucene level it _looks_ like ComplexPhraseQueryParser is still not
applying analysis to multiterms.
When I call this on 7.0.0:
    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
    return qp.parse(qString);
where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is:
    "the* quick~" the* quick~ the quick
I get this:
    "the* quick~" name:the* name:quick~2 name:thE name:qUIck
[1]
https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117
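To make the expectation concrete, here is a minimal, Lucene-free sketch. It assumes the mock analyzer simply upper-cases ASCII vowels, which is what the thE / qUIck output above suggests. The names VowelSketch, upperVowels and normalizeMultiterm are made up for illustration and are not part of Lucene or of the linked test class. It only shows what the wildcard and fuzzy terms would look like if the same normalization were applied to them while the trailing operator is left alone.

```java
public class VowelSketch {
    // Stand-in for the mock analyzer (an assumption): upper-case ASCII vowels,
    // leave every other character untouched.
    public static String upperVowels(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            sb.append("aeiou".indexOf(c) >= 0 ? Character.toUpperCase(c) : c);
        }
        return sb.toString();
    }

    // Hypothetical helper: normalize the text of a multiterm but keep a
    // trailing '*' (wildcard) or '~' (fuzzy) operator as-is.
    public static String normalizeMultiterm(String term) {
        if (term.endsWith("*") || term.endsWith("~")) {
            int last = term.length() - 1;
            return upperVowels(term.substring(0, last)) + term.charAt(last);
        }
        return upperVowels(term);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"the*", "quick~", "the", "quick"}) {
            System.out.println(t + " -> " + normalizeMultiterm(t));
        }
    }
}
```

Under that assumption, a parser that analyzes multiterms would be expected to produce something like name:thE* and name:qUIck~2 for the wildcard and fuzzy clauses, rather than name:the* and name:quick~2.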
-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Thursday, October 5, 2017 8:02 AM
To: [email protected]
Subject: RE: Complexphrase treats wildcards differently than other query parsers
What version of Solr are you using?
I thought this had been fixed fairly recently, but I can't quickly find the
JIRA. Let me take a look.
Best,
Tim
This was one of my initial reasons for writing my SpanQueryParser
(LUCENE-5205 [1], [2]), which handles analysis of multiterms even in phrases.
[1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
[2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1
-----Original Message-----
From: Bjarke Buur Mortensen [mailto:[email protected]]
Sent: Thursday, October 5, 2017 6:28 AM
To: [email protected]
Subject: Re: Complexphrase treats wildcards differently than other query parsers
2017-10-05 11:29 GMT+02:00 Emir Arnautović <[email protected]>:
> Hi Bjarke,
> You are right - I jumped to the wrong/old conclusion as the simplest
> answer to your question.
No problem :-)
>
> I guess looking at the code could give you an answer.
>
This is what I would like to avoid out of fear that my head would explode ;-)
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen <[email protected]> wrote:
> >
> > Well, according to
> > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> > multiterm means
> >
> > wildcard
> > range
> > prefix
> >
> > so that is the sense in which I'm using the word. That same article explains how
> > analysis will be performed with wildcards if the analyzers are
> > multi-term aware.
> > Furthermore, both lucene and dismax do the correct analysis, so I
> > don't think you are right in your statement about the majority of
> > QPs skipping analysis for wildcards.
> >
> > So I'm still confused as to why complexphrase does things differently.
> >
> > Thanks,
> > /Bjarke
> >
> > 2017-10-05 10:16 GMT+02:00 Emir Arnautović <[email protected]>:
> >
> >> Hi Bjarke,
> >> It is not multiterm that is causing the query parser to skip the
> >> analysis chain, but the wildcard. The majority of query parsers do not
> >> analyse the query string if there are wildcards.
> >>
> >> HTH
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> >> Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen <[email protected]> wrote:
> >>>
> >>> Hi list,
> >>>
> >>> I'm trying to search for the term funktionsnedsättning*. In my
> >>> analyzer chain I use a MappingCharFilterFactory to change ä to a.
> >>> So I would expect that funktionsnedsättning* would translate to
> >>> funktionsnedsattning*.
> >>>
> >>> If I use e.g. the lucene query parser, this is indeed what happens:
> >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning*
> >>> gives me
> >>> "rawquerystring":"funktionsnedsättning*",
> >>> "querystring":"funktionsnedsättning*",
> >>> "parsedquery":"content_ol:funktionsnedsattning*"
> >>> and 15 documents returned.
> >>>
> >>> Trying the same with complexphrase:
> >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning*
> >>> gives me
> >>> "rawquerystring":"funktionsnedsättning*",
> >>> "querystring":"funktionsnedsättning*",
> >>> "parsedquery":"content_ol:funktionsnedsättning*"
> >>> and 0 documents. Notice how ä has not been changed to a.
> >>>
> >>> How can this be? Is complexphrase somehow skipping the analysis
> >>> chain for multiterms, even though the components, and in particular
> >>> MappingCharFilterFactory, are multi-term aware?
> >>>
> >>> Are there any configuration gotchas that I'm not aware of?
> >>>
> >>> Thanks for the help,
> >>> Bjarke Buur Mortensen
> >>> Senior Software Engineer, Eluence A/S
> >>
> >>
>
>
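For anyone reproducing the original report at the bottom of this thread, here is a small, Lucene-free sketch of the expected behaviour. It assumes the only mapping rule is ä => a; the class and method names (MappingSketch, mapChars, encodeQuery) are hypothetical, and in Solr the real work is done by MappingCharFilterFactory, not by this code. It also rebuilds the percent-encoded q parameter seen in the debug URLs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MappingSketch {
    // Assumed single mapping rule (a-umlaut => a); a real mapping file may
    // contain many more rules. The wildcard character is left untouched.
    public static String mapChars(String s) {
        return s.replace('\u00e4', 'a');
    }

    // Percent-encode the query value as UTF-8. URLEncoder leaves '*' as-is
    // and encodes a-umlaut as the two bytes C3 A4.
    public static String encodeQuery(String q) {
        return URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "funktionsneds\u00e4ttning*";
        // Request parameters as used in the thread (host and path omitted).
        System.out.println("debugQuery=on&defType=complexphrase&q=" + encodeQuery(raw));
        // The term one would hope to see in parsedquery after char mapping.
        System.out.println("content_ol:" + mapChars(raw));
    }
}
```

The second line printed is the form the lucene parser reports in parsedquery; per the report, complexphrase keeps the unmapped term instead.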