RE: Complexphrase treats wildcards differently than other query parsers

Allison, Timothy B. Thu, 05 Oct 2017 05:35:03 -0700

There's every chance that I'm missing something at the Solr level, but at the 
Lucene level it _looks_ like ComplexPhraseQueryParser is still not 
applying analysis to multiterms.


When I call this on 7.0.0:
    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
    return qp.parse(qString);

where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is:

"the* quick~" the* quick~ the quick

I get this:
"the* quick~" name:the* name:quick~2 name:thE name:qUIck


[1] 
https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117
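
For comparison, here is a minimal, Lucene-free sketch of what the terms above would look like if the same vowel uppercasing were also applied to the wildcard and fuzzy terms. The class UppercaseVowelSketch and its normalize method are hypothetical stand-ins, not the mock analyzer from [1] and not anything in Lucene. The sketch only assumes that the analyzer uppercases a, e, i, o and u and leaves everything else alone.

```java
/** Toy stand-in for an "uppercase vowel" analyzer; this is not Lucene code. */
final class UppercaseVowelSketch {

    /** Uppercases a, e, i, o, u and leaves every other character (including * and ~) untouched. */
    static String normalize(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            sb.append("aeiou".indexOf(c) >= 0 ? Character.toUpperCase(c) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same terms as in the query string above, with the phrase quotes dropped.
        for (String term : new String[] {"the*", "quick~", "the", "quick"}) {
            System.out.println(term + " -> " + normalize(term));
        }
    }
}
```

Under that assumption, the* and quick~ would come out as thE* and qUIck~, whereas the parsed query above still shows them as the* and quick~.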

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Thursday, October 5, 2017 8:02 AM
To: solr-user@lucene.apache.org
Subject: RE: Complexphrase treats wildcards differently than other query parsers

What version of Solr are you using?

I thought this had been fixed fairly recently, but I can't quickly find the 
JIRA.  Let me take a look.

Best,

             Tim

This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and 
[2], which handles analysis of multiterms even in phrases.

[1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
[2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1 
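
To make concrete why skipping multiterm analysis matters, here is a toy model of the case reported at the bottom of this thread, where ä is mapped to a at index time. Everything in it is an assumption for illustration: WildcardMappingSketch, applyMapping and matches are hypothetical names, the lookup is a plain prefix comparison, and none of it is Lucene, Solr or SpanQueryParser code.

```java
/** Toy model of a trailing-wildcard lookup; none of this is Lucene or Solr code. */
final class WildcardMappingSketch {

    /** Stand-in for the char mapping in the analysis chain: maps ä (U+00E4) to a. */
    static String applyMapping(String text) {
        return text.replace('\u00e4', 'a');
    }

    /** Treats a query term ending in '*' as a prefix query against one indexed term. */
    static boolean matches(String wildcardTerm, String indexedTerm) {
        if (!wildcardTerm.endsWith("*")) {
            return wildcardTerm.equals(indexedTerm);
        }
        String prefix = wildcardTerm.substring(0, wildcardTerm.length() - 1);
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        // The index-time chain already mapped ä to a, so the stored term contains no ä.
        String indexed = applyMapping("funktionsneds\u00e4ttning");
        String query = "funktionsneds\u00e4ttning*";

        System.out.println("raw query matches:    " + matches(query, indexed));
        System.out.println("mapped query matches: " + matches(applyMapping(query), indexed));
    }
}
```

In this model the unmapped wildcard term cannot match the stored term, while the mapped one does, which is the same shape as the 0 versus 15 documents in the original report.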

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
Sent: Thursday, October 5, 2017 6:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:

> Hi Bjarke,
> You are right - I jumped to the wrong/old conclusion as the simplest 
> answer to your question.


 No problem :-)

> I guess looking at the code could give you an answer.
>

This is what I would like to avoid out of fear that my head would explode
;-)


>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> >
> > Well, according to
> > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> > multiterm means
> >
> > wildcard
> > range
> > prefix
> >
> > so that is the sense in which I'm using the word. That same article explains how 
> > analysis will be performed with wildcards if the analyzers are 
> > multi-term aware.
> > Furthermore, both lucene and dismax do the correct analysis, so I 
> > don't think you are right in your statement about the majority of 
> > QPs skipping analysis for wildcards.
> >
> > So I'm still confused as to why complexphrase does things differently.
> >
> > Thanks,
> > /Bjarke
> >
> > 2017-10-05 10:16 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
> >
> >> Hi Bjarke,
> >> It is not multiterm that is causing the query parser to skip the analysis 
> >> chain, but the wildcard. The majority of query parsers do not analyse the 
> >> query string if there are wildcards.
> >>
> >> HTH
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> >> Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> >>>
> >>> Hi list,
> >>>
> >>> I'm trying to search for the term funktionsnedsättning*. In my 
> >>> analyzer chain I use a MappingCharFilterFactory to change ä to a.
> >>> So I would expect that funktionsnedsättning* would translate to 
> >>> funktionsnedsattning*.
> >>>
> >>> If I use e.g. the lucene query parser, this is indeed what happens:
> >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning*
> >>> gives me
> >>> "rawquerystring":"funktionsnedsättning*",
> >>> "querystring":"funktionsnedsättning*",
> >>> "parsedquery":"content_ol:funktionsnedsattning*"
> >>> and 15 documents returned.
> >>>
> >>> Trying the same with complexphrase:
> >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning*
> >>> gives me
> >>> "rawquerystring":"funktionsnedsättning*",
> >>> "querystring":"funktionsnedsättning*",
> >>> "parsedquery":"content_ol:funktionsnedsättning*"
> >>> and 0 documents. Notice how ä has not been changed to a.
> >>>
> >>> How can this be? Is complexphrase somehow skipping the analysis chain 
> >>> for multiterms, even though components, and in particular 
> >>> MappingCharFilterFactory, are multi-term aware?
> >>>
> >>> Are there any configuration gotchas that I'm not aware of?
> >>>
> >>> Thanks for the help,
> >>> Bjarke Buur Mortensen
> >>> Senior Software Engineer, Eluence A/S
> >>
> >>
>
>
