Re: Complexphrase treats wildcards differently than other query parsers

Bjarke Buur Mortensen Thu, 05 Oct 2017 05:51:56 -0700

Thanks Tim,
that might be what I'm experiencing. I'm actually quite certain of it :-)


Do you remember any reason that multi term analysis is not happening in
ComplexPhraseQueryParser?

I'm on 6.6.1, so latest on the 6.x branch.
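
To illustrate the behaviour discussed in this thread, the following is a minimal plain-Java sketch. It is not Lucene or Solr code and does not reproduce how either parser is implemented. The class MultiTermSketch, the CHAR_MAP table and both helper methods are hypothetical names made up for this illustration; only the ä to a mapping and the term funktionsnedsättning* come from the thread. It assumes the indexed term has already passed through the character mapping, and compares a prefix query that is normalized the same way with one that is passed through untouched.

```java
import java.util.HashMap;
import java.util.Map;

class MultiTermSketch {
    // Hypothetical stand-in for the mapping handed to a char filter
    // such as MappingCharFilterFactory (assumption: only ä -> a).
    static final Map<Character, Character> CHAR_MAP = new HashMap<>();
    static {
        CHAR_MAP.put('\u00e4', 'a'); // ä -> a
    }

    // Applies the character mapping to a multi-term query string while
    // leaving the wildcard operators '*' and '?' untouched.
    static String normalizeMultiTerm(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                out.append(c);
            } else {
                char mapped = CHAR_MAP.getOrDefault(c, c);
                out.append(mapped);
            }
        }
        return out.toString();
    }

    // Toy prefix match: strips one trailing '*' and checks startsWith.
    static boolean prefixMatches(String indexedTerm, String prefixQuery) {
        String prefix = prefixQuery.endsWith("*")
                ? prefixQuery.substring(0, prefixQuery.length() - 1)
                : prefixQuery;
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        // Index time: the term goes through the mapping.
        String indexed = normalizeMultiTerm("funktionsneds\u00e4ttning");
        String rawQuery = "funktionsneds\u00e4ttning*";

        // Query normalized like the index: matches.
        System.out.println(prefixMatches(indexed, normalizeMultiTerm(rawQuery))); // true
        // Query left unanalyzed: no match.
        System.out.println(prefixMatches(indexed, rawQuery)); // false
    }
}
```

The second call mirrors the symptom reported for complexphrase: the index holds funktionsnedsattning, so an unmapped prefix containing ä cannot match it.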

2017-10-05 14:34 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> There's every chance that I'm missing something at the Solr level, but it
> _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not
> applying analysis to multiterms.
>
> When I call this on 7.0.0:
>     QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
>     return qp.parse(qString);
>
> where the analyzer is a mock "uppercase vowel" analyzer[1] and the
> qString is:
>
> "the* quick~" the* quick~ the quick
>
> I get this:
> "the* quick~" name:the* name:quick~2 name:thE name:qUIck
>
>
> [1] https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Thursday, October 5, 2017 8:02 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Complexphrase treats wildcards differently than other query
> parsers
>
> What version of Solr are you using?
>
> I thought this had been fixed fairly recently, but I can't quickly find
> the JIRA.  Let me take a look.
>
> Best,
>
>              Tim
>
> This was one of my initial reasons for writing my SpanQueryParser
> (LUCENE-5205, see [1] and [2]), which handles analysis of multiterms even
> in phrases.
>
> [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
> [2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1
>
> -----Original Message-----
> From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
> Sent: Thursday, October 5, 2017 6:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other query
> parsers
>
> 2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
>
> > Hi Bjarke,
> > You are right - I jumped to a wrong/outdated conclusion as the simplest
> > answer to your question.
>
>
>  No problem :-)
>
> > I guess looking at the code could give you an answer.
> >
>
> This is what I would like to avoid out of fear that my head would explode
> ;-)
>
>
> >
> > Thanks,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> > Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> > >
> > > Well, according to
> > > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> > > multiterm means
> > >
> > > wildcard
> > > range
> > > prefix
> > >
> > > so that is how I'm using the word. That same article explains how
> > > analysis will be performed with wildcards if the analyzers are
> > > multi-term aware.
> > > Furthermore, both lucene and dismax do the correct analysis, so I
> > > don't think you are right in your statement about the majority of
> > > QPs skipping analysis for wildcards.
> > >
> > > So I'm still confused as to why complexphrase does things differently.
> > >
> > > Thanks,
> > > /Bjarke
> > >
> > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
> > >
> > >> Hi Bjarke,
> > >> It is not multiterm that is causing the query parser to skip the
> > >> analysis chain but the wildcard. The majority of query parsers do not
> > >> analyse the query string if there are wildcards.
> > >>
> > >> HTH
> > >> Emir
> > >> --
> > >> Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> > >> Elasticsearch Consulting Support Training - http://sematext.com/
> > >>
> > >>
> > >>
> > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> > >>>
> > >>> Hi list,
> > >>>
> > >>> I'm trying to search for the term funktionsnedsättning*. In my
> > >>> analyzer chain I use a MappingCharFilterFactory to change ä to a.
> > >>> So I would expect that funktionsnedsättning* would translate to
> > >>> funktionsnedsattning*.
> > >>>
> > >>> If I use e.g. the lucene query parser, this is indeed what happens:
> > >>> ...debugQuery=on&defType=lucene&q=funktionsneds%C3%A4ttning*
> > >>> gives me
> > >>> "rawquerystring":"funktionsnedsättning*",
> > >>> "querystring":"funktionsnedsättning*",
> > >>> "parsedquery":"content_ol:funktionsnedsattning*"
> > >>> and 15 documents returned.
> > >>>
> > >>> Trying the same with complexphrase:
> > >>> ...debugQuery=on&defType=complexphrase&q=funktionsneds%C3%A4ttning*
> > >>> gives me
> > >>> "rawquerystring":"funktionsnedsättning*",
> > >>> "querystring":"funktionsnedsättning*",
> > >>> "parsedquery":"content_ol:funktionsnedsättning*"
> > >>> and 0 documents. Notice how ä has not been changed to a.
> > >>>
> > >>> How can this be? Is complexphrase somehow skipping the analysis
> > >>> chain for multiterms, even though the components, and in particular
> > >>> MappingCharFilterFactory, are multi-term aware?
> > >>>
> > >>> Are there any configuration gotchas that I'm not aware of?
> > >>>
> > >>> Thanks for the help,
> > >>> Bjarke Buur Mortensen
> > >>> Senior Software Engineer, Eluence A/S
> > >>
> > >>
> >
> >
>
