Re: Complexphrase treats wildcards differently than other query parsers

Bjarke Buur Mortensen Mon, 09 Oct 2017 05:39:09 -0700

Thanks again, Tim.
Following your recipe, I was able to write a failing test:


    assertQ(req("q", "{!complexphrase} iso-latin1:cr\u00E6zy*")
    , "//result[@numFound='1']"
    , "//doc[./str[@name='id']='1']"
    );

Note that the query term is cr\u00E6zy*, which reproduces the behaviour I
originally reported: because of the wildcard, CPQP does not analyse the
term, so the charfilter is never applied on the query side.
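
To make the mismatch concrete, here is a minimal stand-alone sketch of what I think is going on. It is a toy model only: it uses no Lucene or Solr classes, the class and method names (WildcardCharFilterSketch, charFilter, toRegex, matches) are made up for illustration, and the single mapping from \u00E6 to "ae" is an assumption standing in for the relevant entry in mapping-ISOLatin1Accent.txt.

```java
import java.util.regex.Pattern;

// Toy model only: none of this is Lucene or Solr code.
class WildcardCharFilterSketch {

    // Hypothetical stand-in for the mapping char filter: folds the ae ligature to "ae".
    static String charFilter(String text) {
        return text.replace("\u00E6", "ae");
    }

    // Translate a wildcard pattern ('*' = any run, '?' = one char) into a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Match a wildcard query against an indexed term, optionally running
    // the query text through the char filter first.
    static boolean matches(String indexedTerm, String wildcardQuery, boolean analyseQuery) {
        String q = analyseQuery ? charFilter(wildcardQuery) : wildcardQuery;
        return toRegex(q).matcher(indexedTerm).matches();
    }

    public static void main(String[] args) {
        // The index side always goes through the char filter.
        String indexed = charFilter("cr\u00E6zy");
        System.out.println(indexed);                                // craezy
        System.out.println(matches(indexed, "cr\u00E6zy*", false)); // false
        System.out.println(matches(indexed, "cr\u00E6zy*", true));  // true
    }
}
```

With the query left unanalysed, the pattern still contains the ligature and cannot match the folded term in the index, which is the failure the test above shows. Folding the literal characters of the wildcard term while leaving * and ? alone makes the same query match.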


2017-10-06 20:54 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> That could be it.  I'm not able to reproduce this with trunk.  More next
> week.
>
> In trunk, if I add this to schema15.xml:
>   <fieldType name="text_iso_latin1_mapping" class="solr.TextField">
>     <analyzer>
>       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>       <tokenizer class="solr.MockTokenizerFactory"/>
>     </analyzer>
>   </fieldType>
>   <field name="iso-latin1" type="text_iso_latin1_mapping" indexed="true" stored="true"/>
>
> This test passes.
>
>   @Test
>   public void testCharFilter() {
>     assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
>     assertU(commit());
>     assertU(optimize());
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:traen")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>   }
>
>
>
> -----Original Message-----
> From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
> Sent: Friday, October 6, 2017 6:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other query
> parsers
>
> Thanks a lot for your effort, Tim.
>
> Looking at it from the Solr side, I see some use of local classes. The
> snippet below in particular caught my eye (in
> solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
> The instance of ComplexPhraseQueryParser is not the clean one from Lucene,
> but a modified one. If any of the modifications messes with the analysis
> logic, well then that might answer it.
>
> What do you make of it?
>
> lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer()) {
>   protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
>     try {
>       org.apache.lucene.search.Query wildcardQuery =
>           reverseAwareParser.getWildcardQuery(t.field(), t.text());
>       setRewriteMethod(wildcardQuery);
>       return wildcardQuery;
>     } catch (SyntaxError e) {
>       throw new RuntimeException(e);
>     }
>   }
>
>   private Query setRewriteMethod(org.apache.lucene.search.Query query) {
>     if (query instanceof MultiTermQuery) {
>       ((MultiTermQuery) query).setRewriteMethod(
>           org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
>     }
>     return query;
>   }
>
>   protected Query newRangeQuery(String field, String part1, String part2,
>       boolean startInclusive, boolean endInclusive) {
>     boolean reverse = reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
>     return super.newRangeQuery(field,
>         reverse ? reverseAwareParser.getLowerBoundForReverse() : part1, part2,
>         startInclusive || reverse, endInclusive);
>   }
> };
>
> Thanks,
> Bjarke
>
>
>
