Re the relevancy changes I note below for edismax, there are already some issues filed:
pertaining to the difference in how the phrase queries are merged into the main query:

See Michael Dodsworth's comment of 25/Sep/12 on this issue:
https://issues.apache.org/jira/browse/SOLR-2058
<-- ticket is closed, but this issue is not addressed.

and pertaining to skipping terms in phrase boosting when part of the query is a phrase:

https://issues.apache.org/jira/browse/SOLR-4130

- Naomi

On Sep 3, 2013, at 5:54 PM, Naomi Dushay wrote:

> When I have a field using CJKBigramFilter, parsed CJK chars have a different
> parsedQuery than non-CJK queries.
>
> (旧小说 is 3 chars, so 2 bigrams)
>
> args sent in: q={!qf=bi_fld}旧小说&pf=&pf2=&pf3=
>
> debugQuery:
> <str name="rawquerystring">{!qf=bi_fld}旧小说</str>
> <str name="querystring">{!qf=bi_fld}旧小说</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((((bi_fld:旧小
> bi_fld:小说)~2))~0.01) ())/no_coord</str>
> <str name="parsedquery_toString">+(((bi_fld:旧小 bi_fld:小说)~2))~0.01 ()</str>
>
> If I use a non-CJK query string, with the same field:
>
> args sent in: q={!qf=bi_fld}foo bar&pf=&pf2=&pf3=
>
> debugQuery:
> <str name="rawquerystring">{!qf=bi_fld}foo bar</str>
> <str name="querystring">{!qf=bi_fld}foo bar</str>
> <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:foo)~0.01)
> DisjunctionMaxQuery((bi_fld:bar)~0.01))~2))/no_coord</str>
> <str name="parsedquery_toString">+(((bi_fld:foo)~0.01
> (bi_fld:bar)~0.01)~2)</str>
>
> Why are the parsedquery_toString formulas different? And is there any
> difference in the actual relevancy formula?
>
> How can you tell the difference between the MinNrShouldMatch and a qs or ps
> or tie value, if they are all represented as ~n in the parsedQuery string?
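
[Inline note on the ~n question just above.] As far as I can tell from how Lucene prints these queries, the cases can be told apart by what precedes the tilde and whether the number has a decimal point. A ~n directly after a closing quote is phrase slop (qs when it is inside the main "+" clause, ps when it is on a boost clause). A ~0.01 after a closing paren is the DisjunctionMaxQuery tie. An integer ~n after a closing paren is the BooleanQuery minimum-should-match. Here is a minimal Python sketch under that assumption; classify_tildes is a hypothetical helper name, not anything that ships with Solr:

```python
import re

# Assumption: "~n" after a closing quote is phrase slop, "~x.y" after ")" is
# the dismax tie, and an integer "~n" after ")" is minimum-should-match.
TILDE = re.compile(r'(?P<prev>["\)])~(?P<num>\d+(?:\.\d+)?)')


def classify_tildes(parsed):
    """Return (token, meaning) pairs for every ~n found in the string."""
    found = []
    for m in TILDE.finditer(parsed):
        num = m.group("num")
        if m.group("prev") == '"':
            meaning = "phrase slop (qs or ps)"
        elif "." in num:
            meaning = "dismax tie"
        else:
            meaning = "minimum should match (mm)"
        found.append(("~" + num, meaning))
    return found


if __name__ == "__main__":
    # parsedquery_toString from the edismax example with qs=5 and ps=4
    sample = ('+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 '
              '(bi_fld:d)~0.01)~3) (bi_fld:"c d"~4)~0.01')
    for token, meaning in classify_tildes(sample):
        print(token, "->", meaning)
```

The tilde alone does not distinguish qs from ps; that still depends on whether the phrase sits inside the required "+" clause or in the optional boost clause after it.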
>
> To try to get a handle on qs, ps, tie and mm:
>
> args: q={!qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4
>
> debugQuery:
> <str name="rawquerystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
> <str name="querystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
> <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01)
> DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3)
> DisjunctionMaxQuery((bi_fld:"c d"~4)~0.01))/no_coord</str>
> <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01
> (bi_fld:d)~0.01)~3) (bi_fld:"c d"~4)~0.01</str>
>
> I get that qs, the query slop, is for explicit phrases in the query, so
> "a b"~5 makes sense. I also get that ps is for boosting of phrases, so I
> get (bi_fld:"c d"~4) … but where is (cjk_uni_pub_search:"a b c d"~4) ?
>
> Using dismax (instead of edismax):
>
> args: q={!dismax qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4
>
> debugQuery:
> <str name="rawquerystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
> <str name="querystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
> <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01)
> DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3)
> DisjunctionMaxQuery((bi_fld:"a b c d"~4)~0.01))/no_coord</str>
> <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01
> (bi_fld:d)~0.01)~3) (bi_fld:"a b c d"~4)~0.01</str>
>
> So is this an edismax bug?
>
> FYI, I am running Solr 4.4.
> I have fields defined like so:
>
> <fieldtype name="text_cjk_bi" class="solr.TextField"
>     positionIncrementGap="10000" autoGeneratePhraseQueries="false">
>   <analyzer>
>     <tokenizer class="solr.ICUTokenizerFactory" />
>     <filter class="solr.CJKWidthFilterFactory"/>
>     <filter class="solr.ICUTransformFilterFactory"
>         id="Traditional-Simplified"/>
>     <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
>         katakana="true" hangul="true" outputUnigrams="false" />
>   </analyzer>
> </fieldtype>
>
> The request handler uses edismax:
>
> <requestHandler name="search" class="solr.SearchHandler" default="true">
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="q.alt">:</str>
>     <str name="mm">6<-1 6<90%</str>
>     <int name="qs">1</int>
>     <int name="ps">0</int>
>