Re: Any filter to map mutiple tokens into one ?

T. Kuro Kurosaka Mon, 15 Oct 2012 11:33:27 -0700

On 10/15/12 10:35 AM, Jack Krupansky wrote:

And you're absolutely certain you see "*:*" being passed to your analyzer in the final release of Solr 4.0???

I don't have direct evidence. This is the only theory I have that explains why changing the FieldType causes the sub-optimal scores.

If you know of a way to tell if a tokenizer is really invoked, let me know.


-- Jack Krupansky

-----Original Message----- From: T. Kuro Kurosaka
Sent: Monday, October 15, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Any filter to map mutiple tokens into one ?

On 10/14/12 12:19 PM, Jack Krupansky wrote:

There's a miscommunication here somewhere. Is Solr 4.0 still passing "*:*" to the analyzer? Show us the parsed query for "*:*", as well as the debugQuery "explain" for the score.

I'm not quite sure what you mean by the parsed query for "*:*".
This fake analyzer using NGramTokenizer divides "*:*" into three tokens
"*", ":", and "*", on purpose to simulate our Tokenizer's behavior.

An excerpt of the XML results from the query is pasted at the bottom of
this message.
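
For illustration, here is a minimal sketch of the three-way split described above. It does not use Lucene or NGramTokenizer; unigram_tokens and as_phrase are hypothetical helpers that only mimic one-character tokenization and the way the resulting tokens show up as a quoted phrase in the parsed query.

```python
def unigram_tokens(text):
    """Return one token per non-whitespace character (a stand-in for a 1-gram tokenizer)."""
    return [ch for ch in text if not ch.isspace()]


def as_phrase(field, tokens, boost):
    """Render tokens as a boosted phrase clause, e.g. name:"a b"^0.5 (illustrative only)."""
    phrase = " ".join(tokens)
    return '%s:"%s"^%s' % (field, phrase, boost)


tokens = unigram_tokens("*:*")
print(tokens)
print(as_phrase("name", tokens, 0.5))
```

Joining the tokens with spaces is why the phrase clause shows a space on each side of the colon.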

I mean, "*:*" (MatchAllDocsQuery) has a "constant score", so there isn't any way for it to be "suboptimal".

That's exactly the point I'd like to raise.
No matter what analyzers are assigned to fields, the hit score for "*:*"
must remain 1.0, but that is not happening when an analyzer that divides
"*:*" is in use.


Here's an excerpt of the query response. Notice this element, which
should not be there, in my opinion:
DisjunctionMaxQuery((name:"* : *"^0.5))
There is a space between * and :, and another space between : and *.

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">33</int>
<lst name="params">
<str name="indent">on</str>
<str name="wt"/>
<str name="version">2.2</str>
<str name="rows">10</str>
<str name="defType">edismax</str>
<str name="pf">name^0.5</str>
<str name="fl">*,score</str>
<str name="debugQuery">on</str>
<str name="start">0</str>
<str name="q">*:*</str>
<str name="qt"/>
<str name="fq"/>
</lst>
</lst>
<result name="response" numFound="32" start="0" maxScore="0.14764866">
<doc>
<str name="id">GB18030TEST</str>
<str name="name">Test with some GB18030 encoded characters</str>
<arr name="features">
<str>No accents here</str>
<str>这是一个功能</str>
<str>This is a feature (translated)</str>
<str>这份文件是很有光泽</str>
<str>This document is very shiny (translated)</str>
</arr>
<float name="price">0.0</float>
<str name="price_c">0,USD</str>
<bool name="inStock">true</bool>
<long name="_version_">1415830106215022592</long>
<float name="score">0.14764866</float>
</doc>
...
</result>
<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">

(+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5)))/no_coord

</str>
<str name="parsedquery_toString">+*:* (name:"* : *"^0.5)</str>
<lst name="explain">
<str name="GB18030TEST">
0.14764866 = (MATCH) sum of: 0.14764866 = (MATCH) MatchAllDocsQuery,
product of: 0.14764866 = queryNorm
</str>
</lst>
<str name="QParser">ExtendedDismaxQParser</str>
<null name="altquerystring"/>
<null name="boostfuncs"/>
...

</lst>
</lst>
</lst>
</response>
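
As a rough check on the explain output above, the sketch below works backwards from the reported queryNorm. It rests on assumptions not confirmed in this thread: that the default TF-IDF similarity is in use, so queryNorm = 1/sqrt(sum of squared clause weights); that MatchAllDocsQuery contributes a weight equal to its boost of 1.0; and that the phrase clause contributes boost * idf. The function query_norm and the derived implied_idf are illustrative names, not Solr or Lucene APIs.

```python
import math

QUERY_NORM = 0.14764866   # value reported in the explain section
MATCH_ALL_WEIGHT = 1.0    # assumed weight of the *:* clause (default boost)
PF_BOOST = 0.5            # boost from pf=name^0.5


def query_norm(weights):
    """queryNorm under the assumed default similarity: 1 / sqrt(sum of w^2)."""
    return 1.0 / math.sqrt(sum(w * w for w in weights))


# With only the match-all clause, the norm (and hence the score) would be 1.0.
print(query_norm([MATCH_ALL_WEIGHT]))

# Work backwards: what phrase idf would produce the reported queryNorm?
sum_of_squares = 1.0 / QUERY_NORM ** 2
phrase_weight = math.sqrt(sum_of_squares - MATCH_ALL_WEIGHT ** 2)
implied_idf = phrase_weight / PF_BOOST
print(implied_idf)

# Sanity check: plugging the implied weight back in reproduces the reported norm.
print(query_norm([MATCH_ALL_WEIGHT, PF_BOOST * implied_idf]))
```

If these assumptions hold, the extra phrase clause alone is enough to pull the score below 1.0, even though only the MatchAllDocsQuery clause matches the document.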
