Hi Chris,
 Thanks for the insight.

 1. "omitTermFreqAndPositions" is very straightforward, but if I omit
positions I can no longer serve phrase queries. I had searched for this in
the past as well, and I concluded that there is no such thing as
"omitTermFreq" on its own, perhaps because the frequency is just the count
of a term's positions and cannot be discarded while the positions are
kept. :( Please correct me if I am wrong. If I am, that would be
exactly what I need.

 2. The function query seemed nice (though strange, because I had never used
one before) and I gave it a few hours, but it did not seem to solve my
requirement either. The "artificial" score we generate gets multiplied
into the rest of the score, which still includes the score from the "cat"
field. (I cannot remove "cat" from "qf" because I have to search there.) I
only want this field's score not to depend on the matching "tf".


 To explain the second point, here is what I did.
 I indexed 4 documents:
doc 1

title:chair,
cat:chair and chair

doc 2

title:table,
cat:chair and chair

doc 3

title:chair,
cat:chair and table

doc 4

title:table,
cat:chair and table


Searching with a simple query
http://localhost:8983/solr/site1/select/?
q=chair&
qf=title&
qf=cat&
fl=title,cat,id,score&
pf=ttile

gives 4 results (1,3,2,4).

I want documents 1 and 3 to get equal scores, and likewise 2 and 4,
because the only difference within each pair is the value of the "cat" field.

After spending some hours on function queries I finally arrived at the
following query:
http://localhost:8983/solr/site1/select/?
q={!boost%20b=$cat_boost%20v=$main_query}&
main_query={!dismax%20qf=%22title%20cat%22%20v=$qry}&
cat_boost={!func}map(query({!field%20f=cat%20v=$qry},-1),0,1000,1,0)&
qry=chair&
qf=title&
qf=cat&
fl=title,cat,displayid,score&
pf=ttile
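
A minimal Python sketch of what the cat_boost expression evaluates to, assuming Solr's map(x,min,max,target,default) semantics: values inside [min,max] become target, everything else becomes default (or stays unchanged when no default is given). The names solr_map and cat_boost are made up for this sketch, and query(...,-1) is modelled simply as the sub-query score, or -1 when the document does not match.

```python
def solr_map(x, lo, hi, target, default=None):
    """Sketch of Solr's map(): values in [lo, hi] become target;
    anything else becomes default, or is returned unchanged if no default."""
    if lo <= x <= hi:
        return target
    return x if default is None else default


def cat_boost(cat_score):
    """cat_score models query({!field f=cat v=$qry},-1):
    the sub-query score for a match, or -1 for a non-match."""
    return solr_map(cat_score, 0, 1000, 1, 0)


# The boost depends only on whether "cat" matched, not on how strongly.
print(cat_boost(1.0986409))  # "cat" matched, termFreq=2
print(cat_boost(0.7768564))  # "cat" matched, termFreq=1
print(cat_boost(-1))         # "cat" did not match
```

Under these assumptions the boost factor is the same for every document that matches on "cat", so any remaining score difference between such documents has to come from the query being boosted.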


But debugging the query showed that the boost value ($cat_boost) is
multiplied into a score that is itself partly computed from the "cat" field,
so documents 1 and 3 still get different scores (and likewise 2 and 4).

1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01 (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:
  1.2942866 = (MATCH) sum of:
    1.2942866 = (MATCH) max plus 0.01 times others of:
      1.2876587 = (MATCH) weight(title:chair in 0), product of:
        0.9999818 = queryWeight(title:chair), product of:
          1.287682 = idf(docFreq=2, maxDocs=4)
          0.7765751 = queryNorm
        1.287682 = (MATCH) fieldWeight(title:chair in 0), product of:
          1.0 = tf(termFreq(title:chair)=1)
          1.287682 = idf(docFreq=2, maxDocs=4)
          1.0 = fieldNorm(field=title, doc=0)
      0.66279614 = (MATCH) weight(cat:chair in 0), product of:
        0.60328734 = queryWeight(cat:chair), product of:
          0.7768564 = idf(docFreq=4, maxDocs=4)
          0.7765751 = queryNorm
        1.0986409 = (MATCH) fieldWeight(cat:chair in 0), product of:
          1.4142135 = tf(termFreq(cat:chair)=2)
          0.7768564 = idf(docFreq=4, maxDocs=4)
          1.0 = fieldNorm(field=cat, doc=0)
  1.0 = map(query(cat:chair,def=-1.0)=1.0986409,min=0.0,max=1000.0,target=1.0)
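
To make the arithmetic in this explain output concrete, here is a small Python sketch that rebuilds the score from the numbers shown. It assumes the formulas visible in the output (tf = sqrt(termFreq), each term score = queryWeight * fieldWeight, dismax = max plus 0.01 times the others). The function names are invented for the sketch, and the termFreq=1 case is an extrapolation that assumes the same idf, queryNorm and fieldNorm as in the output.

```python
import math

QUERY_NORM = 0.7765751  # queryNorm from the explain output
TIE = 0.01              # the "~0.01" tie-breaker of the dismax query


def term_score(term_freq, idf, field_norm=1.0):
    """queryWeight * fieldWeight, as laid out in the explain output."""
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(term_freq) * idf * field_norm
    return query_weight * field_weight


def dismax(scores, tie=TIE):
    """'max plus tie times others' over the per-field scores."""
    best = max(scores)
    return best + tie * (sum(scores) - best)


def boosted_score(cat_term_freq, boost=1.0):
    """Score of a document whose title contains 'chair' once."""
    title = term_score(1, idf=1.287682)
    cat = term_score(cat_term_freq, idf=0.7768564)
    return dismax([title, cat]) * boost


print(boosted_score(2))  # "cat" contains "chair" twice, as in the output
print(boosted_score(1))  # "cat" contains "chair" once
```

Even with an identical boost of 1.0, the two values differ, because tf(termFreq(cat:chair)) still enters through the dismax part of the score.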




Did I get you wrong?
I would appreciate it if you could point out any mistake (or
misinterpretation on my part) in the mail above.


I was thinking there should be some hook or plugin (or anything) that
could change the score calculation formula *for a particular field* only.
There is a method in the DefaultSimilarity class, *public float tf(float
freq)*, but it does not receive the field name. Is it possible to look in
this direction?
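
As a purely conceptual illustration of that idea (this is not Lucene or Solr code; field_aware_tf and FLAT_TF_FIELDS are hypothetical names), a tf that knows the field name could return a constant for any match in "cat" and keep the usual sqrt(freq) elsewhere:

```python
import math

# Hypothetical: fields whose term frequency should not influence the score.
FLAT_TF_FIELDS = {"cat"}


def field_aware_tf(field, freq):
    """Conceptual per-field tf: flat for selected fields, sqrt(freq) otherwise."""
    if freq <= 0:
        return 0.0           # no match, no contribution
    if field in FLAT_TF_FIELDS:
        return 1.0           # "match or not" only
    return math.sqrt(freq)   # classic tf, as seen in the explain output


print(field_aware_tf("cat", 2), field_aware_tf("cat", 1))
print(field_aware_tf("title", 4))
```

With such a function, "chair and chair" and "chair and table" would contribute the same tf for the "cat" field, while positions, and therefore phrase queries, would be unaffected.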


Thank you very much.




On Tue, Nov 8, 2011 at 6:23 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : You can write your custom similarity implementation, and override the
> : /lengthNorm()/ method to return a constant value.
>
> The poster already said (twice!) that they have already set
> omitNorms=true, so lengthNorm won't even be used
>
> omitting norms (or mucking with norms by modifying the lengthNorm function)
> only affects the norms portion of the scoring -- the problem being
> described here is when a document matches the input term more than once:
> that is an issue of the "term frequency".
>
> Setting omitTermFreqAndPositions="true" on your field type will eliminate
> the term frequency from the equation, and it will become a simple "match
> or not" factor in your scoring.
>
> From the "more than one way to do it" standpoint, another option is to
> wrap the query in a function that flattens the scores (more fine grained
> control, and doesn't require re-indexing, but probably less efficient)
>
> q={!boost b=$cat_boost v=$main_query}
> main_query=...
> cat_boost={!func}map(map(query({!field f=cat v=$cat},-1),0,10000,5),-1,-1,1)
> cat=...
>
> (note: used nested maps so that non-matches would result in a 1x
> multiplier, while matches result in a 5x multiplier)
>
> -Hoss
>



-- 
Regards,
Samar
