How to obtain the explained output programmatically?

2011-10-03 Thread David Ryan
Hi,

I need to use some of the detailed information in the explain output for a
Solr search.

Here is one example:

http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score


0.18314168 = (MATCH) sum of:
  0.18314168 = (MATCH) weight(text:gb in 1), product of:
0.35845062 = queryWeight(text:gb), product of:
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15502669 = queryNorm
0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
  1.4142135 = tf(termFreq(text:gb)=2)
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15625 = fieldNorm(field=text, doc=1)


I can see the explain output by clicking the "toggle explain" button in the
web browser.  Is there a way to access it programmatically?


Regards,
David


Re: How to obtain the explained output programmatically?

2011-10-03 Thread David Ryan
Thanks Hoss!

debug.explain.structured is definitely helpful; it adds structure to the
plain explain output.  Is there a way to access this structured output from
Java code (e.g., via a Solr plugin class)?
We could write an HTML parser to examine the output shown in the browser,
but that's probably not the best way to do it.
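For reference, the structured output can also be requested directly over HTTP and consumed without a browser at all. A minimal sketch of building such a request URL (base URL and parameter names as used elsewhere in this thread; the `wt=json` choice and the class name are illustrative assumptions):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ExplainUrl {
    // Builds a Solr select URL that asks for structured explain output
    // in a machine-readable response format instead of the HTML page.
    public static String buildDebugUrl(String baseUrl, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/select/?q=" + q
                + "&fl=id,score"
                + "&debugQuery=true"
                + "&debug.explain.structured=true"
                + "&wt=json";  // JSON is easier to parse than the default XML/HTML view
    }

    public static void main(String[] args) {
        System.out.println(buildDebugUrl("http://localhost:8983/solr", "GB"));
    }
}
```

If you are already using SolrJ, the plain-text explanations are also reachable from `QueryResponse.getExplainMap()` after setting `debugQuery=true`; the structured form lives under the response's `debug` section.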



On Mon, Oct 3, 2011 at 2:11 PM, Chris Hostetter wrote:

>
> :
> http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score
>...
> : the web browser.   Is there a way to access the explained output
> : programmatically?
>
> https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured
>
>
> -Hoss
>


Scoring of DisMax in Solr

2011-10-04 Thread David Ryan
Hi,


When I examine the score calculation of DisMax in Solr, it looks to me
that DisMax is using tf x idf^2 instead of tf x idf.
Does anyone have insight into why tf x idf is not used here?

Here is the score contribution from one field:

score(q,c) = queryWeight x fieldWeight
           = tf x idf x idf x queryNorm x fieldNorm

Here is the example that I used to derive the formula above. Clearly, idf is
multiplied twice in the score calculation.
http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score


0.18314168 = (MATCH) sum of:
  0.18314168 = (MATCH) weight(text:gb in 1), product of:
0.35845062 = queryWeight(text:gb), product of:
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15502669 = queryNorm
0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
  1.4142135 = tf(termFreq(text:gb)=2)
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15625 = fieldNorm(field=text, doc=1)
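Plugging the numbers from this output into the formula makes the double idf factor visible: idf contributes once inside queryWeight and once inside fieldWeight. A quick self-contained check (class name hypothetical, constants copied from the debug output above):

```java
public class ExplainCheck {
    public static void main(String[] args) {
        // Constants taken from the explain output above.
        double idf = 2.3121865;
        double queryNorm = 0.15502669;
        double tf = 1.4142135;
        double fieldNorm = 0.15625;

        double queryWeight = idf * queryNorm;       // idf appears here...
        double fieldWeight = tf * idf * fieldNorm;  // ...and again here
        double score = queryWeight * fieldWeight;   // hence tf x idf^2 overall

        System.out.printf("queryWeight=%.8f fieldWeight=%.7f score=%.8f%n",
                queryWeight, fieldWeight, score);
    }
}
```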



Thanks!


Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
Thanks! What's the procedure to report this if it's a bug?
eDisMax has similar behavior.

On Tue, Oct 4, 2011 at 11:24 PM, Bill Bell  wrote:

> This seems like a bug to me.
>
> On 10/4/11 6:52 PM, "David Ryan"  wrote:
>
> >Hi,
> >
> >
> >When I examine the score calculation of DisMax in Solr,   it looks to me
> >that DisMax is using  tf x idf^2 instead of tf x idf.
> >Does anyone have insight why tf x idf is not used here?
> >
> >Here is the score contribution from one one field:
> >
> >score(q,c) =  queryWeight x fieldWeight
> >   = tf x idf x idf x queryNorm x fieldNorm
> >
> >Here is the example that I used to derive the formula above. Clearly, idf
> >is
> >multiplied twice in the score calculation.
> >*
> >
> http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent
> >=on&debugQuery=true&fl=id,score
> >*
> >
> >
> >0.18314168 = (MATCH) sum of:
> >  0.18314168 = (MATCH) weight(text:gb in 1), product of:
> >0.35845062 = queryWeight(text:gb), product of:
> >  2.3121865 = idf(docFreq=6, numDocs=26)
> >  0.15502669 = queryNorm
> >0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
> >  1.4142135 = tf(termFreq(text:gb)=2)
> >  2.3121865 = idf(docFreq=6, numDocs=26)
> >  0.15625 = fieldNorm(field=text, doc=1)
> >
> >
> >
> >Thanks!
>
>
>


Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
Hi Markus,

The idf calculation itself is correct.
What I am trying to understand is why the idf value is multiplied twice in
the final score calculation. Essentially, tf x idf^2 is used instead of
tf x idf.
I'd like to understand the rationale behind that.





On Wed, Oct 5, 2011 at 9:43 AM, Markus Jelsma wrote:

> In Lucene's default similarity, idf = 1 + ln(numDocs / (docFreq + 1)).
> 1 + ln(26 / 7) =~ 2.3121865
>
> I don't see a problem.
>
> > Hi,
> >
> >
> > When I examine the score calculation of DisMax in Solr,   it looks to me
> > that DisMax is using  tf x idf^2 instead of tf x idf.
> > Does anyone have insight why tf x idf is not used here?
> >
> > Here is the score contribution from one one field:
> >
> > score(q,c) =  queryWeight x fieldWeight
> >= tf x idf x idf x queryNorm x fieldNorm
> >
> > Here is the example that I used to derive the formula above. Clearly, idf
> > is multiplied twice in the score calculation.
> > *
> >
> http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=
> > on&debugQuery=true&fl=id,score *
> >
> > 
> > 0.18314168 = (MATCH) sum of:
> >   0.18314168 = (MATCH) weight(text:gb in 1), product of:
> > 0.35845062 = queryWeight(text:gb), product of:
> >   2.3121865 = idf(docFreq=6, numDocs=26)
> >   0.15502669 = queryNorm
> > 0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
> >   1.4142135 = tf(termFreq(text:gb)=2)
> >   2.3121865 = idf(docFreq=6, numDocs=26)
> >   0.15625 = fieldNorm(field=text, doc=1)
> > 
> >
> >
> > Thanks!
>
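The idf value in the debug output can be checked against the DefaultSimilarity formula quoted above. A quick sketch (class name hypothetical):

```java
public class IdfCheck {
    public static void main(String[] args) {
        // DefaultSimilarity: idf(t) = 1 + ln(numDocs / (docFreq + 1))
        int numDocs = 26;
        int docFreq = 6;
        double idf = 1 + Math.log((double) numDocs / (docFreq + 1));
        System.out.println(idf);  // ~2.3121865, matching the explain output
    }
}
```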


Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
Ok, here is the calculation of the score:

0.18314168 = 2.3121865 * 0.15502669 * 1.4142135 * 2.3121865 * 0.15625

2.3121865 is multiplied twice here. That is what I mean by tf x idf^2 being
used instead of tf x idf.
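That product is easy to verify directly from the five factors in the explain output (class name hypothetical):

```java
public class ProductCheck {
    public static void main(String[] args) {
        // The five factors from the explain output; 2.3121865 (idf) appears twice.
        double score = 2.3121865 * 0.15502669 * 1.4142135 * 2.3121865 * 0.15625;
        System.out.println(score);  // ~0.18314168
    }
}
```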



On Wed, Oct 5, 2011 at 10:42 AM, Markus Jelsma
wrote:

> Hi,
>
> I don't see 2.3121865 * 2 anywhere in your debug output or something that
> looks like that.
>
>
> > Hi Markus,
> >
> > The idf calculation itself is correct.
> > What I am trying to understand here is  why idf value is multiplied twice
> > in the final score calculation. Essentially,  tf x idf^2 is used instead
> > of tf x idf.
> > I'd like to understand the rationale behind that.
> >
> > On Wed, Oct 5, 2011 at 9:43 AM, Markus Jelsma
> wrote:
> > > In Lucene's default similarity, idf = 1 + ln(numDocs / (docFreq + 1)).
> > > 1 + ln(26 / 7) =~ 2.3121865
> > >
> > > I don't see a problem.
> > >
> > > > Hi,
> > > >
> > > >
> > > > When I examine the score calculation of DisMax in Solr,   it looks to
> > > > me that DisMax is using  tf x idf^2 instead of tf x idf.
> > > > Does anyone have insight why tf x idf is not used here?
> > > >
> > > > Here is the score contribution from one one field:
> > > >
> > > > score(q,c) =  queryWeight x fieldWeight
> > > >
> > > >= tf x idf x idf x queryNorm x fieldNorm
> > > >
> > > > Here is the example that I used to derive the formula above. Clearly,
> > > > idf is multiplied twice in the score calculation.
> > > > *
> > >
> > >
> http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&inden
> > > t=
> > >
> > > > on&debugQuery=true&fl=id,score *
> > > >
> > > > 
> > > >
> > > > 0.18314168 = (MATCH) sum of:
> > > >   0.18314168 = (MATCH) weight(text:gb in 1), product of:
> > > > 0.35845062 = queryWeight(text:gb), product of:
> > > >   2.3121865 = idf(docFreq=6, numDocs=26)
> > > >   0.15502669 = queryNorm
> > > >
> > > > 0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
> > > >   1.4142135 = tf(termFreq(text:gb)=2)
> > > >   2.3121865 = idf(docFreq=6, numDocs=26)
> > > >   0.15625 = fieldNorm(field=text, doc=1)
> > > >
> > > > 
> > > >
> > > >
> > > > Thanks!
>


New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread David Ryan
Hi,

According to JIRA issue LUCENE-2959,
https://issues.apache.org/jira/browse/LUCENE-2959

BM25 will be included in the next release of Lucene.

1). Will BM25F be included in the next release as well, as part
of LUCENE-2959?
2). What's the timeline for the next release in which the new scoring
modules will be available?


Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
The example I posted doesn't show it, but we do use eDisMax for scoring in
Solr.

The following is from solrconfig.xml:

edismax


Here is a short snippet of the explain output, where 0.1 is the tie breaker
in DisMax/eDisMax.

6.446447 = (MATCH) max plus 0.1 times others of:

0.63826215 = (MATCH) weight(description:sony^0.25 in 802), product of:

   .


I noticed that in DefaultSimilarity, tf x idf^2 is used instead of
tf x idf, as stated in your link.

I am wondering if anyone has insight into why DisMax/eDisMax adopt the
same approach, using tf x idf^2.
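The "max plus 0.1 times others" line in the snippet above reflects how DisMax combines the per-field scores: the best field's score plus the tie breaker times the sum of the remaining field scores. A small sketch (class name and the sample scores are hypothetical):

```java
import java.util.Arrays;

public class DisMaxCombine {
    // DisMax combination: best field score, plus tie * each of the other scores.
    public static double dismax(double tie, double... fieldScores) {
        double max = Arrays.stream(fieldScores).max().orElse(0.0);
        double sum = Arrays.stream(fieldScores).sum();
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical per-field scores with tie breaker 0.1.
        System.out.println(dismax(0.1, 6.0, 0.63826215, 3.8));
    }
}
```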

I will try java-user@lucene mailing list as well.




On Wed, Oct 5, 2011 at 11:30 AM, Chris Hostetter
wrote:

>
> : Thanks! What's the procedure to report this if it's a bug?
> : EDisMax has similar behavior.
>
> what you are seeing isn't specific to dismax & edismax (in fact: there's
> no evidence in your example that dismax is even being used)
>
> what you are seeing is the basic scoring of a TermQuery using the
> DefaultSimilarity in lucene...
>
>
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/Similarity.html
>
> ...if you have specific questions about how/why this scoring formula is
> used, i would suggest posting them to the java-user@lucene mailing list.
>
>
> -Hoss
>


Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread David Ryan
Do you mean both BM25 and BM25F?


On Wed, Oct 5, 2011 at 11:44 AM, Robert Muir  wrote:

> On Wed, Oct 5, 2011 at 2:23 PM, David Ryan  wrote:
> > Hi,
> >
> > According to JIRA issue LUCENE-2959,
> > https://issues.apache.org/jira/browse/LUCENE-2959
> >
> > BM25 will be included in the next release of LUCENE.
> >
> > 1). Will BM25F be included in the next release as well as part
> > of LUCENE-2959?
>
> should be in the 4.0 release, as the solr integration has already been
> added (https://issues.apache.org/jira/browse/SOLR-2754)
> So if you checkout trunk from svn, you can specify these factories in
> your schema.xml to use these new models.
>
> > 2).  What's the timeline of the next release that new scoring modules
> will
> > be available?
> >
>
> unfortunately, nobody can answer this...
>
> --
> lucidimagination.com
>
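For reference, a sketch of what the trunk-era schema.xml configuration might look like (factory name per SOLR-2754; the k1 and b values are the standard BM25 parameters and are assumptions here — check the schema.xml documentation in your checkout):

```xml
<!-- Global similarity for the index; factory added by SOLR-2754 (trunk/4.0). -->
<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.2</float>
  <float name="b">0.75</float>
</similarity>
```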