Re: Join Scoring

David Smiley (@MITRE.org) Tue, 11 Feb 2014 21:03:07 -0800

Hi Anand.

Solr's JOIN query, {!join}, gives every match the same constant score.  It's
simpler, faster, and more memory efficient (particularly in worst-case memory
use) to implement the JOIN query without scoring, so that's why.  Of course,
you might want it to score and be willing to pay whatever penalty is
involved.  For that you'll need to write a Solr query parser (a
QParserPlugin) that might use Lucene's "join" module, which has scoring
variants.  I've taken this approach before.

You asked a specific question about the purpose of JoinScorer when it doesn't
actually score.  Lucene's "Query" produces a "Weight", which in turn produces
a "Scorer": a DocIdSetIterator that also returns a score for the current
document.  So a Query has to have a Scorer to match any document at all, even
if the score is always 1.
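As a rough illustration of the point above, here is a toy Python model of the
two-pass join described in the quoted question, paired with a scorer that
iterates the matches and reports a constant score.  This is not Lucene or Solr
code: join_matches, ConstantScorer, and the sample documents are hypothetical
names made up for this sketch.

```python
# Toy model only: plain dicts stand in for an index. Nothing here is Lucene/Solr API.
DOCS = {
    0: {"type": "review", "product_id": "p1", "text": "great"},
    1: {"type": "review", "product_id": "p2", "text": "poor"},
    2: {"type": "product", "id": "p1", "name": "lamp"},
    3: {"type": "product", "id": "p2", "name": "desk"},
    4: {"type": "product", "id": "p3", "name": "chair"},
}


def join_matches(docs, from_query, from_field, to_field):
    """Return sorted ids of to-side docs joined to docs matching from_query."""
    # Pass 1: collect the join terms from documents matching the from-side query.
    terms = {
        d[from_field]
        for d in docs.values()
        if from_field in d and from_query(d)
    }
    # Pass 2: find every document whose to_field holds one of those terms.
    return sorted(
        doc_id for doc_id, d in docs.items() if d.get(to_field) in terms
    )


class ConstantScorer:
    """Iterates matching doc ids in order; every match gets the same score."""

    def __init__(self, doc_ids, score=1.0):
        self._doc_ids = list(doc_ids)
        self._score = score

    def __iter__(self):
        for doc_id in self._doc_ids:
            # The iterator is what makes documents match; the score is incidental.
            yield doc_id, self._score


def is_great(doc):
    # Hypothetical from-side query: reviews whose text is "great".
    return doc.get("text") == "great"


matches = join_matches(DOCS, is_great, "product_id", "id")
scored = list(ConstantScorer(matches))
```

Even though the score carries no information here, the scorer is still the
thing that enumerates the matching documents, which is why it has to exist.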

Solr does indeed have a lot of caching; that may be in play here when
comparing against a quick attempt at using Lucene directly.  In particular,
the matching documents are likely to end up in Solr's documentCache.
Retrieving the stored fields that come back in search results is one of the
more expensive things Lucene/Solr does.
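To see why a document cache can skew such a comparison, here is a minimal
sketch of an LRU cache in front of an expensive stored-field load.  It is only
an assumption-laden toy: the DocumentCache class, its max_size parameter, and
the fake loader are invented for illustration and do not reflect how Solr's
own cache is implemented.

```python
from collections import OrderedDict


class DocumentCache:
    """Toy LRU cache of stored fields, keyed by doc id (not Solr's implementation)."""

    def __init__(self, load_stored_fields, max_size=512):
        self._load = load_stored_fields  # the expensive call we want to avoid
        self._max_size = max_size
        self._entries = OrderedDict()
        self.loads = 0  # how many times the expensive load actually ran

    def get(self, doc_id):
        if doc_id in self._entries:
            # Cache hit: mark as most recently used and skip the load.
            self._entries.move_to_end(doc_id)
            return self._entries[doc_id]
        self.loads += 1
        fields = self._load(doc_id)
        self._entries[doc_id] = fields
        if len(self._entries) > self._max_size:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return fields


def fake_load(doc_id):
    # Stand-in for reading stored fields from disk.
    return {"id": doc_id}


cache = DocumentCache(fake_load, max_size=2)
for _ in range(2):  # the same "query" run twice
    results = [cache.get(doc_id) for doc_id in (1, 2)]
loads_after_repeat = cache.loads
```

Running the same query twice costs two loads rather than four, so a benchmark
that repeats one query will favor whichever side has the warm cache.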

I also think you noted that the fields on documents from the "from" side of
the query are not available to be returned in search results, just those on
the "to" side.  Yup; that's true.  To remedy this, you might write a Solr
SearchComponent that adds fields from the "from" side.  That could be tricky
to do; it would probably need to re-run the from-side query, but filtered to
the from-side documents that join to the top-N documents being returned.
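The idea can be sketched in a few lines of toy Python.  This only models the
logic, not an actual SearchComponent: attach_from_fields, the "joined" key,
and the sample documents are all hypothetical.

```python
def attach_from_fields(top_docs, from_docs, from_query, from_field, to_field, wanted):
    """Add selected from-side fields to each returned (to-side) document."""
    # Join keys of the top-N documents that are actually being returned.
    keys = {d[to_field] for d in top_docs if to_field in d}

    # Re-run the from-side query, restricted to docs that join to those keys.
    by_key = {}
    for d in from_docs:
        if d.get(from_field) in keys and from_query(d):
            picked = {f: d[f] for f in wanted if f in d}
            by_key.setdefault(d[from_field], []).append(picked)

    # Return copies of the top-N docs with the from-side fields attached.
    return [dict(d, joined=by_key.get(d.get(to_field), [])) for d in top_docs]


reviews = [
    {"product_id": "p1", "text": "great"},
    {"product_id": "p2", "text": "poor"},
]
top_n = [{"id": "p1", "name": "lamp"}]

enriched = attach_from_fields(
    top_n,
    reviews,
    lambda d: d.get("text") == "great",  # hypothetical from-side query
    from_field="product_id",
    to_field="id",
    wanted=["text"],
)
```

Restricting the second pass to the join keys of the returned page keeps the
extra work proportional to N rather than to the whole from-side result set.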

~ David

anand chandak wrote
> Resending, if somebody can please respond.
> 
> 
> Thanks,
> 
> Anand
> 
> 
> On 2/5/2014 6:26 PM, anand chandak wrote:
> Hi,
> 
> I have a question about join scoring: why doesn't the Solr join query
> return scores? Looking at the code, I see there's a JoinScorer defined in
> the JoinQParserPlugin class. If it's not used for scoring, where is it
> actually used?
> 
> Also, to evaluate the performance of the Solr join plugin vs. Lucene's
> JoinUtil, I ran the same join query against the same data set and the same
> schema, and in the results I always see a QTime for Solr that is much lower
> than Lucene's. What is the reason behind this? Solr doesn't return
> scores; could that cause so much difference?
> 
> My guess is that Solr has a very sophisticated caching mechanism and that
> might be coming into play. Is that true? Or is there a difference in the
> way the JOIN happens in the two approaches?
> 
> If I understand correctly, both implementations use a two-pass
> approach: first collect all the terms from the fromField, then return all
> documents that have matching terms in the toField.
> 
> If somebody can shed some light on this, I would highly appreciate it.
> 
> Thanks,
> Anand

-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html
Sent from the Solr - User mailing list archive at Nabble.com.
