Re: Join Scoring

Michael McCandless Thu, 13 Feb 2014 03:39:22 -0800

I suspect (but am not certain) that one reason for the performance
difference between Solr and Lucene joins is that Solr operates on a
top-level reader.

This results in fast joins, but it means whenever you open a new
reader (NRT reader) there is a high cost to regenerate the top-level
data structures.

But if the app doesn't open NRT readers, or opens them rarely, perhaps
that cost is a good tradeoff to get faster joins.
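
To make the tradeoff concrete, here is a rough toy model in plain Java.
It is not Solr or Lucene code: the segment names, the class name
ReopenCostSketch, and the "scan" counting are all assumptions made up
for illustration. It only counts how many segments would have to be
re-processed after a reopen when a structure is built over the whole
(top-level) reader versus cached per segment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: counts segments re-scanned when a reader is reopened.
// Segment names and the "build" step are hypothetical; not Solr/Lucene code.
final class ReopenCostSketch {
    private final Map<String, Integer> perSegmentCache = new HashMap<>();

    // Top-level structure: rebuilt from every segment on each reopen.
    static int rebuildTopLevel(List<String> segments) {
        return segments.size(); // every segment is scanned again
    }

    // Per-segment structures: only segments not seen before are scanned.
    int refreshPerSegment(List<String> segments) {
        int scanned = 0;
        for (String seg : segments) {
            if (!perSegmentCache.containsKey(seg)) {
                perSegmentCache.put(seg, 1); // placeholder for a real structure
                scanned++;
            }
        }
        // Drop entries for segments that no longer exist (e.g. merged away).
        perSegmentCache.keySet().retainAll(segments);
        return scanned;
    }
}
```

In this model, a reopen that adds one small segment to an index of N
segments costs N + 1 scans for the top-level structure but only one
scan for the per-segment cache, which is the cost described above.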

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 13, 2014 at 12:10 AM, anand chandak
<anand.chan...@oracle.com> wrote:
> Re-posting...
>
>
>
> Thanks,
>
> Anand
>
>
>
> On 2/12/2014 10:55 AM, anand chandak wrote:
>>
>> Thanks David, really helpful response.
>>
>> You mentioned that if we have to add scoring support in Solr, a
>> possible approach would be to add a custom QueryParser, which might
>> use Lucene's join module.  I have tried this approach and it is
>> slow; I believe that is because it runs more searches.
>>
>> I'm curious whether it is possible instead to enhance Solr's existing
>> JoinQParserPlugin and add the scoring support in the same class.  Do you
>> think it's feasible and recommended?  If yes, what would it take (at a
>> high level) in terms of code changes?  Any pointers?
>>
>>
>> Thanks,
>>
>> Anand
>>
>>
>> On 2/12/2014 10:31 AM, David Smiley (@MITRE.org) wrote:
>>>
>>> Hi Anand.
>>>
>>> Solr's JOIN query, {!join}, constant-scores.  It's simpler, faster, and
>>> more memory efficient (particularly the worst-case memory use) to
>>> implement the JOIN query without scoring, so that's why.  Of course, you
>>> might want it to score and pay whatever penalty is involved.  For that
>>> you'll need to write a Solr "QueryParser" that might use Lucene's "join"
>>> module, which has scoring variants.  I've taken this approach before.
>>> You asked a specific question about the purpose of JoinScorer when it
>>> doesn't actually score.  Lucene's "Query" produces a "Weight", which in
>>> turn produces a "Scorer" that is a DocIdSetIterator plus it returns a
>>> score.  So Queries have to have a Scorer to match any document, even if
>>> the score is always 1.
>>>
>>> Solr does indeed have a lot of caching; that may be in play here when
>>> comparing against a quick attempt at using Lucene directly.  In
>>> particular, the matching documents are likely to end up in Solr's
>>> DocumentCache.  Returning stored fields in search results is one of the
>>> more expensive things Lucene/Solr does.
>>>
>>> I also think you noted that the fields on documents from the "from" side
>>> of the query are not available to be returned in search results, just
>>> the "to" side.  Yup; that's true.  To remedy this, you might write a Solr
>>> SearchComponent that adds fields from the "from" side.  That could be
>>> tricky to do; it would probably need to re-run the from-side query but
>>> filtered to the matching top-N documents being returned.
>>>
>>> ~ David
>>>
>>>
>>> anand chandak wrote
>>>>
>>>> Resending, if somebody can please respond.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Anand
>>>>
>>>>
>>>> On 2/5/2014 6:26 PM, anand chandak wrote:
>>>> Hi,
>>>>
>>>> I have a question on join scoring: why doesn't the Solr join query
>>>> return scores?  Looking at the code, I see there's a JoinScorer defined
>>>> in the JoinQParserPlugin class.  If it's not used for scoring, where is
>>>> it actually used?
>>>>
>>>> Also, to evaluate the performance of the Solr join plugin vs Lucene's
>>>> JoinUtil, I ran the same join query against the same data set and
>>>> schema, and in the results I am always seeing the QTime for Solr much
>>>> lower than Lucene's.  What is the reason behind this?  Solr doesn't
>>>> return scores; could that cause so much difference?
>>>>
>>>> My guess is that Solr has a very sophisticated caching mechanism and
>>>> that might be coming into play; is that true?  Or is there a difference
>>>> in the way the JOIN happens in the two approaches?
>>>>
>>>> If I understand correctly, both implementations use a two-pass
>>>> approach: first collect all the terms from fromField, then return all
>>>> documents that have matching terms in toField.
>>>>
>>>> If somebody can shed some light on this, I would highly appreciate it.
>>>>
>>>> Thanks,
>>>> Anand
>>>
>>>
>>>
>>>
>>> -----
>>>   Author:
>>> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>
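
The two-pass, constant-score join described in the original question
(collect terms from fromField, then match documents on toField) can be
sketched roughly as follows. This is a plain-Java illustration only: it
does not use the Lucene or Solr APIs, documents are modeled as simple
maps, and the class, method, and field names are assumptions invented
for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of a two-pass join without scoring.
// Documents are Map<String, String>; this is NOT the Lucene or Solr API.
final class TwoPassJoinSketch {

    // Pass 1: collect the distinct fromField values of the documents
    // that match the "from" query (modeled here as field == value).
    static Set<String> collectFromTerms(List<Map<String, String>> docs,
                                        String queryField, String queryValue,
                                        String fromField) {
        Set<String> terms = new HashSet<>();
        for (Map<String, String> doc : docs) {
            if (queryValue.equals(doc.get(queryField))) {
                String term = doc.get(fromField);
                if (term != null) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    // Pass 2: return the positions of documents whose toField value is
    // one of the collected terms. Every hit is equal; no score is kept.
    static List<Integer> matchToSide(List<Map<String, String>> docs,
                                     String toField, Set<String> terms) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            String value = docs.get(i).get(toField);
            if (value != null && terms.contains(value)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Because pass 1 keeps only a set of terms, any score from the "from" side
is discarded before pass 2 runs. A scoring variant would have to carry a
per-term score through to pass 2, which is where the extra memory and
time mentioned in the thread would come from.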
