Ah, got it now - thanks for the explanation.
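To make the two-phase behaviour described in the quoted explanation concrete, here is a minimal in-memory model in Python. It is a sketch under stated assumptions, not Solr's join code. Documents are plain dicts and queries are predicates. The names `pseudo_join` and `equivalent_outer_query`, and the sample data, are hypothetical. Phase 1 collects the join-key terms returned by the inner query. Phase 2 keeps only outer hits whose key is one of those terms. That second step is the "multi-term query", and its size equals the number of distinct inner values (about 25k in this thread).

```python
from typing import Callable

Doc = dict[str, object]        # a document modelled as field -> value (hypothetical)
Pred = Callable[[Doc], bool]   # a query modelled as a predicate (hypothetical)


def pseudo_join(from_docs: list[Doc], inner: Pred, from_field: str,
                to_docs: list[Doc], outer: Pred, to_field: str):
    """Return (join_terms, hits) for a two-phase pseudo-join."""
    # Phase 1: run the inner query on the 'from' core and collect the distinct
    # values of from_field, i.e. the list of terms.
    terms = {d[from_field] for d in from_docs if from_field in d and inner(d)}
    # Phase 2: keep outer-query hits whose to_field value is one of those terms,
    # the equivalent of adding to_field:(t1 t2 ... tN) as a filter.
    hits = [d for d in to_docs if outer(d) and d.get(to_field) in terms]
    return terms, hits


def equivalent_outer_query(outer_q: str, to_field: str, terms: set) -> str:
    # Renders the rewritten query in the notation used in the thread; not a Solr API.
    return f"{outer_q} {to_field}:({' '.join(sorted(terms))})"


# Hypothetical sample data for the 'shirt' example.
other = [
    {"someid": "A", "type": "shirt"},
    {"someid": "B", "type": "shirt"},
    {"someid": "C", "type": "shirt"},
    {"someid": "D", "type": "pants"},
]
main = [
    {"id": 1, "size": "large", "someotherid": "A"},
    {"id": 2, "size": "small", "someotherid": "B"},
    {"id": 3, "size": "large", "someotherid": "D"},
    {"id": 4, "size": "large", "someotherid": "C"},
]

terms, hits = pseudo_join(other, lambda d: d["type"] == "shirt", "someid",
                          main, lambda d: d["size"] == "large", "someotherid")
print(sorted(terms), [d["id"] for d in hits])                      # ['A', 'B', 'C'] [1, 4]
print(equivalent_outer_query("size:large", "someotherid", terms))  # size:large someotherid:(A B C)
```

In this model the membership test is a cheap Python set lookup. In a real index, each term in the set corresponds to term-dictionary and postings work. That is why 25k join values cost far more than three.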
On Sat, Sep 28, 2013 at 3:33 AM, Upayavira <u...@odoko.co.uk> wrote:

> The thing here is to understand how a join works.
>
> Effectively, it does the inner query first, which results in a list of
> terms. It then effectively does a multi-term query with those values.
>
> q=size:large {!join fromIndex=other from=someid to=someotherid}type:shirt
>
> Imagine the inner query returned the values A, B, C. Your inner query is, on
> core 'other', q=type:shirt&fl=someid.
>
> Then your outer query becomes size:large someotherid:(A B C)
>
> Your inner query returns 25k values. You're having to do a multi-term
> query for 25k terms. That is *bound* to be slow.
>
> The pseudo-joins in Solr 4.x are intended for a small to medium number
> of values returned by the inner query; otherwise performance degrades as
> you are seeing.
>
> Is there a way you can reduce the number of values returned by the inner
> query?
>
> As Joel mentions, those other joins are attempts to find other ways to
> work with this limitation.
>
> Upayavira
>
> On Fri, Sep 27, 2013, at 09:44 PM, Peter Keegan wrote:
> > Hi Joel,
> >
> > I tried this patch and it is quite a bit faster. Using the same query on a
> > larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin'
> > QTime was 100 msec! This was true for both large and small result sets.
> >
> > A few notes: the patch didn't compile with 4.3 because of the
> > SolrCore.getLatestSchema call (which I worked around), and the package name
> > should be:
> > <queryParser name="hjoin"
> >   class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>
> >
> > Unfortunately, I just learned that our uniqueKey may have to be an
> > alphanumeric string instead of an int, so I'm not out of the woods yet.
> >
> > Good stuff - thanks.
> >
> > Peter
> >
> >
> > On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > It looks like you are using int join keys, so you may want to check out
> > > SOLR-4787, specifically the hjoin and bjoin.
> > >
> > > These perform well when you have a large number of results from the
> > > fromIndex. If you have a small number of results in the fromIndex, the
> > > standard join will be faster.
> > >
> > >
> > > On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > >
> > > > I forgot to mention - this is Solr 4.3
> > > >
> > > > Peter
> > > >
> > > >
> > > > On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > >
> > > > > I'm doing a cross-core join query and the join query is 30X slower than
> > > > > each of the 2 individual queries. Here are the queries:
> > > > >
> > > > > Main query: http://localhost:8983/solr/mainindex/select?q=title:java
> > > > > QTime: 5 msec
> > > > > hit count: 1000
> > > > >
> > > > > Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1 TO 0.3]
> > > > > QTime: 4 msec
> > > > > hit count: 25K
> > > > >
> > > > > Join query:
> > > > > http://localhost:8983/solr/mainindex/select?q=title:java&fq={!join fromIndex=mainindex toIndex=subindex from=docid to=docid}fld1:[0.1 TO 0.3]
> > > > > QTime: 160 msec
> > > > > hit count: 205
> > > > >
> > > > > Here are the index specs:
> > > > >
> > > > > mainindex size: 117K docs, 1 segment
> > > > > mainindex schema:
> > > > > <field name="docid" type="int" indexed="true" stored="true" required="true" multiValued="false" />
> > > > > <field name="title" type="text_en_splitting" indexed="true" stored="true" multiValued="false" />
> > > > > <uniqueKey>docid</uniqueKey>
> > > > >
> > > > > subindex size: 117K docs, 1 segment
> > > > > subindex schema:
> > > > > <field name="docid" type="int" indexed="true" stored="true" required="true" multiValued="false" />
> > > > > <field name="fld1" type="float" indexed="true" stored="true" required="false" multiValued="false" />
> > > > > <uniqueKey>docid</uniqueKey>
> > > > >
> > > > > With debugQuery=true I see:
> > > > > "debug":{
> > > > >   "join":{
> > > > >     "{!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO 0.3]":{
> > > > >       "time":155,
> > > > >       "fromSetSize":24742,
> > > > >       "toSetSize":24742,
> > > > >       "fromTermCount":117810,
> > > > >       "fromTermTotalDf":117810,
> > > > >       "fromTermDirectCount":117810,
> > > > >       "fromTermHits":24742,
> > > > >       "fromTermHitsTotalDf":24742,
> > > > >       "toTermHits":24742,
> > > > >       "toTermHitsTotalDf":24742,
> > > > >       "toTermDirectCount":24627,
> > > > >       "smallSetsDeferred":115,
> > > > >       "toSetDocsAdded":24742}},
> > > > >
> > > > > Via profiler and debugger, I see 150 msec spent in the outer
> > > > > 'while(term!=null)' loop in JoinQueryWeight.getDocSet(). This seems like a
> > > > > lot of time to join the bitsets. Does this seem right?
> > > > >
> > > > > Peter
> > >
> > >
> > > --
> > > Joel Bernstein
> > > Professional Services LucidWorks
> >
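The debug output above appears to show the join loop enumerating every `docid` term in subindex (fromTermCount 117810), even though only 24742 of those terms belong to matching documents (fromTermHits). The Python sketch below models that term-at-a-time strategy alongside a hash-set strategy. The hash-set version first collects the matched join keys and then scans per-document key values on the other side. Both functions, the toy index, and the "seek" counter are hypothetical simplifications. They are not the code in Solr's join parser or in the SOLR-4787 patch.

```python
def invert(values: dict[int, int]) -> dict[int, set[int]]:
    """Build a toy inverted index: join key (term) -> set of internal doc ids."""
    postings: dict[int, set[int]] = {}
    for doc, key in values.items():
        postings.setdefault(key, set()).add(doc)
    return postings


def term_enum_join(from_postings: dict[int, set[int]], from_matches: set[int],
                   to_postings: dict[int, set[int]]):
    """Term-at-a-time model: walk every join term on the 'from' side."""
    result: set[int] = set()
    terms_visited = 0
    to_seeks = 0
    for term, docs in from_postings.items():  # analogous to a while(term != null) loop
        terms_visited += 1
        if docs & from_matches:               # does any matched 'from' doc carry this term?
            to_seeks += 1                     # look the same term up on the 'to' side
            result |= to_postings.get(term, set())
    return result, terms_visited, to_seeks


def hash_set_join(from_values: dict[int, int], from_matches: set[int],
                  to_values: dict[int, int]) -> set[int]:
    """Hash-set model: collect matched join keys, then scan 'to' per-doc values."""
    keys = {from_values[d] for d in from_matches if d in from_values}
    return {d for d, key in to_values.items() if key in keys}


# Hypothetical toy data: 10 docs per core, join key "docid" 0..9.
from_values = {doc: doc for doc in range(10)}       # subindex: internal doc id -> docid
fld1 = {doc: doc / 10 for doc in range(10)}         # subindex: fld1 values 0.0 .. 0.9
to_values = {doc + 100: doc for doc in range(10)}   # mainindex: internal ids 100..109 -> docid

from_matches = {d for d, v in fld1.items() if 0.1 <= v <= 0.3}  # inner query fld1:[0.1 TO 0.3]

te_result, visited, seeks = term_enum_join(invert(from_values), from_matches, invert(to_values))
hs_result = hash_set_join(from_values, from_matches, to_values)
print(sorted(te_result), visited, seeks)  # [101, 102, 103] 10 3
print(sorted(hs_result))                  # [101, 102, 103]
```

On this toy index both strategies return the same documents. The term-at-a-time version visits all 10 from-side terms and seeks 3 of them on the to side. The hash-set version does no term seeks at all but scans every to-side document. Which trade-off wins depends on the data, consistent with Joel's point that neither join is faster in every case.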