A couple of other things:

1) Your innerJoin can be parallelized across workers to improve performance. Take a look at the docs on the parallel function for the details.
2) It looks like you might be doing graph operations with joins. You might want to take a look at the gatherNodes function coming in 6.1:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62693238

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 13, 2016 at 5:57 PM, Joel Bernstein <joels...@gmail.com> wrote:

> When doing things that require all the results (like joins) you need to
> specify the /export handler in the search function.
>
> qt="/export"
>
> The search function defaults to the /select handler which is designed to
> return the top N results. The /export handler always returns all results
> that match the query. Also keep in mind that the /export handler requires
> that sort fields and fl fields have docValues set.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, May 13, 2016 at 5:36 PM, Ryan Cutter <ryancut...@gmail.com> wrote:
>
>> Question #1:
>>
>> triple_type collection has a few hundred docs and triple has 25M docs.
>>
>> When I search for a particular subject_id in triple which I know has 14
>> results and do not pass in 'rows' params, it returns 0 results:
>>
>> innerJoin(
>>     search(triple, q=subject_id:1656521,
>>         fl="triple_id,subject_id,type_id",
>>         sort="type_id asc"),
>>     search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
>>         sort="triple_type_id asc"),
>>     on="type_id=triple_type_id"
>> )
>>
>> When I do the same search with rows=10000, it returns 14 results:
>>
>> innerJoin(
>>     search(triple, q=subject_id:1656521,
>>         fl="triple_id,subject_id,type_id",
>>         sort="type_id asc", rows=10000),
>>     search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
>>         sort="triple_type_id asc", rows=10000),
>>     on="type_id=triple_type_id"
>> )
>>
>> Am I doing this right? Is there a magic number to pass into rows which
>> says "give me all the results which match this query"?
>>
>>
>> Question #2:
>>
>> Perhaps related to the first question but I want to run the innerJoin()
>> without the subject_id - rather have it use the results of another query.
>> But this does not return any results. I'm saying "search for this entity
>> based on id then use that result's entity_id as the subject_id to look
>> through the triple/triple_type collections":
>>
>> hashJoin(
>>     innerJoin(
>>         search(triple, q=*:*, fl="triple_id,subject_id,type_id",
>>             sort="type_id asc"),
>>         search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
>>             sort="triple_type_id asc"),
>>         on="type_id=triple_type_id"
>>     ),
>>     hashed=search(entity,
>>         q=id:"urn:sid:entity:455dfa1aa27eedad21ac2115797c1580bb3b3b4e",
>>         fl="entity_id,entity_label", sort="entity_id asc"),
>>     on="subject_id=entity_id"
>> )
>>
>> Am I doing this hashJoin right?
>>
>> Thanks very much, Ryan
>>
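
For reference, here is a rough sketch of how the first query in the quoted message might look once both searches go through the /export handler, so that no rows parameter is needed. It assumes that every field named in fl and sort (triple_id, subject_id, type_id, triple_type_id, triple_type_label) has docValues enabled in the schema. The thread does not confirm that.

```
innerJoin(
    search(triple, q=subject_id:1656521,
        fl="triple_id,subject_id,type_id",
        sort="type_id asc", qt="/export"),
    search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
        sort="triple_type_id asc", qt="/export"),
    on="type_id=triple_type_id"
)
```

The same join could probably be spread across workers with the parallel function along the following lines. The worker collection (triple), the worker count (4) and the zkHost address are placeholders rather than values from the thread. Each search is partitioned on its side of the join key so that matching tuples land on the same worker.

```
parallel(triple,
    innerJoin(
        search(triple, q=subject_id:1656521,
            fl="triple_id,subject_id,type_id",
            sort="type_id asc", qt="/export",
            partitionKeys="type_id"),
        search(triple_type, q=*:*, fl="triple_type_id,triple_type_label",
            sort="triple_type_id asc", qt="/export",
            partitionKeys="triple_type_id"),
        on="type_id=triple_type_id"
    ),
    workers="4",
    zkHost="localhost:9983",
    sort="type_id asc"
)
```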