You can also do the innerJoin in parallel across worker nodes using the parallel function:
hashJoin(
  parallel(workerCollection,
    innerJoin(
      search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export", partitionKeys="userId"),
      search(reviews, q="*:*", fl="userId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export", partitionKeys="userId"),
      on="userId"
    ),
    workers="20",
    zkHost="zk1:2345",
    sort="userId asc"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

The parallel function returns the tuples from the innerJoin, which in this
example is performed on 20 workers. The worker nodes are selected from
"workerCollection", which can be any SolrCloud collection with enough nodes.
The "partitionKeys" parameter has been added to the searches so that results
with the same userId are shuffled to the same worker node. A sketch of posting
this expression to the /stream handler with curl follows the quoted thread
below.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Dec 11, 2015 at 11:00 AM, Dennis Gove <dpg...@gmail.com> wrote:

> Mugeesh,
>
> You can use Streaming Aggregation to perform various types of
> cross-collection joins. This is currently available in trunk and will be a
> part of Solr 6.
>
> To follow your example, let's assume the following setup:
> Restaurants: available on machine1:8983 with 3 shards, zk at zk1:2345
> Users: available on machine2:8983 with 2 shards, zk at zk2:2345
> Reviews: available on machine1:8983 with 10 shards, zk at zk1:2345
>
> You could send a streaming query to Solr that returns all reviews for
> restaurants in NYC and includes the user's hometown:
>
> hashJoin(
>   innerJoin(
>     search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export"),
>     search(reviews, q="*:*", fl="userId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export"),
>     on="userId"
>   ),
>   hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
>   on="restaurantId"
> )
>
> Note that the number of shards doesn't matter and doesn't need to be
> considered as part of your query. Were you to send this off to a URL for
> results, it would look like this:
>
> http://machine1:8983/solr/users/stream?stream=[the expression above]
>
> Additional information about the Streaming API, Streaming Aggregation, and
> Streaming Expressions can be found at
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
> though this is currently incomplete, as a lot of the new features have yet
> to be added to the documentation.
>
> For those interested, joins were added under tickets
> https://issues.apache.org/jira/browse/SOLR-7584 and
> https://issues.apache.org/jira/browse/SOLR-8188.
>
> - Dennis
>
>
> On Mon, Dec 7, 2015 at 7:42 AM, Mugeesh Husain <muge...@gmail.com> wrote:
>
> > I have created 3 cores on the same machine using SolrCloud.
> > Cores: Restaurant, User, Review
> > Each core has only 1 shard and 2 replicas.
> >
> > Questions:
> > 1.) Is it possible to use a join among the 3 cores on the same machine
> > (or different machines)?
> > 2.) I am struggling with how to use a join among the 3 cores in
> > SolrCloud mode.
> >
> > The client is not interested in de-normalizing the data.
> >
> > Please give some suggestions on how to solve this problem.
> >
> > Thanks,
> > Mugeesh
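
For anyone trying this from the command line, here is a minimal sketch of how
the parallel expression above could be POSTed to the /stream handler with curl.
It reuses the collection, host, and zkHost names from the examples in this
thread, and it uses the "stream" parameter shown in Dennis's URL (released
versions of Solr 6+ use "expr" instead), so adjust the names and parameter to
your own setup:

  # Sketch only: hosts, collections, and the "stream" parameter name are taken
  # from the examples above and may need to be changed for your deployment.
  curl --data-urlencode 'stream=hashJoin(
      parallel(workerCollection,
        innerJoin(
          search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export", partitionKeys="userId"),
          search(reviews, q="*:*", fl="userId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export", partitionKeys="userId"),
          on="userId"),
        workers="20", zkHost="zk1:2345", sort="userId asc"),
      hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
      on="restaurantId")' \
    "http://machine1:8983/solr/users/stream"

The expression is sent as a form-encoded POST body, so newlines and quotes
inside it do not need manual escaping beyond the URL encoding curl applies.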