You could try the join contrib patch in SOLR-4787: https://issues.apache.org/jira/browse/SOLR-4787

On Mon, Mar 2, 2015 at 12:04 PM, Matt B <mat...@runbox.com> wrote:
> I've recently inherited a Solr instance that is required to perform
> numerous joins between two cores, usually as filter queries, similar to the
> one below:
>
> q=firstName=Matt&fq=-({!to=emailAddress toIndex=accounts type=join
> fromIndex=lists from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce
> OR {!to=emailDomain toIndex=accounts type=join fromIndex=lists
> from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce OR
> {!to=emailDomainReversed toIndex=accounts type=join fromIndex=lists
> from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce)
>
> The accounts core is about 35GB with ~40,000,000 documents and the lists
> core is about 9 GB with 90,0000,000 documents. There may be anywhere from
> one to one million documents in the lists core matching any particular
> list_id. The idea is to filter a search query on the accounts core to
> include or exclude any documents with an email address, email domain, or
> reverse email domain that is found within the lists core for a particular
> list id. The lists core is frequently updated on a daily basis with both
> additions and deletions.
>
> Not surprisingly, such queries are very slow, usually taking minutes to
> return any results.
>
> Are there any possible strategies to significantly increase the
> performance of such queries? The JVM max heap size is set to 16 GB and the
> server has 64 GB RAM.
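
For what it's worth, the filter in the quoted query is the same join clause repeated three times, with only the `to` field changing. A minimal Python sketch of assembling that `fq` string follows. The helper names (`build_join_clause`, `build_exclusion_filter`, `build_params`) are hypothetical, and the local params are copied verbatim from the quoted query rather than checked against any particular Solr version.

```python
from urllib.parse import urlencode

# Fields on the accounts core that the quoted query joins against.
JOIN_FIELDS = ["emailAddress", "emailDomain", "emailDomainReversed"]


def build_join_clause(to_field, list_id):
    """Return one join clause, using the local params from the quoted query."""
    return (
        "{!to=%s toIndex=accounts type=join fromIndex=lists from=listValue}"
        "list_id:%s" % (to_field, list_id)
    )


def build_exclusion_filter(list_id, fields=JOIN_FIELDS):
    """OR the per-field join clauses together and negate the whole group."""
    clauses = [build_join_clause(field, list_id) for field in fields]
    return "-(" + " OR ".join(clauses) + ")"


def build_params(q, list_id):
    """URL-encode the main query and the exclusion filter as request params."""
    return urlencode({"q": q, "fq": build_exclusion_filter(list_id)})


if __name__ == "__main__":
    print(build_exclusion_filter("000038f2-351b-11e4-9579-001e67654bce"))
```

Generating the filter this way only keeps the three clauses in sync when the list id changes; it does not by itself make the joins any faster.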