You could try this join contrib patch:

https://issues.apache.org/jira/browse/SOLR-4787



On Mon, Mar 2, 2015 at 12:04 PM, Matt B <mat...@runbox.com> wrote:

> I've recently inherited a Solr instance that is required to perform
> numerous joins between two cores, usually as filter queries, similar to the
> one below:
>
> q=firstName=Matt&fq=-({!to=emailAddress toIndex=accounts type=join
> fromIndex=lists from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce
> OR {!to=emailDomain toIndex=accounts type=join fromIndex=lists
> from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce OR
> {!to=emailDomainReversed toIndex=accounts type=join fromIndex=lists
> from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce)
>
> The accounts core is about 35 GB with ~40,000,000 documents and the lists
> core is about 9 GB with 90,000,000 documents.  There may be anywhere from
> one to one million documents in the lists core matching any particular
> list_id.  The idea is to filter a search query on the accounts core to
> include or exclude any documents with an email address, email domain, or
> reverse email domain that is found within the lists core for a particular
> list id.  The lists core is updated daily with both
> additions and deletions.
>
> Not surprisingly, such queries are very slow, usually taking minutes to
> return any results.
>
> Are there any possible strategies to significantly increase the
> performance of such queries?  The JVM max heap size is set to 16 GB and the
> server has 64 GB RAM.
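
For comparing the current query against anything you try, it may help to generate the filter from its parts instead of maintaining it by hand. Below is a rough Python sketch that rebuilds the same exclusion filter as the one quoted above. The field and core names come from your message; the helper names (join_clause, exclusion_filter, build_params) are made up for illustration, and I am assuming the usual {!join from=... to=... fromIndex=...} local-params form in place of type=join. It only builds the request string and does not talk to Solr.

```python
from urllib.parse import urlencode

# Fields on the accounts core that are matched against lists.listValue
# (names taken from the quoted query).
ACCOUNT_FIELDS = ("emailAddress", "emailDomain", "emailDomainReversed")


def join_clause(to_field, list_id, from_index="lists", from_field="listValue"):
    """Build one cross-core join clause (hypothetical helper)."""
    return (
        f"{{!join from={from_field} to={to_field} fromIndex={from_index}}}"
        f"list_id:{list_id}"
    )


def exclusion_filter(list_id, to_fields=ACCOUNT_FIELDS):
    """OR the per-field joins together and negate the group, as in the quoted fq."""
    clauses = " OR ".join(join_clause(field, list_id) for field in to_fields)
    return f"-({clauses})"


def build_params(user_query, list_id):
    """Return a URL-encoded query string carrying q and the exclusion fq."""
    return urlencode({"q": user_query, "fq": exclusion_filter(list_id)})


if __name__ == "__main__":
    print(exclusion_filter("000038f2-351b-11e4-9579-001e67654bce"))
    print(build_params("firstName:Matt", "000038f2-351b-11e4-9579-001e67654bce"))
```

Dropping the leading "-" in exclusion_filter would give the "include" variant of the same filter.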
