Thanks all for the suggestions. Regarding patch SOLR-4787, it seems like this will only work with long or int fields and not strings like email addresses. But my coworker suggested the possibility of using a hash to generate long fields from the string fields, so I may try that out.
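For reference, here is a minimal sketch of the hashing idea, assuming the same function is applied at index time to the values in both cores and the result is stored in long fields. The helper name `email_to_long` and the normalization step (trim and lowercase) are my own assumptions, not part of the patch. It truncates a SHA-256 digest to 64 bits, so distinct strings can in principle collide; the chance is small but not zero.

```python
import hashlib


def email_to_long(value: str) -> int:
    """Map a string to a signed 64-bit integer (hypothetical helper).

    Uses a stable cryptographic hash rather than Python's built-in hash(),
    which is salted per process and would not be reproducible across runs.
    """
    # Assumed normalization: treat addresses case-insensitively, ignore padding.
    normalized = value.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # Keep the first 8 bytes and interpret them as a signed 64-bit value,
    # which is the range a long field can hold.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


if __name__ == "__main__":
    for s in ("matt@example.com", "example.com", "com.example"):
        print(s, email_to_long(s))
```

The same hashed value would have to be written for the email address, email domain, and reversed domain on the accounts side and for listValue on the lists side, so that the join compares like with like.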
-Matt

On Mon, 2 Mar 2015 23:16:33 -0700, William Bell <billnb...@gmail.com> wrote:
> I agree that join is slow. Adding fq on LocalParams is good. Has this been
> added to {!lucene} and other calls like join ?
>
> On Mon, Mar 2, 2015 at 2:00 PM, Gopal Patwa <gopalpa...@gmail.com> wrote:
>
> > You could give a try for this join contrib patch
> >
> > https://issues.apache.org/jira/browse/SOLR-4787
> >
> > On Mon, Mar 2, 2015 at 12:04 PM, Matt B <mat...@runbox.com> wrote:
> >
> > > I've recently inherited a Solr instance that is required to perform
> > > numerous joins between two cores, usually as filter queries, similar to
> > > the one below:
> > >
> > > q=firstName=Matt&fq=-({!to=emailAddress toIndex=accounts type=join
> > > fromIndex=lists from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce
> > > OR {!to=emailDomain toIndex=accounts type=join fromIndex=lists
> > > from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce OR
> > > {!to=emailDomainReversed toIndex=accounts type=join fromIndex=lists
> > > from=listValue}list_id:000038f2-351b-11e4-9579-001e67654bce)
> > >
> > > The accounts core is about 35GB with ~40,000,000 documents and the lists
> > > core is about 9 GB with 90,0000,000 documents. There may be anywhere from
> > > one to one million documents in the lists core matching any particular
> > > list_id. The idea is to filter a search query on the accounts core to
> > > include or exclude any documents with an email address, email domain, or
> > > reverse email domain that is found within the lists core for a particular
> > > list id. The lists core is frequently updated on a daily basis with both
> > > additions and deletions.
> > >
> > > Not surprisingly, such queries are very slow, usually taking minutes to
> > > return any results.
> > >
> > > Are there any possible strategies to significantly increase the
> > > performance of such queries? The JVM max heap size is set to 16 GB and
> > > the server has 64 GB RAM.
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076