Re: Query to multiple collections

Shawn Heisey Mon, 22 Oct 2018 16:50:26 -0700

On 10/22/2018 1:26 PM, Chris Ulicny wrote:

There weren't any particular problems we ran into since the client that
makes the queries to multiple collections previously would query multiple
cores using the 'shards' parameter before we moved to solrcloud. We didn't
have any complicated sorting or scoring requirements fortunately.


The one thing I remember looking into was what solr would do when two
documents with the same id were found in both collections. I believe it
just non-deterministically picked one, probably the one that came in first
or last.

Yes, that is how it works. I do not know whether it is the first one to respond or the last one to respond that ends up in the results. Solr is designed to work with data where the uniqueKey field really is unique across everything that is being queried. Results can vary when you have the same uniqueKey value in more than one place and you query both of them at once.

Depending on how many collections you need to query simultaneously, it's
worth looking into using aliases for lists of collections as Alex
mentioned.

Unfortunately, in our use case, it wasn't worth the headache of managing
aliases for every possible combination of collections that needed to be
queried, but we would have preferred to use aliases.

Aliases are the cleanest option. This syntax also works; it sorta blew my mind when somebody told me about it:


http://host:port/solr/current,archive2,archive4/select?q=*:*

If you're using a Solr client library, it might not be possible to control the URL like that, but if you're building URLs yourself, you could use it.
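For anyone building URLs by hand, here is a minimal sketch of producing the comma-separated form shown above. The function name `multi_collection_select_url` is hypothetical, and host, port and collection names are placeholders. It only builds the string; sending the request (e.g. with `urllib.request.urlopen`) is left out.

```python
from urllib.parse import quote, urlencode


def multi_collection_select_url(host, port, collections, params):
    """Build a /select URL that names several collections in the path,
    like http://host:port/solr/current,archive2,archive4/select?q=*:*
    (hypothetical helper, for illustration only)."""
    if not collections:
        raise ValueError("at least one collection name is required")
    # Collection names are joined with commas in the path segment.
    path = ",".join(quote(name, safe="") for name in collections)
    # Keep ':' and '*' unescaped so the query stays readable (q=*:*).
    query = urlencode(params, safe=":*")
    return f"http://{host}:{port}/solr/{path}/select?{query}"


print(multi_collection_select_url("localhost", 8983,
                                  ["current", "archive2", "archive4"],
                                  {"q": "*:*"}))
# http://localhost:8983/solr/current,archive2,archive4/select?q=*:*
```

The same duplicate-uniqueKey caveat applies here as with any multi-collection query.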

I recently filed an issue about some unexpected behavior in alias handling:


https://issues.apache.org/jira/browse/SOLR-12849
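When a fixed list of collections is queried often enough to justify an alias, it can be created with the Collections API's CREATEALIAS action (`name` and `collections` parameters). Here is a minimal sketch that builds that request URL. The alias name `everything` and the function name are hypothetical, and the code only constructs the URL without sending it.

```python
from urllib.parse import urlencode


def create_alias_url(host, port, alias, collections):
    """Build a Collections API CREATEALIAS URL (hypothetical helper)."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # The alias points at a comma-separated list of collections.
        "collections": ",".join(collections),
    }
    # Leave commas unescaped so the collection list stays readable.
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params, safe=',')}"


print(create_alias_url("localhost", 8983, "everything",
                       ["current", "archive2", "archive4"]))
# http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=everything&collections=current,archive2,archive4
```

Once created, the alias can be queried like a collection, e.g. `/solr/everything/select?q=*:*`.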

Thanks,
Shawn

