On 10/22/2018 1:26 PM, Chris Ulicny wrote:
> There weren't any particular problems we ran into since the client that
> makes the queries to multiple collections previously would query multiple
> cores using the 'shards' parameter before we moved to solrcloud. We didn't
> have any complicated sorting or scoring requirements fortunately.
>
> The one thing I remember looking into was what solr would do when two
> documents with the same id were found in both collections. I believe it
> just non-deterministically picked one, probably the one that came in first
> or last.

Yes, that is how it works.  I do not know whether it is the first one to respond or the last one to respond that ends up in the results.  Solr is designed to work with data where the uniqueKey field really is unique across everything that is being queried.  Results can vary when you have the same uniqueKey value in more than one place and you query both of them at once.
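A toy model of why the outcome varies. The function `merge_by_unique_key` is hypothetical and is NOT Solr's actual merge code. It only shows that when two collections return a document with the same uniqueKey, the order in which the responses arrive decides which copy survives, whether the merge keeps the first one seen or the last.

```python
# Toy model only: NOT Solr's real merge logic. It illustrates why results
# can change when the same uniqueKey value exists in more than one collection.
from typing import Dict, List


def merge_by_unique_key(responses: List[List[dict]], keep: str = "first",
                        key: str = "id") -> Dict[str, dict]:
    """Merge per-collection result lists, keeping one document per key.

    keep="first": the earliest-arriving document for a key wins.
    keep="last":  later responses overwrite earlier ones.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    merged: Dict[str, dict] = {}
    for docs in responses:  # responses are processed in arrival order
        for doc in docs:
            k = doc[key]
            if keep == "last" or k not in merged:
                merged[k] = doc
    return merged


current = [{"id": "42", "source": "current"}]
archive = [{"id": "42", "source": "archive2"}]

# Same data, different arrival order -> a different surviving document.
print(merge_by_unique_key([current, archive])["42"]["source"])  # current
print(merge_by_unique_key([archive, current])["42"]["source"])  # archive2
```

Either policy is order-dependent, which is why the answer is only stable when the uniqueKey really is unique across every collection in the query.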

> Depending on how many collections you need to query simultaneously, it's
> worth looking into using aliases for lists of collections as Alex
> mentioned.
>
> Unfortunately, in our use case, it wasn't worth the headache of managing
> aliases for every possible combination of collections that needed to be
> queried, but we would have preferred to use aliases.

Aliases are the cleanest option.  This syntax also works; it sorta blew my mind when somebody told me about it:

http://host:port/solr/current,archive2,archive4/select?q=*:*

If you're using a Solr client library, it might not be possible to control the URL like that, but if you're building URLs yourself, you could use it.
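For anyone building URLs by hand, a minimal sketch of constructing that comma-separated form. The helper name `multi_collection_select_url` is hypothetical, and `http://localhost:8983/solr` is only an assumed base URL (8983 being Solr's default port); only the standard library is used.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode


def multi_collection_select_url(base: str, collections: List[str],
                                params: Dict[str, str]) -> str:
    """Build a /select URL that queries several collections at once by
    listing them, comma-separated, in the path (hypothetical helper)."""
    if not collections:
        raise ValueError("need at least one collection")
    # Escape each collection name, but keep the separating commas literal.
    path = ",".join(quote(c, safe="") for c in collections)
    # urlencode percent-encodes query values, e.g. *:* becomes %2A%3A%2A.
    return f"{base.rstrip('/')}/{path}/select?{urlencode(params)}"


url = multi_collection_select_url(
    "http://localhost:8983/solr",            # assumed host/port
    ["current", "archive2", "archive4"],
    {"q": "*:*"},
)
print(url)
# http://localhost:8983/solr/current,archive2,archive4/select?q=%2A%3A%2A
```

The encoded `q=%2A%3A%2A` is the same query as `q=*:*`; it just survives proxies and client libraries more reliably.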

I recently filed an issue about some unexpected behavior in alias handling:

https://issues.apache.org/jira/browse/SOLR-12849
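If you do go the alias route, the alias is created through the Collections API's CREATEALIAS action, with `name` for the alias and `collections` for a comma-separated list. A sketch that only builds the request URL; the helper name `create_alias_url`, the alias name, and the base URL are assumptions for illustration.

```python
from typing import List
from urllib.parse import urlencode


def create_alias_url(base: str, alias: str, collections: List[str]) -> str:
    """Build a Collections API CREATEALIAS request pointing `alias`
    at a comma-separated list of collections (hypothetical helper)."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # urlencode escapes the commas in the value as %2C; Solr decodes them.
    return f"{base.rstrip('/')}/admin/collections?{urlencode(params)}"


alias_url = create_alias_url("http://localhost:8983/solr",  # assumed host/port
                             "current_and_archives",       # hypothetical alias
                             ["current", "archive2", "archive4"])
print(alias_url)
```

Once created, the alias name goes in the URL path where a collection name would, e.g. `/solr/current_and_archives/select`.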

Thanks,
Shawn
