So a doc in your new collection is expected to supersede a doc with the same ID in the old one, right?
What I’d do is delete the IDs from the old collection as they are added to the new one; there’s not much use in keeping both if you always want the new one.

Let’s assume you do this. The next issue is making sure every doc in the new collection really has been deleted from the old one, and your process will inevitably have a hiccough or two. You could periodically use streaming to produce a list of IDs common to both collections, and run an occasional cleanup process to make up for any glitches in the normal delete-from-the-old-collection process, see:

https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#stream-decorators

If that’s not the case (the new doc is not meant to supersede the old one), then having the same id in the different collections doesn’t matter. Solr doesn’t use the ID for combining results, just for routing and then updating. This is illustrated by the fact that, through user error, you can even get the same document repeated in a result set if it gets indexed to two different shards.

And if neither of those is on target, what about “handling” unique IDs across two collections do you think might go wrong?

Best,
Erick

> On May 13, 2020, at 4:26 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
>
> Dear list,
>
> in my SolrCloud 6.6 I have a huge collection and now I will get
> much more data from a different source to be indexed.
> So I'm thinking about a new collection and combine both, the existing
> one and the new one with an alias.
>
> But how to handle the unique key across collections within a datacenter?
> Is it at all possible?
>
> I don't see any problems with add, update and delete of documents because
> these operations are not using the alias.
>
> But searching across collections with alias and then fetching documents
> by id from the result may lead to results where the id is in both collections?
>
> I have no idea, but there are SolrClouds with a lot of collections out there.
> How do they handle uniqueness across collections within a datacenter?
>
> Regards
> Bernd
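
A minimal sketch of the occasional cleanup pass suggested in the reply, in Python. It assumes you have already exported the IDs of both collections as two ascending-sorted sequences (for example via a streaming expression), sorted with the same string ordering that Python uses. The function names `common_ids` and `delete_payloads` and the batch size are hypothetical, not part of Solr. The first function is a plain merge join that finds IDs present in both collections; the second groups them into JSON delete-by-ID bodies that you would then send to the old collection's update handler. Sending the requests is left out on purpose.

```python
import json
from typing import Iterable, Iterator, List


def common_ids(old_ids: Iterable[str], new_ids: Iterable[str]) -> Iterator[str]:
    """Yield IDs present in both inputs.

    Assumption: both inputs are sorted ascending with the same ordering
    and contain no duplicates (IDs are unique within one collection).
    """
    old_it, new_it = iter(old_ids), iter(new_ids)
    old = next(old_it, None)
    new = next(new_it, None)
    while old is not None and new is not None:
        if old == new:
            # Same ID in both collections: the old copy is a leftover.
            yield old
            old = next(old_it, None)
            new = next(new_it, None)
        elif old < new:
            old = next(old_it, None)
        else:
            new = next(new_it, None)


def delete_payloads(ids: Iterable[str], batch_size: int = 1000) -> Iterator[str]:
    """Group IDs into JSON delete-by-ID request bodies.

    batch_size is an arbitrary, hypothetical default; tune it for your setup.
    """
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == batch_size:
            yield json.dumps({"delete": batch})
            batch = []
    if batch:
        yield json.dumps({"delete": batch})


if __name__ == "__main__":
    # Made-up IDs, purely for illustration.
    old = ["doc1", "doc2", "doc4"]
    new = ["doc2", "doc3", "doc4"]
    for body in delete_payloads(common_ids(old, new)):
        print(body)
```

If the two exports might not share an ordering, a plain set intersection is the safer choice, at the cost of holding one ID list in memory.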