Thanks, Erick, for your answer. I was overthinking this and seeing problems
that are not there.
I have your second scenario. The existing huge collection remains and will
keep growing, while the second collection starts with the same schema but
content from a new source. Sure, I could also load the content from the new
source into the first huge collection, but I want to keep source, loading,
and maintenance handling separated. Maybe I'll also start the new collection
on a new instance.

Regards
Bernd

On 13.05.2020 at 13:40, Erick Erickson wrote:
> So a doc in your new collection is expected to supersede a doc
> with the same ID in the old one, right?
>
> What I'd do is delete the IDs from my old collection as they were added to
> the new one; there's not much use in keeping both if you always want
> the new one.
>
> Let's assume you do this. The next issue is making sure all of the docs in
> the new collection are deleted from the old one, and your process will
> inevitably have a hiccough or two. You could periodically use streaming to
> produce a list of IDs common to both collections, and have a cleanup
> process you occasionally run to make up for any glitches in the normal
> delete-from-the-old-collection process; see:
> https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#stream-decorators
>
> If that's not the case, then having the same ID in different collections
> doesn't matter. Solr doesn't use the ID for combining results, just for
> routing and then updating.
>
> This is illustrated by the fact that, through user error, you can even get
> the same document repeated in a result set if it gets indexed to two
> different shards.
>
> And if neither of those is on target, what about "handling" unique IDs
> across two collections do you think might go wrong?
>
> Best,
> Erick
>
>> On May 13, 2020, at 4:26 AM, Bernd Fehling
>> <bernd.fehl...@uni-bielefeld.de> wrote:
>>
>> Dear list,
>>
>> in my SolrCloud 6.6 I have a huge collection, and now I will get
>> much more data from a different source to be indexed.
>> So I'm thinking about creating a new collection and combining both, the
>> existing one and the new one, with an alias.
>>
>> But how do I handle the unique key across collections within a datacenter?
>> Is it possible at all?
>>
>> I don't see any problems with add, update, and delete of documents,
>> because these operations don't use the alias.
>>
>> But searching across collections with the alias and then fetching
>> documents by ID from the result may lead to results where the same ID is
>> in both collections?
>>
>> I have no idea, but there are SolrClouds with a lot of collections out
>> there. How do they handle uniqueness across collections within a
>> datacenter?
>>
>> Regards
>> Bernd
>
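For anyone following along, the alias-plus-cleanup approach Erick describes
can be sketched against the Solr HTTP API. This is only an illustration:
the host/port, the alias name `all_docs`, and the collection names
`coll_old`/`coll_new` are placeholders, not anything from this thread.

```shell
# Search both collections through one alias (Collections API, CREATEALIAS).
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=all_docs&collections=coll_old,coll_new"

# Delete a superseded document from the old collection as it is indexed
# into the new one (delete-by-id via the JSON update handler).
curl -X POST -H 'Content-Type: application/json' \
  "http://localhost:8983/solr/coll_old/update?commit=true" \
  -d '{"delete": {"id": "DOC_ID"}}'

# Periodic cleanup: the intersect() stream decorator lists IDs present in
# both collections, i.e. the docs still to be deleted from the old one.
curl --data-urlencode 'expr=intersect(
    search(coll_old, q="*:*", fl="id", sort="id asc", qt="/export"),
    search(coll_new, q="*:*", fl="id", sort="id asc", qt="/export"),
    on="id")' \
  "http://localhost:8983/solr/coll_old/stream"

# The same set logic, shown locally with toy ID lists so it can be run
# without a Solr instance: comm -12 prints lines common to both files.
printf 'doc1\ndoc2\ndoc3\n' | sort > old_ids.txt
printf 'doc2\ndoc4\n'       | sort > new_ids.txt
comm -12 old_ids.txt new_ids.txt   # prints: doc2
```

The curl calls obviously need a running SolrCloud; the `comm` lines at the
end just demonstrate the intersection idea behind the streaming expression.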