So a doc in your new collection is expected to supersede a doc
with the same ID in the old one, right? 

What I’d do is delete the IDs from the old collection as they are added to
the new one. There’s not much use in keeping both if you always want
the new one.
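
A minimal sketch of that delete step in Python, assuming your indexer already
knows which IDs it just added to the new collection. The URL, the collection
name, and both function names are made up for illustration; the request body is
the standard JSON delete-by-ID form accepted by the /update handler.

```python
import json
import urllib.request


def build_delete_payload(ids):
    """Return the JSON body for a Solr delete-by-ID request."""
    return json.dumps({"delete": list(ids)}).encode("utf-8")


def delete_from_old(solr_url, old_collection, ids, timeout=30):
    """POST a delete-by-ID request to the old collection.

    solr_url is assumed to look like "http://localhost:8983/solr"
    (hypothetical host), old_collection is a hypothetical name.
    """
    req = urllib.request.Request(
        # commitWithin lets Solr batch the commits instead of committing per call
        f"{solr_url}/{old_collection}/update?commitWithin=10000",
        data=build_delete_payload(ids),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

You would call delete_from_old() with each batch of IDs right after that batch
has been successfully added to the new collection.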

Assuming you do this, the next issue is making sure every doc in the new
collection really has been deleted from the old one, since your process will
inevitably have a hiccup or two. You could periodically use streaming
expressions to produce a list of IDs common to both collections, and run
an occasional cleanup process to make up for any glitches in the normal
delete-from-the-old-collection process. See:
https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#stream-decorators
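
Roughly, the common-ID list could come from the intersect decorator on that
page. The sketch below only builds the expression and parses the response; the
collection names and function names are placeholders, and it assumes your
uniqueKey field is called "id" and has docValues so the /export handler can
stream it in sorted order.

```python
import json


def common_ids_expr(old_collection, new_collection, id_field="id"):
    """Build a streaming expression that emits IDs present in both collections."""
    def search(coll):
        # /export streams the full sorted result set, which intersect requires
        return (f'search({coll}, q="*:*", fl="{id_field}", '
                f'sort="{id_field} asc", qt="/export")')

    return (f'intersect({search(old_collection)}, '
            f'{search(new_collection)}, on="{id_field}")')


def ids_from_stream_response(body, id_field="id"):
    """Pull the IDs out of a /stream JSON response.

    The trailing EOF tuple (and any error tuple) has no ID field and is skipped.
    """
    docs = json.loads(body)["result-set"]["docs"]
    return [d[id_field] for d in docs if id_field in d]
```

The expression is sent as the expr parameter to the /stream handler of one of
the collections, and the resulting IDs are the ones the cleanup process should
delete from the old collection.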

If the new doc is not meant to supersede the old one, then having the same ID
in different collections doesn’t matter. Solr doesn’t use the ID for combining
results, just for routing and then updating.

This is illustrated by the fact that, through user error, you can even get the
same document repeated in a result set if it gets indexed to two different shards.

And if neither of those is on target, what about “handling” unique IDs across
two collections do you think might go wrong?

Best,
Erick

> On May 13, 2020, at 4:26 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> 
> wrote:
> 
> Dear list,
> 
> in my SolrCloud 6.6 I have a huge collection and now I will get
> much more data from a different source to be indexed.
> So I'm thinking about a new collection and combine both, the existing
> one and the new one with an alias.
> 
> But how to handle the unique key across collections within a datacenter?
> Is it at all possible?
> 
> I don't see any problems with add, update and delete of documents because
> these operations are not using the alias.
> 
> But searching across collections with alias and then fetching documents
> by id from the result may lead to results where the id is in both collections?
> 
> I have no idea, but there are SolrClouds with a lot of collections out there.
> How do they handle uniqueness across collections within a datacenter?
> 
> Regards
> Bernd
