Thanks, Erick, for your answer.

I was thinking too complex and seeing problems that aren't there.

I have your second scenario. The first huge collection remains
and will keep growing, while the second starts with the same schema
but with content from a new source. Sure, I could also load the content
from the new source into the first huge collection, but I want to
keep source, loading, and maintenance handling separate.
Maybe I will even start the new collection on a new instance.
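
For the record, the alias part itself is straightforward on 6.6 via the
Collections API. A rough sketch, with placeholder host and collection
names:

```shell
# Create (or re-point) an alias that spans the old and the new collection.
# Queries against "both" then fan out to both collections.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=both&collections=old_coll,new_coll"
```

Queries would then go to /solr/both/select, while updates are probably
best sent to the concrete collections directly rather than through the
alias.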

Regards
Bernd

On 13.05.20 at 13:40, Erick Erickson wrote:
> So a doc in your new collection is expected to supersede a doc
> with the same ID in the old one, right? 
> 
> What I’d do is delete the IDs from my old collection as they were added to
> the new one; there’s not much use in keeping both if you always want
> the new one.
> 
> Let’s assume you do this. The next issue is making sure all of your docs in
> the new collection are deleted from the old one, and your process will
> inevitably have a hiccough or two. You could periodically use streaming to
> produce a list of IDs common to both collections, and run an occasional
> cleanup process to make up for any glitches in the normal
> delete-from-the-old-collection process; see:
> https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#stream-decorators
> 
> If that’s not the case, then having the same id in different collections
> doesn’t matter. Solr doesn’t use the ID for combining results, just for
> routing and then updating.
> 
> This is illustrated by the fact that, through user error, you can even
> get the same document repeated in a result set if it gets indexed to
> two different shards.
> 
> And if neither of those is on target, what about “handling” unique IDs
> across two collections do you think might go wrong?
> 
> Best,
> Erick
> 
>> On May 13, 2020, at 4:26 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> 
>> wrote:
>>
>> Dear list,
>>
>> in my SolrCloud 6.6 I have a huge collection, and now I will get
>> much more data from a different source to be indexed.
>> So I'm thinking about a new collection and combining both the existing
>> one and the new one with an alias.
>>
>> But how do I handle the unique key across collections within a datacenter?
>> Is it possible at all?
>>
>> I don't see any problems with add, update and delete of documents because
>> these operations are not using the alias.
>>
>> But searching across collections with an alias and then fetching documents
>> by id from the result may lead to results where the id is in both
>> collections?
>>
>> I have no idea, but there are SolrClouds with a lot of collections out there.
>> How do they handle uniqueness across collections within a datacenter?
>>
>> Regards
>> Bernd
> 
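
The streaming cleanup Erick describes could be sketched roughly like
this against the /stream handler. Host and collection names are
placeholders, and /export requires the id field to have docValues:

```shell
# Emit the ids present in BOTH collections, i.e. the candidates for
# deletion from the old collection. Both inner searches must use the
# /export handler and sort on the join key for intersect() to work.
curl --data-urlencode 'expr=intersect(
    search(old_coll, q="*:*", fl="id", sort="id asc", qt="/export"),
    search(new_coll, q="*:*", fl="id", sort="id asc", qt="/export"),
    on="id")' \
  "http://localhost:8983/solr/old_coll/stream"
```

The resulting id list could then be fed into a delete-by-id request
against the old collection.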
