Re: unique key across collections within datacenter

ART GALLERY Wed, 13 May 2020 09:47:15 -0700



On Wed, May 13, 2020 at 7:24 AM Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
>
> Thanks Erick for your answer.
>
> I was overthinking this and seeing problems that are not there.
>
> My case is your second scenario. The first huge collection remains
> and will keep growing, while the second will start with the same schema
> but content from a new source. Sure, I could also load the content
> from the new source into the first huge collection, but I want to
> keep source, loading, and maintenance handling separate.
> Maybe I will also start the new collection with a new instance.
>
> Regards
> Bernd
>
> On 13.05.20 at 13:40, Erick Erickson wrote:
> > So a doc in your new collection is expected to supersede a doc
> > with the same ID in the old one, right?
> >
> > What I’d do is delete the IDs from my old collection as they are added to
> > the new one. There’s not much use in keeping both if you always want
> > the new one.
> >
> > Assuming you do this, the next issue is making sure all of the docs in
> > the new collection are deleted from the old one, and your process will
> > inevitably have a hiccough or two. You could periodically use streaming to
> > produce a list of IDs common to both collections, and run an occasional
> > cleanup process to make up for any glitches in the normal
> > delete-from-the-old-collection process, see:
> > https://lucene.apache.org/solr/guide/6_6/stream-decorators.html#stream-decorators
> >
> > If that’s not the case, then having the same id in the different
> > collections doesn’t matter. Solr doesn’t use the ID for combining results,
> > just routing and then updating.
> >
> > This is illustrated by the fact that, through user error, you can even get
> > the same document repeated in a result set if it gets indexed to two
> > different shards.
> >
> > And if neither of those is on target, what about “handling” unique IDs
> > across two collections do you think might go wrong?
> >
> > Best,
> > Erick
> >
> >> On May 13, 2020, at 4:26 AM, Bernd Fehling
> >> <bernd.fehl...@uni-bielefeld.de> wrote:
> >>
> >> Dear list,
> >>
> >> in my SolrCloud 6.6 I have a huge collection, and now I will get
> >> much more data from a different source to be indexed.
> >> So I'm thinking about creating a new collection and combining both,
> >> the existing one and the new one, with an alias.
> >>
> >> But how do I handle the unique key across collections within a datacenter?
> >> Is it possible at all?
> >>
> >> I don't see any problems with add, update and delete of documents because
> >> these operations are not using the alias.
> >>
> >> But searching across collections with an alias and then fetching documents
> >> by id from the result may lead to results where the id is in both
> >> collections?
> >>
> >> I have no idea, but there are SolrClouds with a lot of collections out
> >> there. How do they handle uniqueness across collections within a
> >> datacenter?
> >>
> >> Regards
> >> Bernd
> >
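
The periodic cleanup Erick describes (stream the IDs common to both
collections, then delete them from the old one) could be sketched roughly as
follows. This is only an illustration, not something posted in the thread: the
collection names, the "id" field name and the base URL are placeholders, and
it assumes the unique key field has docValues so that the /export handler can
be used. The helpers only build the streaming expression, the /stream URL and
the delete request body, and parse a /stream response; sending the requests is
left to whatever HTTP client you use.

```python
import json
from urllib.parse import urlencode


def common_ids_expr(new_coll, old_coll, id_field="id"):
    """Build an intersect() streaming expression that emits the IDs
    of the old collection that also exist in the new collection."""
    def search(coll):
        # Both streams must be sorted on the field used in on=...
        return (f'search({coll}, q="*:*", fl="{id_field}", '
                f'sort="{id_field} asc", qt="/export")')
    return f'intersect({search(old_coll)}, {search(new_coll)}, on="{id_field}")'


def stream_url(base_url, collection, expr):
    """URL for the /stream handler; base_url is e.g. http://host:8983/solr."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


def ids_from_stream_response(body, id_field="id"):
    """Extract IDs from a /stream JSON response, skipping the EOF tuple."""
    docs = json.loads(body)["result-set"]["docs"]
    return [doc[id_field] for doc in docs if id_field in doc]


def delete_payload(ids):
    """JSON body for a delete-by-ID request to the old collection's /update."""
    return json.dumps({"delete": list(ids)})


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    expr = common_ids_expr("new_coll", "old_coll")
    print(stream_url("http://localhost:8983/solr", "old_coll", expr))
```

The resulting delete payload would be POSTed as application/json to the old
collection's /update handler, followed by a commit. Since this cleanup deletes
data, it is worth running the expression on its own first and checking the
returned IDs.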
