Presumably you can find out which shard holds each copy of a duplicated
document: there's a result transformer for this ([shard]).
Then you can send a delete request directly to a particular core, disabling
distributed processing with distrib=false.
I have never done anything like that myself, though.
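
A minimal sketch of those two steps, in case it helps. It only builds the requests and does not send anything. The base URL, collection name and core name are hypothetical placeholders, and I have not verified that distrib=false stops a delete from being forwarded to the other shard on 4.10.4, so please try it on a copy of the data first.

```python
import json
from urllib.parse import urlencode


def shard_lookup_url(base_url, collection, doc_id):
    """Build a select URL that returns the id plus the [shard] transformer
    field, so each copy of the document reports which shard served it."""
    params = {
        "q": 'id:"%s"' % doc_id,   # assumes the id contains no double quotes
        "fl": "id,[shard]",        # [shard] is the result transformer
        "wt": "json",
    }
    return "%s/%s/select?%s" % (base_url, collection, urlencode(params))


def local_delete_request(core_url, doc_id):
    """Build the URL and JSON body for a delete-by-id aimed at one core.

    distrib=false is the parameter suggested above; whether it keeps the
    update local on this Solr version is an assumption, not a tested fact.
    """
    params = {"distrib": "false", "commit": "true"}
    url = "%s/update?%s" % (core_url, urlencode(params))
    body = json.dumps({"delete": {"id": doc_id}})
    return url, body


if __name__ == "__main__":
    # Hypothetical host, collection and core names, for illustration only.
    print(shard_lookup_url("http://localhost:8983/solr", "mycollection", "1"))
    url, body = local_delete_request(
        "http://localhost:8983/solr/mycollection_shard2_replica1", "1"
    )
    print(url)
    print(body)
```

The body would be POSTed to the printed update URL with Content-Type application/json, using whatever HTTP client you prefer. The core URL should point at a core on the shard that holds the unwanted copy, as reported by the [shard] field in the first query.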

On Sun, Sep 22, 2024 at 2:17 AM Rachid Bouacheria <
rachid.bouache...@expeditors.com> wrote:

> Hi All,
>
>
>
> We have a Solr collection that has 3 replicas and 2 shards.
>
>
>
> After migrating the Solr cluster from Linux 6 to Linux 8 the cluster
> looked healthy, but we realized that it wasn’t.
>
> Documents that were posted to the collection while the Solr cluster was
> not healthy were allowed to become duplicates.
>
> We think that a document with id 1 ended up on the wrong shard, as though
> the hashing of the id routed the document to the wrong shard. So instead
> of updating the document on shard 1 it created a new version on shard 2.
>
>
>
> We can query both documents and see the duplicate data. But we are unable
> to delete just one of the documents. If we delete by document id, then
> both documents are deleted. We can supply an additional attribute
> (another id) besides the document id to delete only the older version (and
> keep only the most recent update), but the delete doesn’t seem to take it
> into account and still deletes both documents.
>
>
>
> We are using Solr 4.10.4 and there don’t seem to be any tools to help
> us with this version.
>
>
>
> Any help would be appreciated.
>
> *Rachid Bouacheria*
> Senior Developer, IS Operational Experience, Legacy
>
> Sterling Plaza 2
> 3545 Factoria Blvd SE
> Bellevue, WA 98006
>
> *Global Headquarters, Seattle*
> 1015 Third Avenue
> Seattle, WA 98104
>


-- 
Sincerely yours
Mikhail Khludnev