Presumably you can find out which shard each copy of a duplicated document lives on; there is a result transformer for this. Then you can send a delete request to that particular core with distributed processing disabled (distrib=false). I have never done anything like that myself, though.
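
A minimal sketch of the two requests described above, untested against a real cluster. It assumes the result transformer in question is `[shard]` (requested through `fl`), that the update handler accepts a JSON delete body, and that `distrib=false` is honored on updates in 4.10.4. The host, collection and core names are hypothetical. The code only builds the URLs and the body, so it can be checked on a test copy before anything is sent.

```python
import json
from urllib.parse import urlencode


def shard_lookup_url(collection_url, doc_id):
    """Build a query URL that returns each copy of doc_id with its shard.

    Assumes the [shard] result transformer is available via the fl parameter.
    """
    params = {"q": 'id:"%s"' % doc_id, "fl": "id,[shard]", "wt": "json"}
    return collection_url.rstrip("/") + "/select?" + urlencode(params)


def local_delete_request(core_url, doc_id):
    """Build (url, json_body) for a delete-by-id aimed at one core only.

    distrib=false is the parameter suggested above; whether the update
    handler respects it in a given Solr version is an assumption to verify.
    """
    params = {"distrib": "false", "commit": "true"}
    url = core_url.rstrip("/") + "/update?" + urlencode(params)
    body = json.dumps({"delete": {"id": doc_id}})
    return url, body


if __name__ == "__main__":
    # Hypothetical host, collection and core names, for illustration only.
    print(shard_lookup_url("http://localhost:8983/solr/mycollection", "1"))
    url, body = local_delete_request(
        "http://localhost:8983/solr/mycollection_shard2_replica1", "1"
    )
    print(url)
    print(body)  # POST this with Content-Type: application/json
```

If the delete really is not forwarded, each replica of the wrong shard presumably needs the same request sent to its own core.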

On Sun, Sep 22, 2024 at 2:17 AM Rachid Bouacheria <rachid.bouache...@expeditors.com> wrote:

> Hi All,
>
> We have a solr collection that has 3 replicas and 2 shards.
>
> After migrating the solr cluster from linux 6 to linux 8 the cluster
> looked healthy, but we realized that it wasn't.
>
> Documents that were posted to the collection while the solr cluster was
> not healthy allowed duplicates.
>
> We think that a document with id 1 ended up on the wrong shard. As though
> the hashing of the id persisted the document on the wrong shard. So instead
> of updating the document on shard 1 it created a new version on shard 2.
>
> We can query both documents and see the duplicate data. But we are unable
> to delete one of the documents. If we delete the document with the id of
> the document then both documents are deleted. We can give an attribute
> (another id) besides the document id to only delete the older version (and
> only keep the most recent update) but the delete doesn't seem to care, and
> still deletes both documents.
>
> We are using solr 4.10.4 and it doesn't seem like there are tools to help
> us with this version.
>
> Any help would be appreciated
>
> *Rachid Bouacheria*
> Senior Developer, IS Operational Experience, Legacy
>
> Sterling Plaza 2
> 3545 Factoria Blvd SE
> Bellevue, WA 98006
>
> *Global Headquarters, Seattle*
> 1015 Third Avenue
> Seattle, WA 98104

--
Sincerely yours
Mikhail Khludnev