Partial updates on collection with router.field lead to duplicated index

Zhivko Donev Fri, 06 Nov 2020 03:02:46 -0800

Hi All,

I believe this is a bug on the Solr side, but I want to be sure before filing
a JIRA ticket.
Setup:
- SolrCloud 8.3
- Collection with 2 shards and 2 replicas, router = compositeId,
  router.field=routerField_s
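For anyone trying to reproduce this, a collection with the same layout could be created through the Collections API. Below is a minimal Python sketch that only builds the request URL. The base URL and the collection name `test_collection` are placeholders and do not come from the original setup.

```python
from urllib.parse import urlencode

# Placeholders (assumptions): adjust to your own cluster.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "test_collection"


def create_collection_url(base: str, name: str) -> str:
    """Collections API CREATE call mirroring the setup above:
    2 shards, 2 replicas, compositeId router keyed on routerField_s."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 2,
        "replicationFactor": 2,
        "router.name": "compositeId",
        "router.field": "routerField_s",
    }
    return f"{base}/admin/collections?{urlencode(params)}"


print(create_collection_url(SOLR_BASE, COLLECTION))
```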


I am adding a document and then updating it as follows:

{
"id":"1",
"routerField_s":"1"
}
-----
/update?_route_=1
[{
"id":"1",
"routerField_s":"1",
"test_s":{"set":"1"}
}]
-----
/update?_route_=1
[{
"id":"1",
"routerField_s":"1",
"test_s":{"set":"2"}
}]
-----
/update?_route_=1
[{
"id":"1",
"routerField_s":"1",
"test_s":{"set":"3"}
}]
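To make the reproduction easier to script, here is a small Python sketch that builds the three update requests shown above (path plus JSON body) without sending them. How they are POSTed (curl, a client library) is left open, and the helper name `update_request` is hypothetical.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode


def update_request(doc_id: str, router_value: str, test_value: str,
                   route: Optional[str]) -> Tuple[str, str]:
    """Return (path, JSON body) for one of the partial updates above.

    Passing route=None gives the variant without _route_, which is
    reported to behave correctly.
    """
    path = "/update"
    if route is not None:
        path += "?" + urlencode({"_route_": route})
    body = [{
        "id": doc_id,
        "routerField_s": router_value,   # plain value, not an atomic op
        "test_s": {"set": test_value},   # atomic "set" on the updated field
    }]
    return path, json.dumps(body)


# The three updates from the report, each sent with _route_=1.
for value in ("1", "2", "3"):
    path, body = update_request("1", "1", value, route="1")
    print(path, body)
```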

When I query the collection for the document with id:1 and limit = 10, everything
looks fine. However, if I query with limit = 1, the response reports
numFound=4 (indicating duplicates in the index).
Moreover, if I query the added field test_s for a particular value, I get
matches for every one of the updated values: 1, 2 and 3.
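One way to see where the extra copies live is to query each shard on its own rather than through the distributed merge. For example, you could send `q=id:1&distrib=false` to one replica of each shard and collect the JSON responses. The Python sketch below assumes those responses have already been parsed into dicts keyed by a shard label. The labels and the helper name are hypothetical. It reports every id seen more than once.

```python
from collections import Counter
from typing import Dict, List


def duplicated_ids(shard_responses: Dict[str, dict]) -> List[str]:
    """Return ids that occur more than once across per-shard responses.

    shard_responses maps a shard label to the parsed JSON of a
    non-distributed query (distrib=false) against one replica of that shard.
    Copies in different shards and repeated ids within one shard both count.
    """
    seen = Counter()
    for resp in shard_responses.values():
        seen.update(doc["id"] for doc in resp["response"]["docs"])
    return sorted(doc_id for doc_id, n in seen.items() if n > 1)


example = {
    "shard1": {"response": {"numFound": 1, "docs": [{"id": "1"}]}},
    "shard2": {"response": {"numFound": 1, "docs": [{"id": "1"}]}},
}
print(duplicated_ids(example))  # ['1']
```

Make sure `rows` is large enough for each per-shard query to return all matches.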

If I execute the updates without the _route_ param, everything seems to work
properly. Can someone confirm this?
The same duplication can be observed if I send routerField_s as an atomic
update:
"routerField_s":{"set":"1"}

If I try to update with just the _route_ param and "id" in the update body,
the request is rejected with an error stating that "routerField_s" is missing
and no shard can be identified. This seems like expected behaviour.
At a bare minimum, I believe the documentation on updating parts of
documents should include examples of how to handle cases like this.
Ideally, I would expect Solr to reject any request that contains both the
_route_ param and a "routerField_s" value, as well as any request that uses
{"set":"value"} for "routerField_s".
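Until Solr enforces this itself, the rules proposed above could be applied on the client before an update is sent. Below is a minimal Python sketch. `check_update` and `RoutingConflict` are hypothetical names, not Solr APIs, and the field name comes from this setup.

```python
from collections.abc import Mapping
from typing import Iterable, Optional

ROUTER_FIELD = "routerField_s"  # the collection's router.field


class RoutingConflict(ValueError):
    """Raised when an update mixes routing mechanisms (hypothetical)."""


def check_update(docs: Iterable[Mapping], route: Optional[str]) -> None:
    """Client-side guard for the two rules proposed above.

    Rejects (1) an atomic operation such as {"set": ...} on the router
    field and (2) an explicit _route_ combined with a router-field value.
    """
    for doc in docs:
        value = doc.get(ROUTER_FIELD)
        if isinstance(value, Mapping):
            raise RoutingConflict(
                f"id={doc.get('id')!r}: atomic update on {ROUTER_FIELD}")
        if route is not None and value is not None:
            raise RoutingConflict(
                f"id={doc.get('id')!r}: both _route_ and {ROUTER_FIELD} given")
```

With this guard, the only accepted shape is the one reported to work: a plain routerField_s value and no _route_ param.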

And a final question: do I have any other options for fixing the duplicated
index besides:
1. Deleting documents by query "id:{corrupted_id}", then adding the document
again
2. Doing a full reload into a new collection and switching to it
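Option 1 can be expressed as two JSON update commands sent to /update in order. Below is a Python sketch that only builds the request bodies. It replaces the `id:...` query with Solr's `{!term f=id}` query parser so that ids containing query-syntax characters need no escaping. It also assumes you can reconstruct the full document to re-add. The helper name is hypothetical.

```python
import json
from typing import List


def cleanup_bodies(doc_id: str, full_doc: dict) -> List[str]:
    """Build the two /update bodies for 'delete by query, then re-add'.

    Without _route_, a delete-by-query should be sent to every shard, so it
    should remove all copies wherever they were routed. The re-added
    document carries routerField_s as a plain value and is sent without
    _route_.
    """
    delete_cmd = {"delete": {"query": "{!term f=id}" + doc_id}}
    add_cmd = {"add": {"doc": full_doc}}
    return [json.dumps(delete_cmd), json.dumps(add_cmd)]


for body in cleanup_bodies("1", {"id": "1", "routerField_s": "1", "test_s": "3"}):
    print(body)
```

Commit (for example with `commit=true`) before re-checking numFound.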

Any thoughts will be much appreciated.
