Hi

We have numerous collections, each with numerous shards spread across numerous machines. We just discovered that all documents have a field with a wrong value, and besides that we would like to add a new field to all documents:
* The field with the wrong value is a long, DocValued, Indexed and Stored. Some (about half) of the documents need to have a constant added to their current value.
* The field we want to add will be an int, DocValued, Indexed and Stored. It needs to be added to all documents, but will have different values among the documents.
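To make the two changes concrete, here is a minimal sketch of the per-document fix. The field names (`wrong_long`, `new_int`), the constant `OFFSET`, and the two callbacks are all hypothetical placeholders, not our real schema; the rule for which documents need the constant and how the new int is computed is left to the caller.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]

# Hypothetical constant; the real value depends on our data.
OFFSET = 3600


def fix_document(
    doc: Doc,
    needs_offset: Callable[[Doc], bool],
    compute_new_value: Callable[[Doc], int],
    long_field: str = "wrong_long",  # hypothetical field name
    new_field: str = "new_int",      # hypothetical field name
) -> Doc:
    """Return a corrected copy of doc; the input document is not modified."""
    fixed = dict(doc)
    # Only some (about half) of the documents get the constant added.
    if needs_offset(doc):
        fixed[long_field] = doc[long_field] + OFFSET
    # Every document gets the new int field, with a per-document value.
    fixed[new_field] = compute_new_value(doc)
    return fixed
```

Since all fields involved are Stored, the old values can be read back from the index, which is what makes this kind of copy-and-fix possible at all.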

How can we achieve this in the easiest possible way?

We thought about spooling/streaming from the existing collection into a "twin" collection, then deleting the existing collection, and finally renaming the "twin" collection to have the same name as the original collection. Basically indexing all documents again. If that is the easiest way, how do we query so that all documents are streamed to us? We cannot just do a *:* query that returns everything into memory and then index from there, because we have billions of documents (not enough memory). Please note that we are on 4.4, which does not contain the new CURSOR feature. Please also note that speed is an important factor for us.
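The kind of loop we are after looks roughly like the sketch below: read one bounded batch at a time, so that memory use depends on the batch size and not on the collection size. `fetch_page` is a hypothetical stand-in for whatever query mechanism turns out to work on 4.4; the sketch assumes it returns documents sorted by `id`, strictly after the given id. It is not a real Solr call.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Doc = Dict[str, Any]


def stream_all(
    fetch_page: Callable[[Optional[str], int], List[Doc]],
    batch_size: int = 1000,
) -> Iterator[Doc]:
    """Yield every document while holding at most one batch in memory.

    fetch_page(after_id, limit) is hypothetical: it is assumed to return
    up to `limit` documents sorted by "id", all with id > after_id
    (or from the very start when after_id is None).
    """
    last_id: Optional[str] = None
    while True:
        page = fetch_page(last_id, batch_size)
        if not page:
            return  # an empty page means everything has been read
        for doc in page:
            yield doc
        # Resume after the last id seen instead of using a growing offset.
        last_id = page[-1]["id"]
```

Each yielded document could then be corrected and sent to the "twin" collection before the next batch is fetched.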

I guess this could also be achieved by doing a 1-1 migration at shard level instead of collection level, keeping everything in the new collections on the same machines where it lived in the old collections. That could probably complete faster than the 1-1 collection-level approach. But the 1-1 shard-level approach is not very good for us, because the long field we need to change is also part of the id (which controls the routing to a particular shard), and therefore we actually also need to change the id on all documents. So if we take the 1-1 shard-level approach, we will end up with documents in shards that they do not belong to (they would not have been routed there by the routing system in Solr). We might be able to live with this disadvantage if the 1-1 shard-level approach can easily be made much faster than the 1-1 collection-level approach.
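A toy illustration of the misplacement problem follows. It assumes the id is built as `prefix-longvalue` and that the shard is picked by hashing the id; both are simplifications. In particular, `zlib.crc32` is only a stand-in hash and not what Solr actually uses, and `make_id` is a hypothetical id layout.

```python
import zlib
from typing import List, Tuple


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from a hash of the id (crc32 is a stand-in, not Solr's hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def make_id(prefix: str, long_value: int) -> str:
    """Hypothetical id layout in which the long field is part of the id."""
    return f"{prefix}-{long_value}"


def misplaced_after_shard_copy(
    docs: List[Tuple[str, int]], offset: int, num_shards: int
) -> List[str]:
    """Return the new ids that hash to a different shard than the one they stay in.

    docs holds (prefix, old_long_value) pairs. A shard-level copy keeps each
    document in the shard chosen by its OLD id, while lookups by the NEW id
    would be routed according to the hash of the new id.
    """
    misplaced = []
    for prefix, old_value in docs:
        old_shard = shard_for(make_id(prefix, old_value), num_shards)
        new_id = make_id(prefix, old_value + offset)
        if shard_for(new_id, num_shards) != old_shard:
            misplaced.append(new_id)
    return misplaced
```

With more than one shard and a nonzero offset, a hash-based scheme like this would generally be expected to leave a large share of the changed documents in a shard that their new id does not hash to.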

Any input is very much appreciated! Thanks

Regards, Per Steffensen
