Hi
We have numerous collections, each with numerous shards spread across
numerous machines. We just discovered that all documents have a field
with a wrong value, and besides that we would like to add a new field to
all documents:
* The field with the wrong value is a long, DocValued, Indexed and
Stored. Some (about half) of the documents need to have a constant added
to their current value.
* The field we want to add will be an int, DocValued, Indexed and
Stored. It needs to be added to all documents, but will have different
values among the documents.
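To make the two changes concrete, here is a minimal sketch of the
per-document transformation we have in mind. The field names, the
constant, and the two callbacks are hypothetical placeholders (none of
them are our real names); it only shows the shape of the fix, not how
the documents are read from or written to Solr.

```python
from typing import Any, Callable, Dict

# Hypothetical names and value; placeholders for illustration only.
LONG_FIELD = "timestamp_l"   # the existing long field with the wrong value
NEW_INT_FIELD = "category_i" # the int field to be added
OFFSET = 3600000             # the constant to add where needed

Doc = Dict[str, Any]


def fix_document(
    doc: Doc,
    needs_offset: Callable[[Doc], bool],
    new_int_value: Callable[[Doc], int],
) -> Doc:
    """Return a corrected copy of doc; the input dict is left untouched."""
    fixed = dict(doc)
    # Only some (about half) of the documents get the constant added.
    if needs_offset(doc):
        fixed[LONG_FIELD] = doc[LONG_FIELD] + OFFSET
    # Every document gets the new int field, with a per-document value.
    fixed[NEW_INT_FIELD] = int(new_int_value(doc))
    return fixed
```

Both decisions (which documents get the constant, and which int value
each document gets) are passed in as functions, because they depend on
rules that are specific to our data.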
How can we achieve our goal in the easiest possible way?
We thought about spooling/streaming from the existing collection into a
"twin" collection, then deleting the existing collection and finally
renaming the "twin" collection to have the same name as the original
collection. Basically, that means indexing all documents again. If that
is the easiest way, how do we query so that we get all documents
streamed? We cannot just do a *:* query that returns everything into
memory and then index from there, because we have billions of documents
(not enough memory). Please note that we are on 4.4, which does not
contain the new CURSOR feature. Please also note that speed is an
important factor for us.
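To illustrate the kind of cursor-free streaming we are asking about, one
idea is to cut the collection into slices with a range filter on an
indexed numeric field, so that each request returns a bounded result.
This is only a sketch under assumptions: the field name is hypothetical,
the slice width would have to be chosen so that no slice holds more
documents than `rows`, and we do not know whether this is the fastest
option on 4.4.

```python
from typing import Iterator, Tuple
from urllib.parse import urlencode


def range_slices(lo: int, hi: int, width: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) pairs covering lo..hi without overlap."""
    start = lo
    while start <= hi:
        end = min(start + width - 1, hi)
        yield start, end
        start = end + 1


def slice_query(field: str, start: int, end: int, rows: int) -> str:
    """Build URL-encoded select parameters for one slice of the collection."""
    return urlencode({
        "q": "*:*",
        # Inclusive range filter; slices do not overlap, so no duplicates.
        "fq": f"{field}:[{start} TO {end}]",
        "rows": rows,
        "wt": "json",
    })


if __name__ == "__main__":
    # "timestamp_l" is a hypothetical field name.
    for start, end in range_slices(0, 25, 10):
        print(slice_query("timestamp_l", start, end, rows=1000))
```

Each printed parameter string would be appended to the collection's
select URL; the slices are independent, so they could be fetched and
re-indexed in parallel.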
I guess this could also be achieved by doing a 1-1 migration on shard
level instead of collection level, keeping everything in the new
collections on the same machines where it lived in the old collections.
That could probably complete faster than the 1-1 collection-level
approach. But the 1-1 shard-level approach is not very good for us,
because the long field we need to change is also part of the id
(controlling the routing to a particular shard), and therefore we
actually also need to change the id on all documents. So if we take the
1-1 shard-level approach, we will end up having documents in shards that
they do not actually belong to (they would not have been routed there by
the routing system in Solr). We might be able to live with this
disadvantage if 1-1 on shard level can easily be made much faster than
1-1 on collection level.
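The routing concern can be illustrated with a small sketch. Note that
the hash below is a stand-in (CRC32 modulo the shard count) and is NOT
the hash Solr actually uses for routing; the id format is hypothetical
as well. It only shows why a changed id generally maps to a different
shard than the old id did.

```python
import zlib
from typing import Iterable, Tuple


def shard_for(doc_id: str, num_shards: int) -> int:
    """Stand-in router: NOT Solr's real hash, only a deterministic example."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def misplaced_fraction(id_pairs: Iterable[Tuple[str, str]], num_shards: int) -> float:
    """Fraction of (old_id, new_id) pairs whose new id maps to another shard."""
    pairs = list(id_pairs)
    if not pairs:
        return 0.0
    moved = sum(
        1
        for old_id, new_id in pairs
        if shard_for(old_id, num_shards) != shard_for(new_id, num_shards)
    )
    return moved / len(pairs)


if __name__ == "__main__":
    # Hypothetical id layout: "<long value>-<suffix>", with 1000 added to the value.
    pairs = [(f"{v}-x", f"{v + 1000}-x") for v in range(200)]
    print(misplaced_fraction(pairs, num_shards=8))
```

With a 1-1 shard-level migration, every pair counted as "moved" here
would be a document left in a shard that the router would not have
chosen for its new id.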
Any input is very much appreciated! Thanks
Regards, Per Steffensen