Per:

Given that you said the field redefinition also affects the routing info, I don't see any way other than re-indexing each collection. That said, could you use collection aliasing and do one collection at a time?
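To make the aliasing idea concrete, here is a minimal sketch that builds the Collections API call for pointing an alias at a freshly re-indexed collection. It assumes clients already query through the alias name rather than the physical collection name, and that issuing CREATEALIAS for an existing alias re-points it. The base URL and the names `events` and `events_v2` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collection):
    """Build a Collections API CREATEALIAS request URL.

    solr_base  -- base Solr URL, e.g. "http://localhost:8983/solr" (hypothetical)
    alias      -- the name clients query against
    collection -- the physical collection the alias should point to
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
    }
    return solr_base.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # After re-indexing into "events_v2", point the "events" alias at it.
    print(create_alias_url("http://localhost:8983/solr", "events", "events_v2"))
```

Done one collection at a time, the old collection can be deleted once its alias points at the new one and queries look right.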
Best,
Erick

On Tue, Jul 22, 2014 at 11:45 PM, Per Steffensen <st...@designware.dk> wrote:
> Hi
>
> We have numerous collections, each with numerous shards spread across
> numerous machines. We just discovered that all documents have a field
> with a wrong value, and besides that we would like to add a new field
> to all documents.
> * The field with the wrong value is a long, DocValued, Indexed and
>   Stored. Some (about half) of the documents need to have a constant
>   added to their current value.
> * The field we want to add will be an int, DocValued, Indexed and
>   Stored. It needs to be added to all documents, but will have
>   different values among the documents.
>
> How do we achieve our goal in the easiest possible way?
>
> We thought about spooling/streaming from the existing collection into
> a "twin" collection, then deleting the existing collection and finally
> renaming the "twin" collection to have the same name as the original
> collection. Basically, indexing all documents again. If that is the
> easiest way, how do we query so that we get all documents streamed?
> We cannot just do a *:* query that returns everything into memory and
> then index from there, because we have billions of documents (not
> enough memory). Please note that we are on 4.4, which does not contain
> the new CURSOR-feature. Please also note that speed is an important
> factor for us.
>
> I guess this could also be achieved by doing a 1-1 migration on
> shard-level instead of collection-level, keeping everything in the new
> collections on the same machine as where it lived in the old
> collections. That could probably complete faster than the 1-1 on
> collection-level approach. But the 1-1 on shard-level approach is not
> very good for us, because the long field we need to change is also
> part of the id (controlling the routing to a particular shard), and
> therefore we actually also need to change the id on all documents. So
> if we do the 1-1 on shard-level approach, we will end up having
> documents in shards that they do not belong to (they would not have
> been routed there by the routing system in Solr). We might be able to
> live with this disadvantage if 1-1 on shard-level can easily be done
> much faster than 1-1 on collection-level.
>
> Any input is very much appreciated! Thanks
>
> Regards, Per Steffensen
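On the question of streaming every document without the cursor feature, one common workaround is to page by the unique key instead of by `start`: sort on the key, and for each page ask only for keys greater than the last one seen. The sketch below illustrates that idea and is not a tested Solr 4.4 recipe. It assumes the unique key is a single-valued, indexed string field named `id`, and that quoted range endpoints are accepted by the query parser. `fetch_page` is a hypothetical callable that sends the parameters to Solr and returns the list of documents.

```python
def page_params(last_id, rows, id_field="id"):
    """Query parameters for the page that follows last_id (None = first page)."""
    if last_id is None:
        q = "*:*"
    else:
        # Escape backslashes and quotes so the id can sit inside a quoted term.
        escaped = last_id.replace("\\", "\\\\").replace('"', '\\"')
        # Exclusive lower bound: only ids strictly greater than last_id.
        q = '%s:{"%s" TO *]' % (id_field, escaped)
    return {"q": q, "sort": id_field + " asc", "rows": rows, "start": 0}


def stream_all(fetch_page, rows=1000, id_field="id"):
    """Yield every document, holding only one page in memory at a time.

    fetch_page(params) is assumed to return a list of dicts, each of which
    contains id_field.
    """
    last_id = None
    while True:
        docs = fetch_page(page_params(last_id, rows, id_field))
        if not docs:
            return
        for doc in docs:
            yield doc
        last_id = docs[-1][id_field]
```

Because `start` is always 0, each request costs roughly the same no matter how far into the collection it is, which is the main drawback of deep paging with a growing `start`. Documents added or changed during the export may be missed or seen with their new values, so this fits best when indexing into the old collection is paused.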