We are actually very close to doing what Shawn has suggested. Emir has a good point about new collections failing on deletes/updates of older documents that are not yet present in the new collection. But even if this feature can be implemented only for an append-only log, it would make a good feature IMO.
The use case for re-indexing everything is generally an attribute change, such as enabling "indexed" or "docValues" on a field, or adding a new field to a schema. While the reading client code sits behind a flag to start using the new attribute/field, we have to re-index all the data without stopping older-format reads. Currently, we have to either do dual writes to the new collections or play catch-up after a bootstrap.

Note that catch-up after a bootstrap is not very easy either (it is very similar to the approach described by Shawn). If this special place is Kafka or some table in the DB, then we have to do dual writes to the regular source of truth and to this special place. Dual writes to the DB and Kafka are not transactional (and thus lack consistency), while dual writes to the DB increase the load on the DB. Having created_date / modified_date fields and querying the DB to find live-traffic documents has its own problems and is again taxing on the DB.

Dual writes directly to multiple Solr collections are the simplest for a client to implement, and that is exactly what this new feature could be. With a dual-write-collection-alias, the client does not have to implement any of the above, provided the alias does the following:

- Deletes on documents missing in the new collection are simply ignored.
- Incremental updates just throw an error for not being supported on a multi-write-collection-alias.
- Regular updates (i.e. Delete-Then-Insert) should work just fine because they will just treat the document as a brand new one, and versioning strategies can take care of out-of-order updates.

SG

On Fri, Nov 10, 2017 at 6:33 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> This approach could work only if it is append only index. In case you have
> updates/deletes, you have to process in order, otherwise you will get
> incorrect results. I am thinking that is one of the reasons why it might
> not be supported since not too useful.
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 9 Nov 2017, at 19:09, S G <sg.online.em...@gmail.com> wrote:
> >
> > Hi,
> >
> > We have a use-case to re-create a solr-collection by re-ingesting
> > everything but not tolerate a downtime while that is happening.
> >
> > We are using collection alias feature to point to the new collection when
> > it has been re-ingested fully.
> >
> > However, re-ingestion takes several hours to complete and during that time,
> > the customer has to write to both the collections - previous collection and
> > the one being bootstrapped.
> > This dual-write is harder to do from the client side (because client needs
> > to have a retry logic to ensure any update does not succeed in one
> > collection and fails in another - consistency problem) and it would be a
> > real welcome addition if collection aliasing can support this.
> >
> > Proposal:
> > If can enhance the write alias to point to multiple collections such that
> > any update to the alias is written to all the collections it points to, it
> > would help the client to avoid dual writes and also issue just a single
> > http call from the client instead of multiple. It would also reduce the
> > retry logic inside the client code used to keep the collections consistent.
> >
> > Thanks
> > SG
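
P.S. To make the three alias rules from my reply concrete, here is a rough in-memory sketch in Python. Everything in it is hypothetical: `DualWriteAlias`, `UnsupportedOperation`, and the method names are made up for illustration and are not part of Solr or SolrJ, and each "collection" is just a dict mapping a document id to a (version, document) pair. It only models the intended semantics; it does not cover a delete that arrives before the bootstrap has copied that document.

```python
class UnsupportedOperation(Exception):
    """Raised for operations the hypothetical dual-write alias rejects."""


class DualWriteAlias:
    """Toy model: each collection is a dict of doc_id -> (version, doc)."""

    def __init__(self, *collections):
        self.collections = collections

    def add(self, doc_id, version, doc):
        # Full-document overwrite, applied to every collection behind the alias.
        for coll in self.collections:
            current = coll.get(doc_id)
            # Drop stale writes so an out-of-order update cannot clobber newer data.
            if current is None or version > current[0]:
                coll[doc_id] = (version, doc)

    def delete(self, doc_id):
        # Deleting a document the new collection has not ingested yet is a no-op.
        for coll in self.collections:
            coll.pop(doc_id, None)

    def atomic_update(self, doc_id, changes):
        # Incremental updates need the existing document, which may be missing
        # in the collection being bootstrapped, so they are rejected outright.
        raise UnsupportedOperation(
            "incremental updates are not supported on a dual-write alias"
        )


old = {"a": (1, {"title": "first"})}
new = {}  # still being bootstrapped
alias = DualWriteAlias(old, new)

alias.delete("a")                       # ignored for `new`, removed from `old`
alias.add("b", 2, {"title": "second"})  # written to both collections
alias.add("b", 1, {"title": "stale"})   # older version, dropped everywhere
```

After these calls both dicts hold only document "b" at version 2, and calling `alias.atomic_update(...)` raises `UnsupportedOperation`.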