I've run into this also; it is a key difference between a master/slave
setup and a SolrCloud setup.

clean=true has always deleted the index on the first commit; in older
versions of Solr, the workaround was to disable replication until the full
reindex had completed.
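
For reference, that workaround used the replication handler's disable and
enable commands, something like this (the host, port, and core name
"mycore" here are placeholders; adjust them to your setup):

  # on the master, before kicking off the full import
  curl "http://master:8983/solr/mycore/replication?command=disablereplication"
  # after the full import has completed and committed
  curl "http://master:8983/solr/mycore/replication?command=enablereplication"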

This is a convenient practice for a number of reasons, especially for small
indices.  It really isn't supported in SolrCloud, because of the difference
in how writes are processed in Master/Slave vs. SolrCloud.  With a
Master/Slave setup, all writes go to the same location, so disabling
replication lets you buffer them up and ship them all in one go.  With a
SolrCloud setup, the data is distributed across the nodes in the cluster.
So to support 'clean', it would need to blow away the index only at the
'master' (leader) node for each shard, serve traffic from the slaves only
for each shard until the re-index completes, do the replications, and then
resume normal operation.

Note that in Solr 7.x, if you revert to a master/slave setup, you need to
disable polling at the slaves instead: disabling replication at the master
will also cause index deletion at the slaves (SOLR-11938).
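
Concretely, on 7.x master/slave you would pause polling at each slave
rather than disabling replication at the master, along these lines (host,
port, and core name again being placeholders):

  # on each slave, before starting the full import
  curl "http://slave:8983/solr/mycore/replication?command=disablepoll"
  # on each slave, once the import has completed
  curl "http://slave:8983/solr/mycore/replication?command=enablepoll"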

Elizabeth

On Tue, Feb 12, 2019 at 11:42 AM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi!
> If clean=true, then the index will be replaced completely by the new import.
> That is how it is supposed to work.
> If you don't want to preemptively delete your index, set &clean=false, and
> set &commit=true instead of &optimize=true.
> Are you sure about optimize? Do you really need it? Usually it's very
> costly.
> So, I'd try:
> dataimport?command=full-import&clean=false&commit=true
>
> If nothing is imported nevertheless, please check the log.
> --
> Vadim
>
>
>
> > -----Original Message-----
> > From: Joakim Hansson [mailto:joakim.hansso...@gmail.com]
> > Sent: Tuesday, February 12, 2019 12:47 PM
> > To: solr-user@lucene.apache.org
> > Subject: What's the deal with dataimporthandler overwriting indexes?
> >
> > Hi!
> > We are currently upgrading from a Solr 6.2 master/slave setup to Solr 7.6
> > running SolrCloud.
> > I don't know if I've missed something really trivial, but every time I
> > start a full import (dataimport?command=full-import&clean=true&optimize=true)
> > the old index gets overwritten by the new import.
> >
> > In 6.2 this wasn't really a problem, since I could disable replication in
> > the API on the master and enable it once the import was completed.
> > With 7.6 and SolrCloud we use NRT shards and replicas, since those are the
> > only ones that support rule-based replica placement, and whenever I start
> > a new import the old index is overwritten all over the SolrCloud cluster.
> >
> > I have tried changing to clean=false, but that makes the import finish
> > without adding any docs.
> > It doesn't matter whether I use soft or hard commits.
> >
> > I don't get the logic in this. Why would you ever want to delete an
> > existing index before there is a new one in place? What is it I'm missing
> > here?
> >
> > Please enlighten me.
>
>
