On 8/5/2014 7:20 AM, Jako de Wet wrote:
> I have a Solr index with 20+ million products; the core is about 70GB.
> 
> What I would like to do is a weekly delta-import, but the index seems to
> grow in size each week. (Currently it's running a full-import +
> clean=false.)
> 
> Shouldn't a delta-import with clean=true import the records and update
> the old records in the core? Shouldn't the result be roughly the same
> size?
> 
> When I do a delta-import + clean=true via the Solr Dashboard, it deletes
> all 20+ million documents and only the updated records are left.

The "clean" parameter refers to the whole index, not just the records
being imported.  You asked it to clean the index, so it did -- it
deleted all documents.
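For a weekly delta-import that should only update changed rows, pass
clean=false explicitly.  Below is a minimal sketch that builds such a
DataImportHandler request URL with Python's standard library.  The host,
the core name "products", and the /dataimport handler path are
assumptions; the real path is whatever your solrconfig.xml maps the
DataImportHandler to.

```python
from urllib.parse import urlencode

# Hypothetical base URL and core name; substitute your own.
SOLR_CORE_URL = "http://localhost:8983/solr/products"


def dih_url(command: str, clean: bool, commit: bool = True) -> str:
    """Build a DataImportHandler request URL.

    clean=true deletes every document in the index before importing,
    so it should be false for a delta-import that only updates changed rows.
    The "/dataimport" path is an assumed handler mapping.
    """
    params = {
        "command": command,
        "clean": str(clean).lower(),
        "commit": str(commit).lower(),
    }
    return f"{SOLR_CORE_URL}/dataimport?{urlencode(params)}"


print(dih_url("delta-import", clean=False))
# http://localhost:8983/solr/products/dataimport?command=delta-import&clean=false&commit=true
```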

Deleted documents are not actually removed; they are only marked as
deleted, and they still take up disk space.  To actually get rid of them,
they need to be merged out.  When segments are merged, only the
non-deleted documents are copied to the new segment.  A full optimize
(which is a forced merge down to one segment) is the only way to be
absolutely sure that all deleted documents are gone.  A full optimize
will completely rewrite the index, which is a lot of disk I/O.  That can
lead to query performance issues while the optimize is happening and for
a short time afterwards.
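To illustrate why deleted documents keep using space until a merge, here
is a toy Python model.  It is not Lucene's actual segment format: a
"segment" is just a list of stored doc ids plus a set of ids marked
deleted, and force_merge (a hypothetical helper) stands in for a full
optimize by copying only the live documents into one new segment.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a Lucene segment: stored docs plus a set of deleted ids."""
    docs: list[str]
    deleted: set[str] = field(default_factory=set)

    def stored(self) -> int:
        # Deleted docs still count here: they occupy space until rewritten.
        return len(self.docs)

    def live(self) -> list[str]:
        return [d for d in self.docs if d not in self.deleted]


def force_merge(segments: list[Segment]) -> Segment:
    """Merge everything into one new segment, copying only non-deleted docs."""
    merged = [d for seg in segments for d in seg.live()]
    return Segment(docs=merged)


segs = [Segment(["a", "b", "c"], deleted={"b"}), Segment(["d", "e"], deleted={"e"})]
print(sum(s.stored() for s in segs))  # 5 docs on disk, only 3 of them live
merged = force_merge(segs)
print(merged.stored(), merged.live())  # 3 ['a', 'c', 'd']
```

In a real index, background merges reclaim deleted documents
gradually; the toy model only shows the forced merge.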

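Note that when you index a document with the same value in the uniqueKey
field as an existing document, the old document is deleted before the
new one is indexed.  That old copy is only marked as deleted, so a
full-import with clean=false leaves a deleted copy of every re-imported
product in the index until merging removes it, which would explain the
weekly growth you are seeing.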
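The uniqueKey behavior can be sketched with a toy Python model (again,
not Lucene's real data structures).  ToyIndex is a hypothetical class:
adding a document whose key already exists marks the old copy deleted
and appends the new one, and nothing is reclaimed because the model
never merges.  num_docs and max_doc loosely mirror the "Num Docs" and
"Max Doc" statistics Solr reports.

```python
class ToyIndex:
    """Toy model of uniqueKey semantics: re-adding a key marks the old copy
    deleted and appends the new one; deleted copies are never reclaimed."""

    def __init__(self) -> None:
        self.stored: list[tuple[str, str]] = []  # (key, body) in insertion order
        self.deleted: set[int] = set()           # positions marked deleted
        self.latest: dict[str, int] = {}         # key -> position of live copy

    def add(self, key: str, body: str) -> None:
        if key in self.latest:
            self.deleted.add(self.latest[key])   # old copy: marked, not removed
        self.latest[key] = len(self.stored)
        self.stored.append((key, body))

    def num_docs(self) -> int:
        return len(self.stored) - len(self.deleted)

    def max_doc(self) -> int:
        return len(self.stored)


idx = ToyIndex()
for week in range(3):
    # Re-import the same three products each "week", as with clean=false.
    for key in ("p1", "p2", "p3"):
        idx.add(key, f"week {week}")

print(idx.num_docs(), idx.max_doc())  # 3 9
```

The live document count stays at 3 while the stored count triples,
which is the same pattern as an index that grows after every re-import.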

Thanks,
Shawn
