On 8/5/2014 7:20 AM, Jako de Wet wrote:
> I have a Solr Index that has 20+ million products, the core is about 70GB.
>
> What I would like to do, is a weekly delta-import, but it seems to be
> growing in size each week. (Currently its running a full-import +
> clean=false)
>
> Shouldn't the Delta-Import with the Clean=True option import the records
> and update the old records in the core? It should result in +- the same
> size?
>
> When I do a delta-import + clean=true via the Solr Dashboard, it cleans the
> whole 20+million and only the update records are left.
The "clean" parameter applies to the whole index, not just to the documents being imported. You asked it to clean the index, so it did: it deleted all documents.

Deleted documents are not actually removed right away. They are only marked as deleted, and they still take up disk space. To get rid of them, the segments that contain them must be merged. When segments are merged, only the non-deleted documents are copied to the new segment. A full optimize (a forced merge down to one segment) is the only way to be absolutely sure that all deleted documents are gone. Because a full optimize completely rewrites the index, it involves a lot of disk I/O, which can cause query performance problems while the optimize is running and for a short time afterwards.

Note that when you index a document with the same value in the uniqueKey field as an existing document, the old document is deleted before the new one is indexed. That deletion works the same way, so every product that your weekly full-import re-indexes leaves its old copy behind as a deleted document until a merge removes it. That is why the index keeps growing.

Thanks,
Shawn
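A toy model may make the mechanics easier to see. This is a hypothetical Python sketch, not Solr or Lucene code: the `Segment` and `ToyIndex` classes and their methods are made up for illustration, "disk usage" is approximated by counting stored documents, and real merge policies choose which segments to merge very differently. It only shows the three behaviors described above: deletes (including the implicit delete on a uniqueKey match, and clean=true) just mark documents, merges copy only live documents, and a forced merge to one segment drops every deleted document.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical toy segment: documents are written once, never rewritten in place.
    docs: list                                  # (unique_key, body) tuples
    deleted: set = field(default_factory=set)   # positions marked deleted, still stored

    def live_docs(self):
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]


class ToyIndex:
    """Toy model of segment-based deletes and merges (not Solr/Lucene code)."""

    def __init__(self):
        self.segments = []

    def _mark_deleted(self, key):
        # A delete only flags the document; it stays stored in its segment.
        for seg in self.segments:
            for i, (k, _) in enumerate(seg.docs):
                if k == key:
                    seg.deleted.add(i)

    def add(self, batch):
        # Assumes unique keys within a batch. Re-adding an existing key marks
        # the old copy deleted, then writes the new copy to a new segment.
        for key, _ in batch:
            self._mark_deleted(key)
        self.segments.append(Segment(list(batch)))

    def delete_all(self):
        # Roughly what clean=true does: every document is marked deleted.
        for seg in self.segments:
            seg.deleted.update(range(len(seg.docs)))

    def merge(self, count):
        # Merge the `count` oldest segments, copying only non-deleted documents.
        to_merge, rest = self.segments[:count], self.segments[count:]
        merged = [d for seg in to_merge for d in seg.live_docs()]
        self.segments = ([Segment(merged)] if merged else []) + rest

    def optimize(self):
        # Forced merge down to one segment: rewrites everything, drops all deletes.
        self.merge(len(self.segments))

    def live_count(self):
        return sum(len(seg.live_docs()) for seg in self.segments)

    def stored_count(self):
        # Stand-in for disk usage: deleted documents are still counted.
        return sum(len(seg.docs) for seg in self.segments)


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add([(f"p{i}", "v1") for i in range(5)])  # initial import of 5 products
    idx.add([("p0", "v2"), ("p1", "v2")])          # re-import 2 existing products
    print(idx.live_count(), idx.stored_count())    # 5 7
    idx.optimize()
    print(idx.live_count(), idx.stored_count())    # 5 5
    idx.delete_all()                               # like clean=true
    idx.add([("p0", "v3")])                        # only the new import is left
    print(idx.live_count(), idx.stored_count())    # 1 6
```

In this toy model `stored_count()` only goes down when `merge()` or `optimize()` runs; deletes on their own never shrink the index.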