Hi Shawn,

Thanks for the insight. Why does the size increase when the clean parameter is not specified, then? The PK for the documents remains the same throughout the whole import process.
Should a full optimize combine all the results into one and decrease the physical size of the core?

On Tue, Aug 5, 2014 at 3:28 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 8/5/2014 7:20 AM, Jako de Wet wrote:
> > I have a Solr Index that has 20+ million products, the core is about 70GB.
> >
> > What I would like to do, is a weekly delta-import, but it seems to be
> > growing in size each week. (Currently its running a full-import +
> > clean=false)
> >
> > Shouldn't the Delta-Import with the Clean=True option import the records
> > and update the old records in the core? It should result in +- the same
> > size?
> >
> > When I do a delta-import + clean=true via the Solr Dashboard, it cleans the
> > whole 20+million and only the update records are left.
>
> The "clean" parameter refers to the whole index. You asked it to clean
> the index, so it did -- it deleted all documents.
>
> Deleted documents are not actually deleted, they are marked as deleted
> -- they still take up disk space. In order to actually get rid of them,
> they need to be merged out. When segments are merged, only the
> non-deleted documents are copied to the new segment. A full optimize
> (which is a forced merge down to one segment) is the only way to be
> absolutely sure that all deleted documents are gone. A full optimize
> will completely rewrite the index, which is a lot of disk I/O. That can
> lead to query performance issues while the optimize is happening and for
> a short time afterwards.
>
> Note that when you index a document with the same value in the uniqueKey
> field as an existing document, the old document is deleted before the
> new one is indexed.
>
> Thanks,
> Shawn

--
Jako de Wet
Business Development Specialist
SAPnet
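
For illustration, the behaviour Shawn describes (re-indexed documents are only marked as deleted, and a merge copies only the live ones) can be sketched with a toy model. This is not how Lucene or Solr is implemented; the class ToyIndex and its methods are hypothetical names made up for this example, and "size" here is just a count of stored documents.

```python
# Toy model only: real Lucene segments, merge policies and on-disk formats
# are far more involved. All names here are hypothetical.

class ToyIndex:
    def __init__(self):
        # Each segment is a list of [unique_key, is_live] entries.
        self.segments = []

    def add_batch(self, keys):
        """Index one batch of documents into a new segment."""
        keys = list(dict.fromkeys(keys))  # drop duplicates within the batch
        incoming = set(keys)
        # Older copies with the same uniqueKey are marked deleted, not removed,
        # so they still occupy space in their segment.
        for segment in self.segments:
            for doc in segment:
                if doc[0] in incoming:
                    doc[1] = False
        self.segments.append([[key, True] for key in keys])

    def live_docs(self):
        """Documents a search would see."""
        return sum(1 for seg in self.segments for doc in seg if doc[1])

    def stored_docs(self):
        """Documents occupying space, including those marked deleted."""
        return sum(len(seg) for seg in self.segments)

    def optimize(self):
        """Forced merge to one segment: only live documents are copied."""
        merged = [doc for seg in self.segments for doc in seg if doc[1]]
        self.segments = [merged]


if __name__ == "__main__":
    index = ToyIndex()
    index.add_batch(range(5))   # first full import
    index.add_batch(range(5))   # second full import, same keys, clean=false
    print(index.live_docs(), index.stored_docs())
    index.optimize()
    print(index.live_docs(), index.stored_docs())
```

In this model a repeated full import with the same keys doubles the stored count while the live count stays flat, and the forced merge brings the stored count back down to the live count. That matches the weekly growth described above, under the assumption that deleted documents are not being merged out in between imports.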