What about managing a separate core for each day? That way deletion/archiving is very simple, and there are no "holes" in the index (which often appear when deleting document by document). Indexing is done against core [today-0], and queries go against cores [today-0],[today-1]...[today-99]. Quite a headache, though.
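To make the core-per-day scheme concrete, here is a minimal sketch of how a client could derive the per-day core names and build Solr's distributed-search `shards` parameter so one request spans the whole 100-day window. The core naming convention (`docs_YYYYMMDD`), the host, and the 100-day count are assumptions for illustration, not anything from the thread:

```python
from datetime import date, timedelta

def daily_core_names(today, days=100, prefix="docs_"):
    """List the per-day core names, newest first.

    The docs_YYYYMMDD naming and the 100-day window are illustrative
    assumptions; adapt them to your own core-management scheme.
    """
    return [prefix + (today - timedelta(days=i)).strftime("%Y%m%d")
            for i in range(days)]

def shards_param(cores, host="localhost:8983"):
    """Build a value for Solr's 'shards' query parameter covering all
    the daily cores, so a single query searches every one of them."""
    return ",".join("%s/solr/%s" % (host, core) for core in cores)

cores = daily_core_names(date(2011, 12, 15))
print(cores[0])    # today's core -- the one new documents are indexed into
print(cores[-1])   # the oldest core still being searched
print("shards=" + shards_param(cores[:2]))
```

Archiving then amounts to dropping core [today-100] out of this list each day (and unloading or moving it), with no delete-by-query or optimize needed on the live cores.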
Itamar

-----Original Message-----
From: Robert Stewart [mailto:bstewart...@gmail.com]
Sent: Thursday, December 15, 2011 16:54
To: solr-user@lucene.apache.org
Subject: how to setup to archive expired documents?

We have a large (100M-document) index to which we add about 1M new docs per day. We want to keep the index at a constant size, so the oldest documents are removed and/or archived each day (the index then contains around 100 days of data). What is the best way to do this?

We still want to keep the older data in some archive index, not just delete it (so is it possible to export older segments, etc., into some other index?). If we run a daily job to delete old data, I assume we'd need to optimize the index to actually remove the documents and free the space, but that would require a very large (and slow) replication after the optimize, which will probably not work out well for so large an index.

Is there some way to shard the data, or another best practice?

Thanks
Bob
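For comparison, the delete-and-optimize route Bob describes would mean a daily job posting a delete-by-query to Solr's update handler against a date field. A minimal sketch of building that payload, assuming a hypothetical `timestamp` field (the field name and cutoff policy are illustrative, not from the thread):

```python
from datetime import date, timedelta

def delete_old_docs_command(today, keep_days=100, field="timestamp"):
    """Build the XML delete-by-query body a daily cleanup job would
    POST to /solr/update. The 'timestamp' field name is an assumption;
    use whatever date field the schema actually defines."""
    cutoff = today - timedelta(days=keep_days)
    return ("<delete><query>%s:[* TO %sT00:00:00Z]</query></delete>"
            % (field, cutoff.isoformat()))

print(delete_old_docs_command(date(2011, 12, 15)))
```

As Bob notes, the deleted documents only free space after a merge or optimize, which is exactly the large, slow operation the core-per-day approach avoids.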