You can also look at the sharding options in SolrCloud. For example, with implicit sharding you can choose a sharding field, and SolrCloud will index your docs into shards based on the value of this field. You could have two shards, main and archive (and also replicate your main shard if you want, for distributed searches and fault tolerance), or even split main and archive into several shards each, depending on size and general requirements. You can then very easily search just your main shard(s) by adding shards=my_main_shard to the query, or search the entire collection by leaving that parameter out.

I'm looking at this for time-series data, where I'll have maybe a shard per year, so my shard field would be the year. It may make sense to do some "manual" work to merge older shards, but I'm not sure on this yet.

Alternatively, you can use a composite key to be more explicit about whether you place your docs in archive or not, using the prefix of the key to denote main/archive. You'd have the same options for searching as above. With this you'd need to do some re-indexing as you move docs in and out of archive. It sounds like you'd need something like this, because you want to be more in control of whether a doc is in archive or not.
-----Original Message-----
From: Vasu Y [mailto:vya...@gmail.com]
Sent: 29 September 2016 14:55
To: solr-user@lucene.apache.org
Subject: Archiving documents

Hi,

We would like to archive documents based on some criteria (like those that were not modified for more than a year OR are least used) in order to reduce storage requirements. I would like to hear some of the best practices followed.

How about having a main collection and optionally an archive collection (or one or more archive collections?) to which we move documents (at regular intervals) from the main collection based on some criteria (least used, modified date, etc.), and providing a flag during search for whether to include archived documents in the search or not?

Thanks,
Vasu