On 9/29/2016 6:55 AM, Vasu Y wrote:
> We would like to archive documents based on some criteria (like those that
> were not modified for more than a year OR are least used) in order to
> reduce storage requirements.
> I would like to hear some of the best practices followed.
>
> How about having main collection and optionally an archive collection (or
> one or more archive collections?) to where we move documents (at regular
> intervals) from the main collection based on some criteria (least used or
> modified date etc.) and provide a flag during search whether to include
> archived documents in search or not?

As long as the collections are using compatible schemas and configs, the
general idea here should work.

If this is SolrCloud, you can create a collection alias that can search
multiple collections.
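
As a rough sketch, the alias can be created with the Collections API
CREATEALIAS action.  The host, the alias name "all_docs" and the
collection names "main" and "archive" below are assumptions made up for
illustration, not names from your setup.  The code only builds the
request URL; it does not send it.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that creates an alias over several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # CREATEALIAS takes a comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host, alias and collection names.
url = create_alias_url("http://localhost:8983/solr", "all_docs", ["main", "archive"])
print(url)
```

Queries sent to the alias name are then run against every collection
listed in it.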

If it's not SolrCloud, you can still do a distributed search using the
"shards" parameter, but it will be slightly more complicated to set up,
because each request has to list the URL of every core to search.
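
A minimal sketch of such a request, assuming two cores named "main" and
"archive" on a single hypothetical host.  The "shards" value is a
comma-separated list of core URLs written without the "http://" prefix.
Again, this only builds the URL.

```python
from urllib.parse import urlencode


def distributed_query_url(core_url, shard_urls, query):
    """Build a /select URL that fans the query out to every listed shard."""
    params = {
        "q": query,
        # Each entry is host:port/path to a core, with no scheme.
        "shards": ",".join(shard_urls),
    }
    return f"{core_url}/select?{urlencode(params)}"


# Hypothetical host and core names.
shards = ["localhost:8983/solr/main", "localhost:8983/solr/archive"]
url = distributed_query_url("http://localhost:8983/solr/main", shards, "*:*")
print(url)
```

The core that receives the request merges the results from all of the
listed shards.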

If both schemas have a boolean field for the archive flag, with
documents in the main collection having "false" in that field and
documents in the archive collection having "true" in that field, then
you can include a filter for that flag in your search to limit the
search to one collection or the other, or leave the filter out to
search both.  I think that's probably the best approach.
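
Here is a sketch of how the search flag could map to a filter query.
The field name "archived" is an assumption; use whatever boolean field
you define in both schemas.

```python
from urllib.parse import urlencode


def search_params(query, include_archived):
    """Build the query string, filtering out archived documents unless asked for."""
    params = [("q", query)]
    if not include_archived:
        # "archived" is a hypothetical boolean field present in both schemas.
        params.append(("fq", "archived:false"))
    return urlencode(params)


print(search_params("title:solr", include_archived=False))
print(search_params("title:solr", include_archived=True))
```

Sent to an alias (or with "shards") that covers both collections, the
first form returns only main-collection documents and the second
returns documents from both.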

Thanks,
Shawn
