Hi,

We've done a fair number of such things over the years. :) If daily shards don't work for you, why not weekly or monthly ones? Have a look at Zoie's Hourglass concept/code. Some Solr alternatives are currently better suited to handle this sort of setup...
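Very roughly, the weekly-shard idea looks like this (an untested sketch; the host, core, and naming scheme are all made up):

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class WeeklyShards {

    // New docs always go to the core for the current week,
    // e.g. solr1:8983/solr/docs_2011_50 (names are illustrative).
    static String coreFor(Date date) {
        return "docs_" + new SimpleDateFormat("yyyy_ww").format(date);
    }

    // Builds the standard "shards" request parameter covering the last
    // N weekly cores; expiry is then just dropping (or moving to an
    // archive box) the oldest core, with no delete + optimize at all.
    static String shardsParam(Date now, int weeks) {
        long weekMs = 7L * 24 * 60 * 60 * 1000;
        List<String> shards = new ArrayList<String>();
        for (int i = 0; i < weeks; i++) {
            shards.add("solr1:8983/solr/" + coreFor(new Date(now.getTime() - i * weekMs)));
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < shards.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(shards.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ~100 days of data is roughly 15 weekly cores instead of 100 daily ones
        System.out.println(shardsParam(new Date(), 15));
    }
}

Note the shards= value deliberately has no http:// prefix, and with ~100 days of retention you fan out over ~15 cores rather than 100.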
Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

----- Original Message -----
> From: Robert Stewart <bstewart...@gmail.com>
> To: solr-user@lucene.apache.org
> Cc:
> Sent: Thursday, December 15, 2011 12:55 PM
> Subject: Re: how to setup to archive expired documents?
>
> I think managing 100 cores would be too much of a headache. Query
> performance across 100 cores will also be poor: each query needs
> page_number*page_size results from every core, which then have to be
> merged.
>
> I am thinking of around 10 Solr instances, each holding about 10M docs.
> Every search hits all 10 nodes; indexing uses some hash(doc) to
> distribute new docs among them. A nightly/weekly job deletes old docs
> and force merges (optimizes) down to some min/max number of segments. I
> think that will work OK, but I am not sure how to handle
> replication/failover so that each node is redundant. If we use Solr
> replication, it will have problems replicating after an optimize on
> large indexes; moving a 10M-doc index (around 100GB in our case) from
> master to slave takes a long time. Doing it once per week is probably
> OK.
>
> 2011/12/15 Avni, Itamar <itamar.a...@verint.com>:
>> What about managing a core for each day?
>>
>> That way deletion/archiving is very simple, and there are no "holes"
>> in the index (which is often the case when deleting document by
>> document).
>> Indexing is done against core [today-0].
>> Queries are done against cores [today-0],[today-1]...[today-99]. Quite
>> a headache.
>>
>> Itamar
>>
>> -----Original Message-----
>> From: Robert Stewart [mailto:bstewart...@gmail.com]
>> Sent: Thursday, December 15, 2011 16:54
>> To: solr-user@lucene.apache.org
>> Subject: how to setup to archive expired documents?
>>
>> We have a large (100M-doc) index to which we add about 1M new docs per
>> day. We want to keep the index at a constant size, with the oldest docs
>> removed and/or archived each day, so the index holds around 100 days of
>> data. What is the best way to do this? We still want to keep older data
>> in some archive index, not just delete it (so is it possible to export
>> older segments, etc. into some other index?). If we run a daily job to
>> delete old data, I assume we'd need to optimize the index to actually
>> remove the deletions and free the space, but that would require a very
>> large (and slow) replication after each optimize, which will probably
>> not work well for so large an index. Is there some way to shard the
>> data, or some other best practice?
>>
>> Thanks
>> Bob
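P.S. The nightly purge plus bounded merge Robert describes is only a few lines of SolrJ. A minimal, untested sketch (the core URL and the "timestamp" field are assumptions, not real names):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class NightlyPurge {
    public static void main(String[] args) throws Exception {
        // One such job per node; URL and field name are illustrative.
        SolrServer solr = new CommonsHttpSolrServer("http://solr1:8983/solr/docs");

        // Drop everything older than 100 days (assumes an indexed date field).
        solr.deleteByQuery("timestamp:[* TO NOW-100DAYS]");
        solr.commit();

        // Merge down to a bounded segment count instead of a full optimize
        // (1 segment), so fewer segments get rewritten and less data has to
        // move during the replication that follows.
        solr.optimize(true, true, 10);
    }
}

The bounded merge still reclaims the space held by deleted docs, without rewriting the whole 100GB index into one segment, which is what makes post-optimize replication so painful.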