Hi,

We've done a fair number of such things over the years. :)
If daily shards don't work for you, why not weekly or monthly?
Have a look at Zoie's Hourglass concept/code.
Some Solr alternatives are currently better suited to handle this sort of 
setup...
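For example, a minimal sketch (Python; the weekly core naming scheme
here is made up) of routing docs to weekly cores instead of daily ones:

    from datetime import date

    def weekly_core(d: date) -> str:
        # One core per ISO week; drop whole cores as they age out.
        # The "docs-YYYY-wWW" naming is a hypothetical convention.
        year, week, _ = d.isocalendar()
        return "docs-%04d-w%02d" % (year, week)

    print(weekly_core(date(2011, 12, 15)))  # -> docs-2011-w50

With ~100 days of data that means querying about 15 weekly cores
instead of 100 daily ones.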

Otis 
----
Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



----- Original Message -----
> From: Robert Stewart <bstewart...@gmail.com>
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Thursday, December 15, 2011 12:55 PM
> Subject: Re: how to setup to archive expired documents?
> 
> I think managing 100 cores will be too much of a headache.  Also, the
> performance of querying 100 cores will not be good (each of the 100
> cores has to return page_number*page_size docs, which then have to be
> merged).
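> 
> A minimal sketch of that merge cost (illustrative arithmetic only):
> 
>     def docs_fetched(page, page_size, num_cores):
>         # Each core must return its top page*page_size docs so the
>         # coordinator can merge and re-sort them before slicing out
>         # the one page the user asked for.
>         return page * page_size * num_cores
> 
>     # Page 10 of 20 results across 100 cores pulls 20,000 entries
>     # just to return 20 docs.
>     print(docs_fetched(10, 20, 100))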
> 
> I'm thinking of having around 10 Solr instances, each holding about
> 10M docs.  Always search all 10 nodes.  Index using some hash(doc) to
> distribute new docs among the nodes.  Run some nightly/weekly job to
> delete old docs and force-merge (optimize) down to some min/max number
> of segments.  I think that will work OK, but I'm not sure how to
> handle replication/failover so that each node is redundant.  If we use
> Solr replication, it will have problems replicating after an optimize
> on such large indexes.  It seems to take a long time to move a 10M-doc
> index (around 100GB in our case) from master to slave.  Doing it once
> per week is probably OK.
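> 
> A rough sketch of what I mean (Python against Solr's HTTP update
> handler; the node URLs and the "timestamp" date field are
> assumptions, substitute whatever your schema uses):
> 
>     import hashlib
>     import urllib.request
> 
>     NODES = ["http://solr%d:8983/solr" % i for i in range(10)]  # assumed hosts
> 
>     def node_for(doc_id):
>         # Stable hash routing: a given id always lands on the same node.
>         h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
>         return NODES[h % len(NODES)]
> 
>     def nightly_purge(node, days=100, max_segments=2):
>         # Delete docs past the retention window, then force-merge the
>         # index down to a handful of segments to reclaim the space.
>         bodies = [
>             "<delete><query>timestamp:[* TO NOW-%dDAYS]</query></delete>" % days,
>             "<commit/>",
>             '<optimize maxSegments="%d"/>' % max_segments,
>         ]
>         for body in bodies:
>             req = urllib.request.Request(node + "/update",
>                                          data=body.encode("utf-8"),
>                                          headers={"Content-Type": "text/xml"})
>             urllib.request.urlopen(req).read()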
> 
> 
> 
> 2011/12/15 Avni, Itamar <itamar.a...@verint.com>:
>>  What about managing a core for each day?
>> 
>>  This way the deletion/archiving is very simple, with no "holes" in the
>>  index (which often appear when deleting document by document).
>>  Indexing is done against core [today-0].
>>  The query is done against cores [today-0],[today-1]...[today-99]. Quite a
>>  headache.
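>> 
>>  A minimal sketch of fanning a query out over the daily cores (core
>>  names like "core-20111215" and the host are hypothetical):
>> 
>>      from datetime import date, timedelta
>> 
>>      def shards_param(n_days=100, host="localhost:8983"):
>>          # Build the value for the shards= parameter of a
>>          # distributed query, one entry per daily core.
>>          today = date.today()
>>          return ",".join(
>>              "%s/solr/core-%s" % (host, (today - timedelta(days=i)).strftime("%Y%m%d"))
>>              for i in range(n_days))
>> 
>>      # /select?q=...&shards=<value> queries all 100 daily cores at once.
>>      print(shards_param())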
>> 
>>  Itamar
>> 
>>  -----Original Message-----
>>  From: Robert Stewart [mailto:bstewart...@gmail.com]
>>  Sent: Thursday, 15 December 2011 16:54
>>  To: solr-user@lucene.apache.org
>>  Subject: how to setup to archive expired documents?
>> 
>>  We have a large (100M-doc) index to which we add about 1M new docs per day.
>>  We want to keep the index at a constant size, so the oldest docs are removed
>>  and/or archived each day (the index then holds around 100 days of data).
>>  What is the best way to do this?  We still want to keep older data in some
>>  archive index, not just delete it (so is it possible to export older
>>  segments, etc., into some other index?).  If we have a daily job to delete
>>  old data, I assume we'd need to optimize the index to actually remove the
>>  docs and free the space, but that will require a very large (and slow)
>>  replication after the optimize, which will probably not work out well for so
>>  large an index.  Is there some way to shard the data, or some other best
>>  practice?
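>> 
>>  One idea we're considering (a rough sketch, assuming all fields are
>>  stored, a "timestamp" date field, and hypothetical host names; paging
>>  through the full result set is omitted): copy expiring docs into an
>>  archive index before deleting them from the live one.
>> 
>>      import json, urllib.parse, urllib.request
>> 
>>      LIVE = "http://live:8983/solr"        # hypothetical hosts
>>      ARCHIVE = "http://archive:8983/solr"
>> 
>>      def archive_expiring(days=100, rows=1000):
>>          # Fetch docs leaving the retention window from the live index...
>>          q = urllib.parse.urlencode({
>>              "q": "timestamp:[* TO NOW-%dDAYS]" % days,
>>              "rows": rows, "wt": "json"})
>>          resp = json.load(urllib.request.urlopen(LIVE + "/select?" + q))
>>          docs = resp["response"]["docs"]
>>          if docs:
>>              # ...and re-index them into the archive index.
>>              req = urllib.request.Request(
>>                  ARCHIVE + "/update/json?commit=true",
>>                  data=json.dumps(docs).encode("utf-8"),
>>                  headers={"Content-Type": "application/json"})
>>              urllib.request.urlopen(req).read()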
>> 
>>  Thanks
>>  Bob
>> 
>
