I think managing 100 cores would be too much of a headache.  Also,
query performance across 100 cores will not be good: for deep paging
you need page_number*page_size docs from each of the 100 cores, and
then have to merge them all.
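To put a rough number on that merge cost, here is a small sketch (the page/core numbers are hypothetical, just to illustrate the point above):

```python
# Deep-paging cost when merging results from N cores (illustrative only).
# To serve page P (0-based) with page_size S, each core must return its
# top (P + 1) * S docs, and the merging node must sort all of them.
def docs_to_merge(num_cores, page_number, page_size):
    per_core = (page_number + 1) * page_size
    return num_cores * per_core

# e.g. asking for page 50 with 20 results/page across 100 cores:
print(docs_to_merge(100, 50, 20))  # 102000 docs to pull and merge
```

So even a modest page depth multiplies quickly by the core count, which is why the 100-core layout gets expensive.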

I would instead run around 10 SOLR instances, each holding about 10M
docs.  Always search all 10 nodes.  Index using some hash(doc) to
distribute new docs among the nodes, and run a nightly/weekly job to
delete old docs and force merge (optimize) down to some min/max number
of segments.  I think that will work OK, but I'm not sure how to
handle replication/failover so that each node is redundant.  If we use
SOLR replication it will have problems replicating after an optimize
on large indexes.  It seems to take a long time to move a 10M-doc
index (around 100GB in our case) from master to slave.  Doing it once
per week is probably OK.
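The hash-based routing could look something like this (a minimal sketch; the shard count, the md5 choice, and the doc-id field are my assumptions, not something settled in this thread):

```python
import hashlib

NUM_SHARDS = 10  # assumed: the ~10 SOLR instances described above


def shard_for(doc_id):
    """Pick a target instance by hashing the document id.

    md5 is used here only because it spreads ids evenly; any stable
    hash works, as long as it never changes once docs are indexed.
    """
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return h % NUM_SHARDS


# New docs get indexed on the instance shard_for(doc_id) returns;
# queries always fan out to all NUM_SHARDS instances and merge.
print(shard_for("doc-12345"))
```

The important property is that the hash is stable: the same doc id must always land on the same instance, otherwise the nightly delete job can't find old versions of a doc.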



2011/12/15 Avni, Itamar <itamar.a...@verint.com>:
> What about managing a core for each day?
>
> This way the deletion/archive is very simple. No "holes" in the index (which 
> often happen when deleting document by document).
> Indexing is done against core [today-0].
> The query is done against cores [today-0],[today-1]...[today-99]. Quite a 
> headache.
>
> Itamar
>
> -----Original Message-----
> From: Robert Stewart [mailto:bstewart...@gmail.com]
> Sent: Thursday, December 15, 2011 16:54
> To: solr-user@lucene.apache.org
> Subject: how to setup to archive expired documents?
>
> We have a large (100M) index where we add about 1M new docs per day.
> We want to keep the index at a constant size, so the oldest docs are removed 
> and/or archived each day (the index then contains around 100 days of data).  
> What is the best way to do this?  We still want to keep older data in some 
> archive index, not just delete it (so is it possible to export older 
> segments, etc. into some other index?).  If we run some daily job to delete 
> old data, I assume we'd need to optimize the index to actually remove the 
> docs and free space, but that will require very large (and slow) replication 
> after the optimize, which will probably not work out well for so large an 
> index.  Is there some way to shard the data, or another best practice?
>
> Thanks
> Bob
>
