Thanks!

My business requirements have changed a bit.
We now need one year of rolling data in Production.
The index size for that comes to approximately 200-220 GB.
I am planning to address this using Solr distributed search as follows.

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
(load balanced)
2. Master configuration will be 4 CPU
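
To make the plan above concrete, here is a rough sketch of the per-shard arithmetic and of what a distributed query against the three shards might look like. It assumes the index splits evenly across shards, and the host names are made up for illustration; the `shards` request parameter is the standard Solr distributed search parameter.

```python
from urllib.parse import urlencode

# Figures from the plan above: 200-220 GB total index, 3 shards.
INDEX_GB_RANGE = (200, 220)
NUM_SHARDS = 3

# Hypothetical load-balanced slave endpoints, one per shard (assumption).
SHARD_HOSTS = [
    "shard1-lb:8983/solr",
    "shard2-lb:8983/solr",
    "shard3-lb:8983/solr",
]


def per_shard_gb(total_gb, shards):
    """Index size per shard, assuming documents are spread evenly."""
    return total_gb / shards


def distributed_select_url(entry_host, shard_hosts, query):
    """Build a /select URL that fans the query out via the `shards` parameter.

    `shards` takes a comma-separated list of host:port/path entries
    (no http:// prefix).
    """
    params = {"q": query, "shards": ",".join(shard_hosts)}
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return "http://%s/select?%s" % (entry_host, urlencode(params, safe=":/,"))


if __name__ == "__main__":
    low, high = (per_shard_gb(gb, NUM_SHARDS) for gb in INDEX_GB_RANGE)
    print("per-shard index size: %.1f - %.1f GB" % (low, high))
    print(distributed_select_url(SHARD_HOSTS[0], SHARD_HOSTS, "*:*"))
```

With an even split, each shard would hold roughly 67-73 GB, which is the number to size each master and slave against.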


On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hi Rahul,
>
> This is unfortunately not enough information for anyone to give you very
> precise answers, so I'll just give some rough ones:
>
> * best disk - SSD :)
> * CPU - multicore, depends on query complexity, concurrency, etc.
> * sharded search and failover - start with SolrCloud, there are a couple
> of pages about it on the Wiki and
> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> >________________________________
> >From: Rahul Warawdekar <rahul.warawde...@gmail.com>
> >To: solr-user <solr-user@lucene.apache.org>
> >Sent: Tuesday, October 11, 2011 11:47 AM
> >Subject: Architecture and Capacity planning for large Solr index
> >
> >Hi All,
> >
> >I am working on a Solr search based project, and would highly appreciate
> >help/suggestions from you all regarding Solr architecture and capacity
> >planning.
> >Details of the project are as follows
> >
> >1. There are 2 databases from which data needs to be indexed and made
> >searchable,
> >                - Production
> >                - Archive
> >2. Production database will retain 6 months old data and archive data
> >every month.
> >3. Archive database will retain 3 years old data.
> >4. Database is SQL Server 2008 and Solr version is 3.1
> >
> >Data to be indexed contains a huge volume of attachments (PDF, Word, Excel
> >etc.), approximately 200 GB per month.
> >We are planning to do a full index every month (multithreaded) and
> >incremental indexing on a daily basis.
> >The Solr index size is coming to approximately 25 GB per month.
> >
> >If we were to use distributed search, what would be the best configuration
> >for Production as well as Archive indexes ?
> >What would be the best CPU/RAM/Disk configuration ?
> >How can I implement failover mechanism for sharded searches ?
> >
> >Please let me know in case I need to share more information.
> >
> >
> >--
> >Thanks and Regards
> >Rahul A. Warawdekar
> >
> >
> >
>



-- 
Thanks and Regards
Rahul A. Warawdekar
