Thanks! My business requirements have changed a bit. We now need one year of rolling data in Production. The index size for that comes to approximately 200-220 GB. I am planning to address this using Solr distributed search as follows.
1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
   (load balanced)
2. Master configuration will be 4 CPU

On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Hi Rahul,
>
> This is unfortunately not enough information for anyone to give you very
> precise answers, so I'll just give some rough ones:
>
> * best disk - SSD :)
> * CPU - multicore, depends on query complexity, concurrency, etc.
> * sharded search and failover - start with SolrCloud, there are a couple
> of pages about it on the Wiki and
> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> >________________________________
> >From: Rahul Warawdekar <rahul.warawde...@gmail.com>
> >To: solr-user <solr-user@lucene.apache.org>
> >Sent: Tuesday, October 11, 2011 11:47 AM
> >Subject: Architecture and Capacity planning for large Solr index
> >
> >Hi All,
> >
> >I am working on a Solr search based project, and would highly appreciate
> >help/suggestions from you all regarding Solr architecture and capacity
> >planning.
> >Details of the project are as follows
> >
> >1. There are 2 databases from which, data needs to be indexed and made
> >searchable,
> >    - Production
> >    - Archive
> >2. Production database will retain 6 months old data and archive data every
> >month.
> >3. Archive database will retain 3 years old data.
> >4. Database is SQL Server 2008 and Solr version is 3.1
> >
> >Data to be indexed contains a huge volume of attachments (PDF, Word, excel
> >etc..), approximately 200 GB per month.
> >We are planning to do a full index every month (multithreaded) and
> >incremental indexing on a daily basis.
> >The Solr index size is coming to approximately 25 GB per month.
> >
> >If we were to use distributed search, what would be the best configuration
> >for Production as well as Archive indexes ?
> >What would be the best CPU/RAM/Disk configuration ?
> >How can I implement failover mechanism for sharded searches ?
> >
> >Please let me know in case I need to share more information.
> >
> >
> >--
> >Thanks and Regards
> >Rahul A. Warawdekar
> >
> >
> >

--
Thanks and Regards
Rahul A. Warawdekar
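
P.S. The 3-shard plan at the top of this message could be sketched roughly as below. This is only an illustration under assumptions: the host names are hypothetical, each one is assumed to be a load-balancer address in front of that shard's two slaves, and the helper names are made up for this example. The one real Solr piece is the `shards` request parameter, which lists the shard URLs a distributed query should fan out to. The size helper assumes the index is split evenly across shards.

```python
from urllib.parse import urlencode

# Hypothetical addresses: one load-balanced endpoint per shard,
# each assumed to sit in front of that shard's two slaves.
SHARD_HOSTS = [
    "shard1-lb.example.com:8983/solr",
    "shard2-lb.example.com:8983/solr",
    "shard3-lb.example.com:8983/solr",
]


def per_shard_size_gb(total_gb, shard_count):
    """Index size per shard in GB, assuming an even split."""
    return total_gb / shard_count


def distributed_query_url(entry_host, query, shard_hosts, rows=10):
    """Build a select URL that fans the query out via the `shards` parameter."""
    params = urlencode({
        "q": query,
        "rows": rows,
        # Comma-separated list of shard URLs (without the http:// prefix).
        "shards": ",".join(shard_hosts),
    })
    return "http://" + entry_host + "/select?" + params


if __name__ == "__main__":
    for total_gb in (200, 220):
        size = per_shard_size_gb(total_gb, len(SHARD_HOSTS))
        print(total_gb, "GB total ->", round(size, 1), "GB per shard")
    print(distributed_query_url(SHARD_HOSTS[0], "invoice", SHARD_HOSTS))
```

With an even split, the 200-220 GB estimate works out to roughly 67-73 GB per shard. Any of the shard endpoints can receive the query, since the receiving node merges the results from all hosts listed in `shards`.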