Hi All,

I am working on a Solr search-based project and would highly appreciate help/suggestions from you all regarding Solr architecture and capacity planning. Details of the project are as follows:
1. There are 2 databases from which data needs to be indexed and made searchable:
   - Production
   - Archive
2. The Production database will retain 6 months of data and archive older data every month.
3. The Archive database will retain 3 years of data.
4. The database is SQL Server 2008 and the Solr version is 3.1.

The data to be indexed contains a huge volume of attachments (PDF, Word, Excel, etc.), approximately 200 GB per month. We are planning to do a full index every month (multithreaded) and incremental indexing on a daily basis. The Solr index size comes to approximately 25 GB per month.

If we were to use distributed search, what would be the best configuration for the Production and Archive indexes?
What would be the best CPU/RAM/disk configuration?
How can I implement a failover mechanism for sharded searches?

Please let me know in case I need to share more information.

--
Thanks and Regards
Rahul A. Warawdekar
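A rough sizing sketch based on the figures above, assuming the index grows linearly at about 25 GB per month of retained data (6 months for Production, 36 months for Archive) and ignoring merge/optimize overhead. The per-shard size cap `MAX_SHARD_GB` is a hypothetical value chosen only for illustration; the right cap depends on the RAM and disk available per node.

```python
import math

# Approximate figures quoted in the post
INDEX_GB_PER_MONTH = 25
PRODUCTION_MONTHS = 6
ARCHIVE_MONTHS = 36  # 3 years

# Hypothetical assumption: keep each shard at or below this size
MAX_SHARD_GB = 50


def index_size_gb(months: int, gb_per_month: float = INDEX_GB_PER_MONTH) -> float:
    """Steady-state index size, assuming linear growth per retained month."""
    return months * gb_per_month


def shards_needed(total_gb: float, max_shard_gb: float = MAX_SHARD_GB) -> int:
    """Smallest shard count that keeps every shard at or below max_shard_gb."""
    return math.ceil(total_gb / max_shard_gb)


if __name__ == "__main__":
    for name, months in (("Production", PRODUCTION_MONTHS), ("Archive", ARCHIVE_MONTHS)):
        size = index_size_gb(months)
        print(f"{name}: ~{size:.0f} GB index, "
              f"{shards_needed(size)} shards at <= {MAX_SHARD_GB} GB each")
```

With the assumed 50 GB cap, this estimates ~150 GB in 3 shards for Production and ~900 GB in 18 shards for Archive. These are only starting points for the CPU/RAM/disk question, not measured results.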