Hi All,

I am working on a Solr-based search project and would highly appreciate
help and suggestions from you all regarding Solr architecture and capacity
planning.
Details of the project are as follows:

1. There are 2 databases from which data needs to be indexed and made
searchable:
                - Production
                - Archive
2. The Production database will retain the last 6 months of data and archive
data every month.
3. The Archive database will retain the last 3 years of data.
4. The database is SQL Server 2008 and the Solr version is 3.1.

The data to be indexed contains a huge volume of attachments (PDF, Word, Excel,
etc.), approximately 200 GB per month.
We are planning to do a full index every month (multithreaded) and
incremental indexing on a daily basis.
The Solr index size is coming to approximately 25 GB per month.
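
For reference, a rough sketch of the arithmetic implied by these figures.
It assumes growth stays constant at 200 GB of attachments and 25 GB of index
per month, and that the Archive window is a full 36 months; both are
assumptions, and the function and variable names are only illustrative.

```python
# Figures taken from the description above.
RAW_GB_PER_MONTH = 200     # attachments to be indexed per month
INDEX_GB_PER_MONTH = 25    # resulting Solr index size per month

# Retention windows (assumption: 3 years is treated as 36 full months).
PRODUCTION_MONTHS = 6
ARCHIVE_MONTHS = 36


def retained_size_gb(gb_per_month, months):
    """Size of the data kept for `months` months at a constant monthly rate."""
    return gb_per_month * months


production_index_gb = retained_size_gb(INDEX_GB_PER_MONTH, PRODUCTION_MONTHS)
archive_index_gb = retained_size_gb(INDEX_GB_PER_MONTH, ARCHIVE_MONTHS)
production_raw_gb = retained_size_gb(RAW_GB_PER_MONTH, PRODUCTION_MONTHS)
archive_raw_gb = retained_size_gb(RAW_GB_PER_MONTH, ARCHIVE_MONTHS)

print("Production index (GB):", production_index_gb)
print("Archive index (GB):", archive_index_gb)
print("Production attachments (GB):", production_raw_gb)
print("Archive attachments (GB):", archive_raw_gb)
```

Under these assumptions the Production index would be about 150 GB and the
Archive index about 900 GB, before any headroom for optimize/merge operations.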

If we were to use distributed search, what would be the best configuration
for the Production and Archive indexes?
What would be the best CPU/RAM/disk configuration?
How can I implement a failover mechanism for sharded searches?
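
To make the distributed search question concrete, here is a minimal sketch of
the kind of sharded request I have in mind, using Solr's `shards` request
parameter. The host names, the number of shards, and the helper function are
hypothetical placeholders, not an existing setup.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses (host:port/path, without the http:// scheme).
PRODUCTION_SHARDS = [
    "prod-solr1:8983/solr",
    "prod-solr2:8983/solr",
]


def build_distributed_query_url(coordinator, shards, query, rows=10):
    """Build a /select URL asking `coordinator` to fan the query out to `shards`."""
    params = {
        "q": query,
        "rows": rows,
        # Solr expects a comma-separated list of shard addresses here.
        "shards": ",".join(shards),
    }
    return "http://" + coordinator + "/select?" + urlencode(params)


# Any shard can act as the coordinator for the request.
url = build_distributed_query_url(PRODUCTION_SHARDS[0], PRODUCTION_SHARDS, "invoice")
print(url)
```

This only shows how a query would be spread across shards. It does not
provide failover by itself, since a request fails if any listed shard address
is unreachable, which is why I am asking about a failover mechanism.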

Please let me know in case I need to share more information.


-- 
Thanks and Regards
Rahul A. Warawdekar
