Hi everyone, I've been evaluating Solr for use on a high-traffic website for some time, and things are looking positive. However, I have some concerns from my higher-ups that I need to address. I have suggested that we use a single index in order to keep things simple, but there are suggestions to split our documents among different indexes.
The primary motivation for this split is a worry about potential index corruption: if we have only one index and it becomes corrupt, what do we do? I never considered this to be an issue, since we would have backups, etc., but I think they have had problems with other search technology in the past, where one big index resulted in frequent corruption that was difficult to recover from. Do you think this is a concern with Solr? If so, what would you suggest to mitigate the risk?

My second question involves general deployment strategy. We expect about 50 million documents, each averaging a few paragraphs in length, and our website receives maybe 10 million hits a day. Can anyone provide an idea of the number of servers, clustering/replication setup, etc. that might be appropriate for this scenario? I'm interested to hear others' experience with similar situations.

Thanks,
-Kallin Nagelberg