Hi Worthy,

On Sun, Oct 21, 2012 at 2:30 AM, Worthy LaFollette <wort...@gmail.com> wrote:
> CAVEAT: I am a newbie w/r/t Solr (some Lucene experience, but not Solr
> itself). Trying to come up to speed.
>
> What have you all done w/r/t Solr capacity planning and disaster recovery?

Re capacity planning: performance testing with realistic datasets, query
types, and query rates, combined with monitoring tools that show you system
and Solr metrics so you can understand what is going on, will get you far.
Ongoing monitoring and observation of the running system will let you spot
trends and bottlenecks and figure out whether you need to get ready to buy
more RAM, add servers, or ...

> I am curious about the following metrics:
>
> - File handles and other ulimit/profile concerns

Not often a concern any more. Typical Linux systems come with a default of
1024 max open files, which is often insufficient, so people raise that to
20K, 30K, etc. (there is a limits.conf sketch at the end of this reply).
I *think* we have this system metric in SPM for Solr, but I'm not sure
right now.

> - Space calculations (particularly w/r/t optimizations, etc.)

Monitoring is again the best way to measure this and keep an eye on it.
Optimization can take ~3x the index's disk space, if I remember correctly
(rough worked example at the end of this reply). You can also check the
mailing list archives for recent threads on index optimization.

> - Taxonomy considerations

I think this is typically DIY.

> - Single Core vs. Multi-core

Not sure what to say here. Typically one type of data goes in one core.
You typically don't put people records, product records, and order records
in the same core, because these three things have different
structures/schemas (see the solr.xml sketch at the end of this reply).

> - ?
>
> Also, anyone plan for disaster recovery for Solr across non-metro data
> centers? Currently not an issue for me, but will be shortly.

Have a look at
http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater
(repeater config sketch at the end of this reply).
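
As promised above, the kind of limits.conf change people make. A minimal
sketch, assuming Solr runs as a dedicated "solr" user; the user name and
the 30K value are placeholders, adjust for your setup:

  # /etc/security/limits.conf: raise max open files for the user
  # running Solr (the "solr" user name here is an assumption)
  solr  soft  nofile  30000
  solr  hard  nofile  30000

Log the user out and back in, then verify with "ulimit -n".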
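
To make the ~3x figure concrete, a rough back-of-the-envelope example; the
exact multiple depends on your merge settings and on whether an open
searcher or replication still pins the old segment files:

  current index on disk:                100 GB
  optimized copy being written:        +100 GB
  old copy pinned by an open searcher: +100 GB
  --------------------------------------------
  peak disk space to plan for:        ~300 GB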
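
For the one-type-of-data-per-core point, a minimal solr.xml sketch in the
pre-SolrCloud style; the core names and directories are made up:

  <!-- solr.xml: one core per type of data, each with its own
       schema.xml and solrconfig.xml under its instanceDir -->
  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="products" instanceDir="products"/>
      <core name="people"   instanceDir="people"/>
      <core name="orders"   instanceDir="orders"/>
    </cores>
  </solr>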
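
And for the repeater setup that wiki page describes: the idea is one node
in the remote data center that acts as a slave of your real master and as
a master for the slaves near it. A sketch of the relevant solrconfig.xml
section, with host names made up:

  <!-- solrconfig.xml on the repeater node: slave of the real master,
       master for the slaves in its own data center -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master.dc1.example.com:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

The slaves in the remote data center then point their masterUrl at the
repeater instead of at the far-away master.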

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html