Hi Worthy,

On Sun, Oct 21, 2012 at 2:30 AM, Worthy LaFollette <wort...@gmail.com> wrote:
> CAVEAT: I am a newbie w/r to SOLR (some Lucene experience, but not SOLR
> itself).  Trying to come up to speed.
>
>
> What have you all done w/r to SOLR capacity planning and disaster recovery?

Re capacity planning: performance testing with realistic datasets,
query types, and query rates, combined with monitoring tools that show
you system and Solr metrics, will get you far.  Ongoing monitoring and
observation of a running system will let you understand trends, spot
bottlenecks, and figure out whether you need to get ready to buy more
RAM, add servers, etc.
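
Even a crude replay of representative queries can tell you a lot.  A
minimal sketch (bash + curl, untested; assumes a local instance and a
hypothetical queries.txt with one URL-encoded query per line):

  # replay queries sequentially and time the whole run
  time while read q; do
    curl -s "http://localhost:8983/solr/select?q=${q}&rows=10" > /dev/null
  done < queries.txt

A real load test should run concurrent clients at your expected query
rate, but even this will flush out obviously slow query types.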

> I am curious to the following metrics:
>
>  - File handles and other ulimit/profile concerns

Not often a concern any more.  Typical Linux systems ship with a max
of 1024 open files per process, which is often insufficient for Solr,
so people raise it to 20K, 30K, etc.
I *think* we have this system metric in SPM for Solr, but I'm not sure
right now.
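
For reference, checking and raising the limit looks like this (the
usual knobs; adjust the user name and values to taste):

  # check the current limit for the user running Solr
  ulimit -n

  # raise it for the current shell session
  ulimit -n 30000

  # or make it permanent in /etc/security/limits.conf:
  solr  soft  nofile  30000
  solr  hard  nofile  30000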

>  - Space calculations (particularly w/r to optimizations, etc.)

Monitoring, again, is the best way to tell and to keep an eye on this.
Optimization can temporarily take up to ~3x the index size in disk
space, if I remember correctly.  You can also check the ML archives
for recent emails about index optimization.
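
As a rough worked example: if your index is 100 GB, plan for on the
order of 300 GB free on that volume while an optimize runs.  If you
want to trigger one manually, something like this works against the
XML update handler (host and core path assumed):

  curl 'http://localhost:8983/solr/update' \
       -H 'Content-Type: text/xml' --data-binary '<optimize/>'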

>  - Taxonomy considerations

I think this is typically DIY.

>  - Single Core vs. Multi-core

Not sure what to say here.  Typically one type of data goes in one
core.  You typically wouldn't put people records, product records, and
order records all in the same core, because these three things have
different structures/schemas.
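
As a sketch, the multi-core layout is just one <core> entry per record
type in solr.xml (core names here are made up):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="people"   instanceDir="people"/>
      <core name="products" instanceDir="products"/>
      <core name="orders"   instanceDir="orders"/>
    </cores>
  </solr>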

>  - ?
>
> Also, has anyone planned for disaster recovery for SOLR across non-metro
> data centers?  Currently not an issue for me, but will be shortly.

Have a look at http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater
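
The short version, in case the wiki moves: a repeater is a node whose
ReplicationHandler in solrconfig.xml is configured as both master and
slave, roughly like this (masterUrl is a placeholder for the master in
your other data center):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://remote-master:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

Slaves in the remote DC then poll the repeater instead of each pulling
the index across the WAN link.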

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
