Christian,

You don't mention SolrCloud explicitly, and based on what you wrote I'm assuming 
you are planning to use a Solr 3.* setup for this.  I think that's 
the first thing to change - this is going to be a pain to manage if you use 
Solr 3.*.  You should start looking at SolrCloud for this right away.  
Once you have a look, you will see that a number of your questions quickly 
become non-questions. :)

Otis 
----
Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




>________________________________
> From: Christian von Wendt-Jensen <christian.sonne.jen...@infopaq.com>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org> 
>Sent: Wednesday, May 23, 2012 6:59 AM
>Subject: Planning of future Solr setup
> 
>Hi,
>
>I'm in the middle of planning a new Solr setup. The situation is this:
>- We currently have one document type with around 20 fields, indexed, not 
>stored, except for a few date fields
>- We currently have indexed 400M documents across 20+ shards.
>- The number of documents to be indexed is around 1M/day, and this number is 
>increasing.
>- The index files total around 750GB
>- Users will mostly search newly indexed documents (news), and therefore the 
>shards represent date ranges.
>- Each month or so, we add a new shard.
>
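
For a rough sense of scale, the figures above can be turned into per-document 
and per-shard estimates. This is back-of-the-envelope arithmetic only. It 
assumes 1 GB = 10^9 bytes, a 30-day month, and index size growing linearly with 
document count, none of which is stated in the message and all of which real 
segment merges will bend.

```python
# Figures quoted in the message above.
TOTAL_DOCS = 400_000_000
TOTAL_INDEX_BYTES = 750 * 10**9   # assumption: decimal gigabytes
DOCS_PER_DAY = 1_000_000
DAYS_PER_SHARD = 30               # assumption: "each month or so" taken as 30 days


def bytes_per_doc(total_bytes, total_docs):
    """Average on-disk index size per document."""
    return total_bytes / total_docs


def monthly_shard_estimate(docs_per_day, days, per_doc_bytes):
    """Estimated (document count, index bytes) for one new date-range shard."""
    docs = docs_per_day * days
    return docs, docs * per_doc_bytes


per_doc = bytes_per_doc(TOTAL_INDEX_BYTES, TOTAL_DOCS)
shard_docs, shard_bytes = monthly_shard_estimate(DOCS_PER_DAY, DAYS_PER_SHARD, per_doc)

print(f"average index size per document: {per_doc:.0f} bytes")
print(f"one monthly shard: {shard_docs:,} docs, {shard_bytes / 10**9:.2f} GB")
```

At the current rate that is a little under 2 KB of index per document and 
roughly 56 GB per monthly shard, and the shard size grows as the daily volume 
increases.
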
>
>In my planning, my goals are:
>- it should be very easy to add a new shard and bring it online. Maybe it 
>could even be fully automated.
>- it should be very easy to retire an (old) shard in order to reclaim the 
>hardware resources for newer documents.
>- It should be very easy to scale wide or high by adding more machines or more 
>CPU/RAM. The resources should be able to autobalance the shards for optimum 
>resource usage.
>- Rebalancing should be very fast.
>- The setup should support one writer and many readers of the same physical 
>index. This avoids replication and moving large files around. This again 
>supports fast rebalancing of hardware resources.
>- Clients should be notified about shards coming online or going offline.
>
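
With date-range shards, adding or retiring a shard mostly comes down to 
changing the list of shards a client queries. Here is a minimal sketch of that 
idea, not a SolrCloud feature: a plain in-memory registry that picks the shards 
overlapping a query's date range and joins them into the value of Solr's 
distributed-search `shards` request parameter. The host names, the `Shard` 
class and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Shard:
    address: str  # hypothetical host:port/path, as used in the `shards` parameter
    start: date   # first day covered (inclusive)
    end: date     # first day NOT covered (exclusive)


def shards_for_range(shards, query_start, query_end):
    """Return the shards whose [start, end) overlaps [query_start, query_end)."""
    return [s for s in shards if s.start < query_end and query_start < s.end]


def shards_param(shards):
    """Comma-separated shard list for the `shards` request parameter."""
    return ",".join(s.address for s in shards)


# Hypothetical registry: one shard per month, as described in the message.
REGISTRY = [
    Shard("solr-2012-03.example.com:8983/solr", date(2012, 3, 1), date(2012, 4, 1)),
    Shard("solr-2012-04.example.com:8983/solr", date(2012, 4, 1), date(2012, 5, 1)),
    Shard("solr-2012-05.example.com:8983/solr", date(2012, 5, 1), date(2012, 6, 1)),
]

# A search over recent news only needs the shards covering that period.
recent = shards_for_range(REGISTRY, date(2012, 4, 20), date(2012, 5, 24))
print(shards_param(recent))
```

Bringing a new month online is then one more registry entry, and retiring an 
old shard is removing one. Keeping that registry consistent across all clients 
is the part a coordination service would have to handle.
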
>The goals require a kind of distributed configuration and notification system. 
>Here I imagine Zookeeper comes into play.
>In order to make rebalancing very fast, the index files should stay where they 
>are, and not be moved around. Instead Solr instances on available resources 
>should be configured to point to relevant shards. This requires a SAN storage, 
>I imagine.
>
>
>Questions:
>1. What is best practice in regard to using a machine's resources: one tomcat 
>instance per one shard until memory and CPU are used up? Or rather one 
>tomcat/multiple cores, and the tomcat gets all memory available on the machine?
>2. Would it be a good idea to mix master and slave cores in the same tomcat 
>instance or should a machine be dedicated to either master cores or slave 
>cores?
>3. What would be the best way to notify the slave cores about recent commits 
>by the masters, remembering that replication is disabled?
>4. In the one writer, many readers scenario, what happens when the writer 
>merges/updates segments? Will the index files be physically deleted/altered? 
>And how will the slaves react to that?
>5. Would it be advisable to use a SAN for sharing index files between readers 
>and writers (one writer)? Any best practices in this area? I imagine one large 
>share on the SAN that all "resources" can mount.
>
>
>
>
>
>
>Med venlig hilsen / Best Regards
>
>Christian von Wendt-Jensen
>
>
>
>
