I believe you can just define multiple cores:

<core default="true" instanceDir="shard1/" name="collectionName_shard1" shard="shard1"/>
<core default="true" instanceDir="shard2/" name="collectionName_shard2" shard="shard2"/>
...

(this is the old style solr.xml.  I don't know how to do it in the newer style)

Also, make sure you don't define a non-relative <dataDir> in solrconfig.xml,
or you may run into issues with cores trying to use the same data dir.

-Greg
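A minimal sketch of the old-style (pre core-discovery) solr.xml Greg describes, with all the per-shard <core/> entries for the same collection under one <cores> element; the collection name, instanceDir values, port and zkClientTimeout shown here are assumed placeholders, not values taken from this thread:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="${jetty.port:8983}"
         hostContext="solr" zkClientTimeout="${zkClientTimeout:15000}">
    <!-- one <core/> entry per shard this node should host, all pointing at the same collection -->
    <core name="collectionName_shard1" instanceDir="shard1/" shard="shard1" collection="collectionName"/>
    <core name="collectionName_shard2" instanceDir="shard2/" shard="shard2" collection="collectionName"/>
    <!-- ... -->
    <core name="collectionName_shard8" instanceDir="shard8/" shard="shard8" collection="collectionName"/>
  </cores>
</solr>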
On Mon, Dec 23, 2013 at 2:16 PM, David Santamauro
<david.santama...@gmail.com> wrote:
> On 12/23/2013 05:03 PM, Greg Preston wrote:
>>>
>>> Yes, I'm well aware of the performance implications, many of which are
>>> mitigated by 2TB of SSD and 512GB RAM
>>
>> I've got a very similar setup in production.  2TB SSD, 256G RAM (128G
>> heaps), and 1 - 1.5 TB of index per node.  We're in the process of
>> splitting that to multiple JVMs per host.  GC pauses were causing ZK
>> timeouts (you should up that in solr.xml).  And resync's after the
>> timeouts took long enough that a large tlog built up (we have near
>> continuous indexing), and we couldn't replay the tlog fast enough to
>> catch up to current.
>
> GC pauses are a huge issue in our current production environment (monolithic
> index) and general performance was meager, hence the move to a distributed
> design. We will have 8 nodes with ~ 200GB per node, one shard each and
> performance for single and most multi-term queries has become sub-second and
> throughput has increased 10-fold. Larger boolean queries can still take 2-3s
> but we can live with that.
>
> At any rate, I still can't figure out what my solr.xml is supposed to look
> like on the node with all 8 redundant shards.
>
> David
>
>> On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro
>> <david.santama...@gmail.com> wrote:
>>>
>>> On 12/22/2013 09:48 PM, Shawn Heisey wrote:
>>>>
>>>> On 12/22/2013 2:10 PM, David Santamauro wrote:
>>>>>
>>>>> My goal is to have a redundant copy of all 8 currently running, but
>>>>> non-redundant shards. This setup (8 nodes with no replicas) was a test
>>>>> and it has proven quite functional from a performance perspective.
>>>>> Loading, though, takes almost 3 weeks so I'm really not in a position to
>>>>> redesign the distribution, though I can add nodes.
>>>>>
>>>>> I have acquired another resource, a very large machine that I'd like to
>>>>> use to hold the replicas of the currently deployed 8-nodes.
>>>>>
>>>>> I realize I can run 8 jetty/tomcats and accomplish my goal but that is a
>>>>> maintenance headache and is really a last resort. I really would just
>>>>> like to be able to deploy this big machine with 'numShards=8'.
>>>>>
>>>>> Is that possible or do I really need to have 8 other nodes running?
>>>>
>>>> You don't want to run more than one container or Solr instance per
>>>> machine.  Things can get very confused, and it's too much overhead.
>>>>
>>>> With existing collections, you can simply run the CoreAdmin CREATE
>>>> action on the new node with more resources.
>>>>
>>>> So you'd do something like this, once for each of the 8 existing parts:
>>>>
>>>> http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1
>>>>
>>>> It will automatically replicate the shard from its current leader.
>>>
>>> Fantastic! Clearly my understanding of "collection", vs "core" vs "shard"
>>> was lacking but now I see the relationship better.
>>>
>>>> One thing to be aware of: With 1.4TB of index data, it might be
>>>> impossible to keep enough of the index in RAM for good performance,
>>>> unless the machine has a terabyte or more of RAM.
>>>
>>> Yes, I'm well aware of the performance implications, many of which are
>>> mitigated by 2TB of SSD and 512GB RAM.
>>>
>>> Thanks for the nudge in the right direction. The first node/shard1 is
>>> replicating right now.
>>>
>>> David
>>>
>
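Shawn's CoreAdmin CREATE call above, issued once per existing shard against the big machine, can be scripted roughly as follows; this is only a sketch, and the host/port newnode:8983 and collection name collname are placeholders carried over from his example URL, not real values from this thread:

#!/bin/bash
# Create one replica core per existing shard on the new node; each CREATE
# triggers a full replication of that shard from its current leader.
# "newnode:8983" and "collname" are placeholders from the example above.
for i in 1 2 3 4 5 6 7 8; do
  curl "http://newnode:8983/solr/admin/cores?action=CREATE&name=collname_shard${i}_replica2&collection=collname&shard=shard${i}"
done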