On 12/23/2013 05:03 PM, Greg Preston wrote:
Yes, I'm well aware of the performance implications, many of which are
mitigated by 2TB of SSD and 512GB RAM.

I've got a very similar setup in production: 2TB SSD, 256GB RAM (128GB
heaps), and 1-1.5TB of index per node.  We're in the process of
splitting that into multiple JVMs per host.  GC pauses were causing
ZooKeeper timeouts (you should raise that timeout in solr.xml).  And
resyncs after the timeouts took long enough that a large tlog built up
(we have near-continuous indexing), and we couldn't replay the tlog
fast enough to catch up to current.
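
That timeout is the zkClientTimeout attribute on the <cores> element in
the legacy solr.xml format.  As a rough sketch, it looks something like
this (30000 is just an example value, not a recommendation):

<solr persistent="true">
  <cores adminPath="/admin/cores"
         host="${host:}" hostPort="${jetty.port:8983}"
         zkClientTimeout="${zkClientTimeout:30000}">
    <!-- core entries live here -->
  </cores>
</solr>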

GC pauses are a huge issue in our current production environment (monolithic index) and general performance was meager, hence the move to a distributed design. We will have 8 nodes with ~200GB per node, one shard each; performance for single-term and most multi-term queries is now sub-second, and throughput has increased 10-fold. Larger boolean queries can still take 2-3s, but we can live with that.

At any rate, I still can't figure out what my solr.xml is supposed to look like on the node with all 8 redundant shards.
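
My best guess, assuming the eight CoreAdmin CREATE calls get persisted
(persistent="true"), is one <core> entry per shard, something like this
(the names here are just my guess):

<solr persistent="true">
  <cores adminPath="/admin/cores" host="${host:}"
         hostPort="${jetty.port:8983}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collname_shard1_replica2" instanceDir="collname_shard1_replica2"
          collection="collname" shard="shard1"/>
    <!-- ... and so on through shard8 ... -->
  </cores>
</solr>

Is that about right?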

David


On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro
<david.santama...@gmail.com> wrote:
On 12/22/2013 09:48 PM, Shawn Heisey wrote:

On 12/22/2013 2:10 PM, David Santamauro wrote:

My goal is to have a redundant copy of all 8 currently running (but
non-redundant) shards. This setup (8 nodes with no replicas) was a test
and it has proven quite functional from a performance perspective.
Loading, though, takes almost 3 weeks, so I'm really not in a position
to redesign the distribution, though I can add nodes.

I have acquired another resource, a very large machine that I'd like to
use to hold the replicas of the currently deployed 8-nodes.

I realize I can run 8 Jetty/Tomcat instances and accomplish my goal, but
that is a maintenance headache and really a last resort. I would just
like to be able to deploy this big machine with 'numShards=8'.

Is that possible or do I really need to have 8 other nodes running?


You don't want to run more than one container or Solr instance per
machine.  Things can get very confused, and it's too much overhead.

With existing collections, you can simply run the CoreAdmin CREATE
action on the new node with more resources.

So you'd do something like this, once for each of the 8 existing parts:


http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1

It will automatically replicate the shard from its current leader.
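
If you want to script it, a simple shell loop covers all eight
(substitute your actual host, port, and collection name):

for i in $(seq 1 8); do
  # CREATE registers the core and triggers replication from the shard leader
  curl "http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard${i}_replica2&collection=collname&shard=shard${i}"
done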


Fantastic! Clearly my understanding of "collection" vs. "core" vs.
"shard" was lacking, but now I see the relationship better.



One thing to be aware of: With 1.4TB of index data, it might be
impossible to keep enough of the index in RAM for good performance,
unless the machine has a terabyte or more of RAM.


Yes, I'm well aware of the performance implications, many of which are
mitigated by 2TB of SSD and 512GB RAM.

Thanks for the nudge in the right direction. The first node/shard1 is
replicating right now.

David



