1. Yes, that's the right way to go, well, in theory at least :)
2. Yes, queries are always fanned out to all shards, and a request will be as slow as its slowest shard. When I looked into
Solr's distributed querying implementation a few months back, support for graceful degradation in the face of things
like network failures and slow shards was not there yet. (There's a shards.info sketch below that can show which shard is the slow one.)
3. I doubt mmap settings would impact your read-only load, and it seems you can easily
fit your index in RAM. You could try warming the file cache to make sure, e.g. "cat $solr_dir/* > /dev/null" (a fuller sketch is below).
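A couple of quick sketches, in case they help. Host names, ports, collection names, and paths below are guesses for your setup, so adjust as needed.

To see which shard is dragging, adding shards.info=true to a query should make Solr report per-shard timings in the response:

    # per-shard numFound and elapsed time for a cheap query
    curl 'http://node1:8983/solr/bigcollection/select?q=*:*&rows=0&shards.info=true&wt=json&indent=true'

To warm the file cache, something along these lines reads every index file once so the OS page cache holds it (assuming the data dir is under /var/solr/data):

    # touch every byte of every index file once
    find /var/solr/data -type f -exec cat {} + > /dev/null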

It's odd that only 2 nodes are at 100% in your setup. I would check a couple of things (quick sketches below):
a. Are your docs distributed evenly across shards: number of docs and size of the shards?
b. Is your test client querying all nodes, or do all the queries go to those 2 busy nodes?
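For (a), if memory serves, the core admin STATUS call on each node reports numDocs and index size per core, so you can eyeball whether the big collection's shards are balanced (host names are placeholders):

    for h in node1 node2 node3 node4 node5 node6 node7 node8; do
      echo "== $h =="
      # numDocs and sizeInBytes for every core hosted on this node
      curl -s "http://$h:8983/solr/admin/cores?action=STATUS&wt=json&indent=true" \
        | grep -E '"numDocs"|"sizeInBytes"'
    done

For (b), make sure the load generator round-robins across all 8 nodes (or goes through a load balancer); whichever node receives a query also does the aggregation work for it, so pointing the client at only 2 nodes would explain exactly the pattern you're seeing.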

Regards,
Tri

On Feb 14, 2014, at 10:52 AM, Nitin Sharma <nitin.sha...@bloomreach.com> wrote:

Hello folks,

We are currently using SolrCloud 4.3.1. We have an 8-node SolrCloud cluster
with 32 cores, 60 GB of RAM, and SSDs. We are using ZooKeeper to manage the
solrconfig used by our collections.

We have many collections, and some of them are very large compared to the
others. The shards of these big collections are on the order of gigabytes
each. We decided to split the bigger collection evenly across all nodes
(8 shards and 2 replicas) with maxNumShards > 1.

We ran a test with a read-only load on one big collection and still see
only 2 nodes running at 100% CPU, while the rest blaze through the queries
much faster (under 30% CPU), despite the collection being sharded across
all nodes.

I checked the JVM usage and found that none of the pools have high
utilization (except the survivor space, which is at 100%). The GC cycles are
on the order of milliseconds and are mostly scavenges; a mark-and-sweep
occurs only once every 30 minutes.

A few questions:

1. Sharding all collections (small and large) across all nodes evenly
distributes the load and makes the system characteristics of all machines
similar. Is this a recommended way to do it?
2. SolrCloud does distributed queries by default. So if a node is at
100% CPU, does it slow down the response time for the other nodes waiting
on that query? (Or is there a timeout if it cannot get a response from
a node within x seconds?)
3. Our collections use MMapDirectory, but I haven't specifically enabled
anything related to mmaps (locked pages under ulimit). Does that adversely
affect performance? Or can it lock pages even without this?

Thanks a lot in advance.
Nitin
