Hi,

Toke - I tried pausing the indexing entirely but got only a slight improvement,
so the impact of indexing is not that significant.

Shawn - To answer your question: I am sending one document per update request.
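For clarity, this is roughly what my updates look like today, alongside a batched alternative I could try; the host, collection name, and field names below are placeholders, not our real schema:

```shell
# Current approach: one document per update request
# (host, collection, and field names are illustrative).
curl -s 'http://localhost:8983/solr/mycollection/update?commitWithin=60000' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "doc1", "title_t": "first document"}]'

# Batched alternative: several documents in a single request,
# which usually costs little more per request than a single document.
curl -s 'http://localhost:8983/solr/mycollection/update?commitWithin=60000' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "doc1", "title_t": "first document"},
       {"id": "doc2", "title_t": "second document"},
       {"id": "doc3", "title_t": "third document"}]'
```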

My test SolrCloud is configured with 2 shards on one machine, and each shard
has one replica on another machine. To check whether network latency is the
bottleneck, I disabled the replicas and re-ran the test, but didn't get any
improvement.
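In case it helps anyone reproduce this: a single shard's core can also be queried directly with distrib=false, which takes the distributed scatter-gather step out of the picture entirely. A sketch, with placeholder host and core names:

```shell
# Query one shard's core directly, bypassing the distributed
# merge phase (host and core names are illustrative).
curl -s 'http://host1:8983/solr/collection1_shard1_replica1/select?q=*:*&distrib=false&rows=10'

# Compare QTime against the normal distributed query on the collection:
curl -s 'http://host1:8983/solr/collection1/select?q=*:*&rows=10'
```

Comparing QTime between the two shows how much of the latency comes from per-shard work versus the distributed request overhead.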

Another thing I tried, in order to balance the load and provide more CPU and
memory resources: I configured only 2 shards, each on a separate machine with
no replicas, and re-ran the test, but in that case performance got worse.

Regarding production, we want to have 2 shards in order to make the platform
scalable and future-proof. For context, we have 22 collections in production;
4 of them are major in terms of volume and complexity and are frequently used
for querying and indexing, while the rest are comparatively minor with fewer
query and index hits. Below are the production index statistics.

All collections: 22 collections with 139 million documents and an index size of 85 GB.
Major collections: 4 collections with 134 million documents and an index size of 77 GB.
Minor collections: 18 collections with 5 million documents and an index size of 8 GB.

So, any ideas on how to improve query performance with these statistics, under
an index-heavy (100 index updates per sec) and query-heavy (100 queries per
sec) workload?

Thanks & Regards,
Bhaumik Joshi

________________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Tuesday, April 12, 2016 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On 4/11/2016 6:31 AM, Bhaumik Joshi wrote:
> We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> sec) and Query-heavy (100 queries per sec) scenario.
>
> *Index stats: *10 million documents and 16 GB index size
>
>
>
> Which sharding strategy is best suited in above scenario?
>
> Please share reference resources which states detailed comparison of
> single shard over multi shard if any.
>
>
>
> Meanwhile we did some tests with SolrMeter (Standalone java tool for
> stress tests with Solr) for single shard and two shards.
>
> *Index stats of test solr cloud: *0.7 million documents and 1 GB index
> size.
>
> As observed in test average query time with 2 shards is much higher
> than single shard.
>

On the same hardware, multiple shards will usually be slower than one
shard, especially under a high load.  Sharding can give good results
with *more* hardware, providing more CPU and memory resources.  When the
query load is high, there should be only one core (shard replica)
per server, and Solr works best when it is running on bare metal, not
virtualized.

Handling 100 queries per second will require multiple copies of your
index on separate hardware.  This is a fairly high query load.  There
are installations handling much higher loads, of course.  Those
installations have a LOT of replicas and some way to balance load across
them.

For 10 million documents and 16GB of index, I'm not sure that I would
shard at all, just make sure that each machine has plenty of memory --
probably somewhere in the neighborhood of 24GB to 32GB.  That assumes
that Solr is the only thing running on that server, and that if it's
virtualized, making sure that the physical server's memory is not
oversubscribed.

Regarding your specific numbers:

The low queries per second may be caused by one or more of these
problems, or perhaps something I haven't thought of:  1) your queries
are particularly heavy.  2) updates are interfering by tying up scarce
resources.  3) you don't have enough memory in the machine.

How many documents are in each update request that you are sending?  In
another thread on the list, you have stated that you have a 1 second
maxTime on autoSoftCommit.  This is *way* too low, and a *major* source
of performance issues.  Very few people actually need that level of
latency -- a maxTime measured in minutes may be fast enough, and is much
friendlier for performance.
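For example, a gentler commit configuration in solrconfig.xml might look like the sketch below; the two-minute soft commit and five-minute hard commit values are illustrative, to be tuned to the visibility latency the application actually needs:

```xml
<!-- Soft commit every 2 minutes instead of every second;
     controls how quickly new documents become searchable. -->
<autoSoftCommit>
  <maxTime>120000</maxTime>
</autoSoftCommit>

<!-- Hard commit every 5 minutes for durability;
     openSearcher=false keeps hard commits cheap. -->
<autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```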

Thanks,
Shawn
