Hi Luca,
It looks like your queries are complex wildcard queries. My theory is that
you are CPU-bound: for a single query, one CPU core per shard will be at
100% for the duration of that sub-query. Smaller shards make these
sub-queries faster, which is why 16 shards perform better than 8 in your case.
* In your 16x1 configuration, you have exactly one shard per CPU core, so
for a single query, the 16 subqueries are distributed evenly across both
nodes and each one uses a CPU core.
* In your 8x2 configuration, you still get one CPU core per shard, but the
shards are bigger, so each subquery takes longer (for a single query thread
in the 8x2 scenario I would expect overall CPU utilization to be lower).
* In your 16x2 case, the 16 subqueries will be distributed unevenly: one
node will get more than 8 subqueries, which means some subqueries will have
to wait their turn for a CPU core. In addition, more Solr cores will be
competing for resources.
If this theory is correct, adding more replicas won't speed up your queries;
you need to either get faster CPUs or simplify your queries/configuration in
some way. Adding more replicas should improve your query throughput, but
only if you add them on new hardware, not the same machines.
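The scheduling theory above can be sketched with a toy model (illustrative only; the per-shard scan rate, the 8 cores per node, and the 10/6 routing split are assumptions, not measurements from your cluster):

```python
import math

def query_latency(index_gb, shards, subqueries_per_node, cores_per_node=8,
                  gb_per_sec=1.0):
    """Toy model: each subquery scans one shard at a fixed rate, and a node
    can run at most cores_per_node subqueries at once, so any extra
    subqueries wait in a second 'wave' for a free CPU core."""
    per_subquery = (index_gb / shards) / gb_per_sec  # smaller shards -> faster
    waves = math.ceil(max(subqueries_per_node) / cores_per_node)
    return waves * per_subquery

INDEX_GB = 90  # index size mentioned later in the thread

# 16x1: 8 subqueries per node, one wave of small shards
print(query_latency(INDEX_GB, 16, [8, 8]))    # 5.625
# 8x2: only 4 subqueries per node, but each shard is twice as big
print(query_latency(INDEX_GB, 8, [4, 4]))     # 11.25
# 16x2: uneven routing, e.g. 10/6, forces a second wave on one node
print(query_latency(INDEX_GB, 16, [10, 6]))   # 11.25
```

Under these assumptions, 16x1 wins because the subqueries are both small and evenly spread; adding replicas on the same two machines only changes how the same CPU cores get shared.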

...anyway, just a theory

Tomás

On Fri, Jan 8, 2016 at 7:40 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/8/2016 7:55 AM, Luca Quarello wrote:
> > I used Solr 5.3.1 and I honestly expected response times with the
> > replica configuration to be close to those without replicas.
> >
> > Do you agree with me?
> >
> > I read here
> >
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
> > that "Queries do not need to be routed to leaders; they can be handled by
> > any replica in a shard. Leaders are only needed for handling update
> > requests. "
> >
> > I haven't found this behaviour. In my case, CONF2 and CONF3 have all
> > replicas on VM2, but analyzing core utilization during a request shows
> > 100% on both machines. Why?
>
> Indexing is a little bit slower with replication -- the update must
> happen on all replicas.
>
> If your index is sharded (which I believe you did indicate in your
> initial message), you may find that all replicas get used even for
> queries.  It is entirely possible that some of the shard subqueries will
> be processed on one replica and some of them will be processed on other
> replicas.  I do not know if this commonly happens, but I would not be
> surprised if it does.  If the machines are sized appropriately for the
> index, this separation should speed up queries, because you have the
> resources of multiple machines handling one query.
>
> That phrase "sized appropriately" is very important.  Your initial
> message indicated that you have a 90GB index, and that you are running
> in virtual machines.  Typically VMs have fairly small memory sizes.  It
> is very possible that you simply don't have enough memory in the VM for
> good performance with an index that large.  With 90GB of index data on
> one machine, I would hope for at least 64GB of RAM, and I would prefer
> to have 128GB.  If there is more than 90GB of data on one machine, then
> even more memory would be needed.
>
> Thanks,
> Shawn
>
>
