On 12/26/2013 3:38 AM, Jilal Oussama wrote:
> Solr was hosted on an Amazon ec2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory
> & 840 GB storage) and contained several cores for different uses.
> 
> When I manually executed a query through Solr Admin (a query containing
> 10~15 terms, some of them with boosts on one field, limited to one
> result, without any sorting or faceting), it took around 700 ms. The
> core contained 7 million documents.
> 
> When the scripts are running, things get slower: my query takes 7~10 s.
> 
> I then turned to SolrCloud, expecting a huge performance increase.
> 
> I installed it on a cluster of 5 Amazon ec2 c3.2xlarge instances (8 vCPU
> with 28 ECU, 15 GB memory & 160 GB SSD storage), then created one
> collection to contain the core I was querying. I split it into 25 shards
> (each node holding 5 shards, without replication); each shard took 54 MB
> of storage.
> 
> I tested my query on the new SolrCloud and it takes 70 ms! A huge
> improvement, which is very good!
> 
> I tested my scripts again (I have 30 scripts running at the same time),
> and to my surprise, things run fast for 5 seconds and then query time
> gets really slow again.
> 
> I updated solrconfig.xml to remove the query caches (I don't need them
> since the queries are all very different and each is run only once) and
> changed the index memory to 1 GB, but only got a small improvement
> (3~4 s for each query?!)

Your SolrCloud setup has 35 times as much CPU power (just basing this on
the ECU numbers) as your single-server setup, ten times as much memory,
and a lot more IOPS because you moved to SSD.  A 10X increase in single
query performance is not surprising.
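
For reference, the arithmetic behind those ratios can be reproduced with
a few lines of Python. This is only a sketch using the instance figures
quoted in your message; the dictionary keys and the helper name are my
own, not anything from Solr or AWS.

```python
# Figures quoted in the thread for the two setups.
old = {"nodes": 1, "ecu": 4, "mem_gb": 7.5}    # one m1.large
new = {"nodes": 5, "ecu": 28, "mem_gb": 15.0}  # five c3.2xlarge


def total(setup, key):
    """Aggregate a per-node figure across all nodes in a setup."""
    return setup["nodes"] * setup[key]


cpu_ratio = total(new, "ecu") / total(old, "ecu")
mem_ratio = total(new, "mem_gb") / total(old, "mem_gb")
latency_ratio = 700 / 70  # single-query time in ms, before and after

print(f"CPU (ECU) ratio:      {cpu_ratio:.0f}x")
print(f"Memory ratio:         {mem_ratio:.0f}x")
print(f"Single-query speedup: {latency_ratio:.0f}x")
```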

You have not indicated how much memory is assigned to the java heap on
each server.  I think that there are three possible problems happening
here, with a strong possibility that the third one is happening at the
same time as one of the other two:

1) Full garbage collections are too frequent because the heap is too small.
2) Garbage collections take too long because the heap is very large and
GC is not tuned.
3) Extremely high disk I/O because the OS disk cache is too small for
the index size.
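
One rough way to check the third possibility once the heap size is known
is to compare the memory left over for the OS disk cache with the index
size on each node. The sketch below is only that: the heap values are
hypothetical (the real one has not been stated), the function name is
made up, and the per-node index size assumes the reported 54 MB per
shard is the full on-disk size.

```python
def cache_headroom_mb(ram_mb, heap_mb, index_mb, other_mb=0):
    """Memory left for the OS disk cache, minus the on-disk index size.

    A negative result means the index cannot be fully cached.
    """
    return ram_mb - heap_mb - other_mb - index_mb


RAM_MB = 15 * 1024   # per-node memory, from the thread
INDEX_MB = 5 * 54    # five shards per node at 54 MB each, from the thread

# Hypothetical heap sizes; substitute the real -Xmx value.
for heap_mb in (1024, 4096, 8192):
    headroom = cache_headroom_mb(RAM_MB, heap_mb, INDEX_MB)
    verdict = "index fits in cache" if headroom >= 0 else "index exceeds cache"
    print(f"heap {heap_mb:>5} MB -> headroom {headroom:>6} MB ({verdict})")
```

Anything else running on the node can be passed as other_mb.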

Some information on these that might be helpful:

http://wiki.apache.org/solr/SolrPerformanceProblems

The general solution for good Solr performance is to throw hardware,
especially memory, at the problem.  It's worth pointing out that any
level of hardware investment has an upper limit on the total query
volume it can support.  Running 30 test scripts at the same time will be
difficult for all but the most powerful and expensive hardware to deal
with, especially if every query is different.  A five-server cloud where
each server has 8 CPU cores and 15GB of memory is pretty small, all
things considered.
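
To put a rough number on that load, here is a back-of-the-envelope
sketch. It assumes each script keeps one query in flight at a time and
that every query is distributed to all 25 shards; both are assumptions
on my part, and the variable names are purely illustrative.

```python
NODES = 5
VCPUS_PER_NODE = 8
SHARDS = 25              # shards in the collection, from the thread
CONCURRENT_SCRIPTS = 30  # assumed: one in-flight query per script

total_vcpus = NODES * VCPUS_PER_NODE
# Assumed: each distributed query fans out to one request per shard.
shard_requests = CONCURRENT_SCRIPTS * SHARDS
requests_per_vcpu = shard_requests / total_vcpus

print(f"vCPUs in the cluster:        {total_vcpus}")
print(f"Shard requests in flight:    {shard_requests}")
print(f"Shard requests per vCPU:     {requests_per_vcpu:.2f}")
```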

Thanks,
Shawn
