On 4/24/2018 6:30 AM, Chris Ulicny wrote:
> I haven't worked with AWS, but recently we tried to move some of our Solr instances to Google's Cloud offering, and it did not go well. All of our problems ended up stemming from the fact that the I/O is throttled. Any sufficiently complex query required too many disk reads to return results in a reasonable time while being throttled. SSDs were better, but the cost wasn't practical and they still weren't as performant as our own bare metal.
If there's enough memory installed beyond what the Solr heap requires, the operating system's disk cache will hold most of the index data, and Solr will rarely need to actually read the disk to satisfy a query. That is the secret to stellar performance. If switching to faster disks made a big difference in query performance, adding memory would yield an even greater improvement.
https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
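As a rough sketch of what that means in practice (the numbers below are hypothetical placeholders, not a recommendation; the right heap size depends entirely on your index and query load): on a 64 GB machine holding roughly 40 GB of index data, you might cap the heap in solr.in.sh and leave everything else to the OS disk cache:

    # solr.in.sh -- hypothetical sizing for a 64 GB machine with ~40 GB of index.
    # Give Solr only the heap it actually needs; all memory the heap does not
    # claim stays available to the OS page cache for the index files.
    SOLR_HEAP="8g"

The point isn't the specific numbers; it's that a modest heap plus a large pool of free RAM for caching usually beats a huge heap on the same hardware.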
> When we were doing the initial indexing, the indexing processes would reach a point where updates were taking minutes to complete, and the cause was throttled write ops.
Indexing speed is indeed affected by disk speed, and adding memory can't fix that particular problem. Using a storage controller with a large battery-backed cache can improve write performance.
> -- set the max threads and max concurrent merges of the mergeScheduler to 1 (or very low). This prevented excessive I/O during indexing.
The max threads should stay at 1 in the merge scheduler, but the max merges should actually be *increased*; I use a value of 6 for that. With SSDs the max threads can be raised, but I wouldn't push it very high.
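For reference, this is roughly what those settings look like in the indexConfig section of solrconfig.xml. It's a sketch based on the values described above, not a copy of anyone's actual config; tune the numbers for your own hardware.

    <indexConfig>
      <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
        <!-- Only one merge runs at a time, so merging can't saturate the disk. -->
        <int name="maxThreadCount">1</int>
        <!-- Let several merges queue up instead of stalling indexing; raise
             maxThreadCount a little if the index lives on SSDs. -->
        <int name="maxMergeCount">6</int>
      </mergeScheduler>
    </indexConfig>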
Thanks,
Shawn