Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> We've installed a cluster of one collection of 350M documents on 3
> r3.2xlarge (60GB RAM) Amazon servers. The size of index on each shard is
> about 1.1TB and maximum storage on Amazon is 1 TB so we add 2 SSD EBS
> General purpose (1x1TB + 1x500GB) on each instance. Then we create logical
> volume using LVM of 1.5TB to fit our index.
Your search speed will be limited by the slowest storage in your group, which would be your 500GB EBS. The General Purpose SSD option means (as far as I can read at http://aws.amazon.com/ebs/details/#piops) that your baseline is 3 IOPS/GB = 1500 IOPS, with bursts of up to 3000 IOPS. Unfortunately they do not say anything about latency.

For comparison, I checked the system logs from a local test with our 21TB / 7 billion documents index. It used ~27,000 IOPS during the test, with mean search time a bit below 1 second. That was with ~100GB RAM for disk cache, which is about ½% of index size. The test was with simple term queries (1-3 terms) and some faceting.

Back of the envelope: 27,000 IOPS for 21TB is ~1300 IOPS/TB. Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400 IOPS.

All else being equal (which is never the case), 1-3 second response times for a 1.1TB index do not seem unrealistic when one link in the storage chain is capped at a few thousand IOPS, you are using networked storage and you have little RAM for caching.

If possible, you could try temporarily boosting the performance of the EBS, to see if raw IO is the bottleneck.

> The response time is about 1 and 3 seconds for simple queries (1 token).

Is the index updated while you are searching?
Do you do any faceting or other heavy processing as part of a search?
How many hits does a search typically have and how many documents are returned?
How many concurrent searches do you need to support?
How fast should the response time be?

- Toke Eskildsen
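
The back-of-the-envelope estimate in the reply above can be written out as a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, the 3 IOPS per GB baseline is assumed because it matches the 1500 IOPS quoted for the 500GB volume, and scaling IOPS linearly with index size is the same rough "all else being equal" assumption the reply makes.

```python
# Rough IOPS estimate using only the figures quoted in the message above.
# Function names are hypothetical; linear scaling with index size is an assumption.

def iops_per_tb(measured_iops: float, index_tb: float) -> float:
    """IOPS used per TB of index in a reference test."""
    return measured_iops / index_tb


def baseline_iops(volume_gb: float, iops_per_gb: float = 3.0) -> float:
    """Baseline IOPS of a volume, assuming a fixed IOPS-per-GB rate."""
    return volume_gb * iops_per_gb


# Reference test: ~27,000 IOPS against a 21TB index.
reference_rate = iops_per_tb(27_000, 21)

# Scale linearly to a 1.1TB shard.
estimated_need = reference_rate * 1.1

# The slowest volume in the LVM group is the 500GB one.
slowest_volume = baseline_iops(500)

print(f"reference rate:  {reference_rate:.0f} IOPS/TB")
print(f"estimated need:  {estimated_need:.0f} IOPS for 1.1TB")
print(f"500GB baseline:  {slowest_volume:.0f} IOPS")
```

The estimated need and the baseline of the smallest volume land in the same range, which is why the reply suggests testing whether raw IO is the bottleneck.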