On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote:
> On 9/2/2010 2:54 AM, Toke Eskildsen wrote:
> > We've done a fair amount of experimentation in this area (1997-era SSDs
> > vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
> > RAID 0). The harddisk setups never stood a chance for searching. With
> > current SSD's being faster than harddisks for writes too, they'll also
> > be better for index building, although not as impressive as for
> > searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware
>
> How does it compare to six SATA drives in a Dell hardware RAID10?
I'll have to extrapolate a lot here (also known as guessing). You don't mention what kind of harddrives you're using, so let's say 15.000 RPM to err on the high-end side.

Compared to the 2 drives @ 15.000 RPM in RAID 1 we've experimented with, the difference is that the striping allows for concurrency when different reads land on different physical drives (sorry if this is basic, I'm just trying to establish a common understanding here). Six drives in RAID 10 means 3 mirrored pairs with the data striped across them. Assuming reads are spread uniformly and independently over the 3 stripe members, the chance for 2 concurrent reads to be on different members is 2/3, the chance for 3 concurrent reads to be on 3 different members is 2/9 and the chance for 3 concurrent reads to be spread over at least 2 members is 8/9. For the sake of argument, let's say that the 3-way striping gives us double the concurrent I/O.

Taking my old measurements at face value and doubling the numbers for the 15.000 RPM measurements, this would bring six 15.000 RPM SATA drives in RAID 10 up to a throughput that is 1/3 to 2/3 of the SSD, depending on how we measure.

Some general observations:

With long runtimes, the throughput for harddisks rises relative to the SSD as the disk cache gets warmed.

If there are frequent index updates with deletions, the SSD gains more ground, as it is not nearly as dependent on the disk cache as harddisks are.

With small indexes, the difference between harddisks and SSD is relatively small, as the disk cache quickly gets filled. Consequently, the difference increases for large indexes.

One point to note for RAID is that it does not improve the speed of single searches on a single index: It does not lower the seek time for a single small I/O request, and searching on a single index is done with a number of small successive requests. If the performance problem is long search time, RAID does not help (but in combination with sharding or similar it will). If the problem is the number of concurrent searches, RAID helps.
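For anyone who wants to check the odds for concurrent reads hitting different stripe members, here is a small brute-force sketch. It assumes each read lands uniformly and independently on one of 3 stripe members (the 3 mirrored pairs of a six-drive RAID 10), and it ignores that either drive in a mirror pair can serve a read as well as stripe size and locality. The function name placement_odds is just made up for the example.

```python
from fractions import Fraction
from itertools import product


def placement_odds(reads, members=3):
    """Enumerate every equally likely way `reads` independent requests can
    land on `members` stripe members (assumed uniform and independent).

    Returns (P(all reads on distinct members), P(at least 2 distinct members)).
    """
    total = 0
    all_distinct = 0
    at_least_two = 0
    for placement in product(range(members), repeat=reads):
        total += 1
        used = len(set(placement))  # number of distinct members hit
        if used == reads:
            all_distinct += 1
        if used >= 2:
            at_least_two += 1
    return Fraction(all_distinct, total), Fraction(at_least_two, total)


if __name__ == "__main__":
    for n in (2, 3):
        print(n, *placement_odds(n))
    # prints:
    # 2 2/3 2/3
    # 3 2/9 8/9
```

Passing members=6 instead models reads being free to hit any of the six spindles, which is more optimistic than the 3-member model.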