On Thu, 2013-12-12 at 11:10 +0100, Hoggarth, Gil wrote:
> Thanks for this - I haven't any previous experience with utilising SSDs
> in the way you suggest, so I guess I need to start learning!

There's a bit of a divide in the Lucene/Solr world on this. Everybody agrees that SSDs in themselves are great for Lucene/Solr searches compared to a spinning-drive solution. How much better is another matter, and the issue gets confusing when RAM caching is factored in. Some are also very concerned about the reliability of SSDs and the write performance degradation without TRIM (you need quite a specific setup to have TRIM enabled on a server with SSDs in RAID). Guessing that your 6TB index is not heavily updated, the TRIM part should not be one of your worries, though.

At Statsbiblioteket, we have been using SSDs for our search servers since 2008. That was back when random write performance was horrible and a large drive was 64GB. As you have probably guessed, we are very much in the SSD camp.

We have done some testing, and for simple searches (i.e. a lot of IO and comparatively little CPU usage) we have observed that SSDs plus caching RAM equal to 10% of the index size deliver something like 80% of pure RAM speed.

https://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

Your mileage will surely vary.

> [...] leaving 126GB on each server for the OS and MMap. [...]

So about the same as your existing 3TB setup? Seems like you will get the same performance then. I must say that 1 minute response times would be very hard to sell at our library, even for a special search only used by a small and dedicated audience. Even your goal of 20 seconds seems adverse to exploratory search.

May I be so frank as to suggest a course of action? Buy one ½ TB Samsung 840 EVO SSD, fill it with indexes and test it in a machine with 32GB of RAM, thus matching the 1/20 index size RAM that your servers will have. Such a drive costs £250 on Amazon and the experiment would spare you a lot of speculation and time.

Next, conclude that SSDs are the obvious choice and secure the 840 for your workstation with reference to "further testing".

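For the test rig suggested above, the RAM-to-index arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this mail, 1 TB is taken as 1000 GB, the drive is assumed to be completely filled with index data, and the split between cache and OS/JVM memory is a guess rather than a measurement.

```python
def cache_ratio(ram_gb: float, index_gb: float) -> float:
    """Return RAM as a fraction of the index size (hypothetical helper)."""
    if index_gb <= 0:
        raise ValueError("index_gb must be positive")
    return ram_gb / index_gb


def ram_for_ratio(index_gb: float, ratio: float) -> float:
    """Return the GB of RAM needed to hold `ratio` of the index in cache."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return index_gb * ratio


# Test rig: one 1/2 TB SSD (assumed full, 500 GB) in a 32 GB machine.
rig_index_gb = 500
rig_ram_gb = 32

# Counting all RAM as cache gives a ratio a bit above 1/20.
print(f"raw ratio: {cache_ratio(rig_ram_gb, rig_index_gb):.3f}")

# RAM that a 1/20 ratio would require; the remainder of the 32 GB is
# what would be left for the OS and the Solr JVM heap (an assumption).
needed = ram_for_ratio(rig_index_gb, 1 / 20)
print(f"cache for 1/20: {needed:.0f} GB, left over: {rig_ram_gb - needed:.0f} GB")
```

The same two functions can be pointed at the production numbers once the per-server index size is known.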
> I can also see that our hardware requirements will also depend on usage
> as well as the volume of data, and I've been pondering how best we can
> structure our index/es to facilitate a long term service (which means
> that, given it's a lot of data, I need to structure the data so that
> new usage doesn't require re-indexing.)

We definitely have this problem too. We have resigned ourselves to re-indexing the data after some months of real-world usage.

Regards,
Toke Eskildsen, State and University Library, Denmark