On Thu, 2013-12-12 at 11:10 +0100, Hoggarth, Gil wrote:
> Thanks for this - I haven't any previous experience with utilising SSDs
> in the way you suggest, so I guess I need to start learning!

There's a bit of a divide in the Lucene/Solr world on this. Everybody
agrees that SSDs in themselves are great for Lucene/Solr searches
compared to spinning drives. How much better is another matter, and the
issue gets confusing when RAM caching is factored in.

Some are also very concerned about the reliability of SSDs and about
write performance degradation without TRIM (you need a quite specific
setup to have TRIM enabled on a server with SSDs in RAID). Guessing
that your 6TB index is not heavily updated, though, TRIM should not be
one of your worries.
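For what it's worth, on Linux you can check whether a block device
advertises TRIM by reading its discard granularity from sysfs. A minimal
sketch, assuming a standard Linux sysfs layout (the device name is just
an example):

```python
from pathlib import Path

def supports_trim(device: str, sys_root: str = "/sys/block") -> bool:
    """A device advertises TRIM support if its discard granularity
    in sysfs is non-zero (Linux-specific layout)."""
    gran = Path(sys_root, device, "queue", "discard_granularity")
    try:
        return int(gran.read_text().strip()) > 0
    except (FileNotFoundError, ValueError):
        return False

# Example: supports_trim("sda")
```

Note that on RAID this only tells you about the individual devices;
whether the RAID layer passes discards through to them is a separate
question.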

At Statsbiblioteket, we have been using SSDs for our search servers
since 2008. That was back when random write performance was horrible and
a large drive was 64GB. As you have probably guessed, we are very much
in the SSD camp.

We have done some testing, and for simple searches (i.e. a lot of I/O
and comparatively little CPU usage) we have observed that SSDs plus RAM
for caching equal to 10% of the index size deliver something like 80%
of pure-RAM speed.
https://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/
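A rough way to make that kind of measurement yourself is to time a
batch of queries against Solr's select handler and look at the median
and 90th percentile rather than the mean. A minimal sketch, where the
Solr URL, collection name and query terms are all assumptions about
your setup:

```python
import statistics
import time
import urllib.parse
import urllib.request

# Hypothetical Solr endpoint; adjust host and collection to your setup.
SOLR_URL = "http://localhost:8983/solr/collection1/select"

def time_query(q: str) -> float:
    """Issue one query and return the wall-clock latency in seconds."""
    params = urllib.parse.urlencode({"q": q, "rows": 10, "wt": "json"})
    start = time.monotonic()
    with urllib.request.urlopen(f"{SOLR_URL}?{params}") as resp:
        resp.read()
    return time.monotonic() - start

def summarize(latencies: list) -> dict:
    """Median and approximate 90th percentile of the observed latencies."""
    return {
        "median": statistics.median(latencies),
        "p90": statistics.quantiles(latencies, n=10)[-1],
    }

if __name__ == "__main__":
    queries = ["church", "tower", "bicycle"]  # hypothetical test terms
    lat = [time_query(q) for q in queries for _ in range(5)]
    print(summarize(lat))
```

The percentiles matter because a handful of slow, disk-bound queries
can hide behind a pleasant-looking average.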

Your mileage will surely vary.

> [...] leaving 126GB on each server for the OS and MMap. [...]

So about the same as your existing 3TB setup? It seems you will get the
same performance then. I must say that 1-minute response times would be
very hard to sell at our library, even for a special search used only
by a small, dedicated audience. Even your goal of 20 seconds seems at
odds with exploratory search.

May I be so frank as to suggest a course of action? Buy one ½TB Samsung
840 EVO SSD, fill it with indexes and test it in a machine with 32GB of
RAM, thus matching the 1/20 index-size RAM that your servers will have.
Such a drive costs £250 on Amazon, and the experiment would spare you a
lot of speculation and time.
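One pitfall with such an experiment: after the first run the index
files sit in the page cache, so later runs measure RAM rather than the
SSD. On Linux you can evict a file's pages between runs. A minimal
sketch, with a hypothetical index directory (Linux/Unix only):

```python
import os

def drop_file_cache(path: str) -> None:
    """Ask the kernel to evict this file's pages from the page cache,
    so the next read actually hits the drive (Linux/Unix only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

def drop_index_cache(index_dir: str) -> None:
    """Evict every file under a (hypothetical) Lucene index directory."""
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            drop_file_cache(os.path.join(root, name))

# Example: drop_index_cache("/mnt/ssd/solr/collection1/data/index")
```

Unlike `echo 3 > /proc/sys/vm/drop_caches`, this does not need root and
only evicts the index files, leaving the rest of the system warm.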

Next, conclude that SSDs are the obvious choice and secure the 840 for
your workstation with reference to "further testing".

> I can also see that our hardware requirements will also depend on usage
> as well as the volume of data, and I've been pondering how best we can
> structure our index/es to facilitate a long term service (which means
> that, given it's a lot of data, I need to structure the data so that
> new usage doesn't require re-indexing.)

We definitely have this problem too. We have resigned ourselves to
re-indexing the data after some months of real-world usage.

Regards,
Toke Eskildsen, State and University Library, Denmark
