On Tue, 2010-02-16 at 10:35 +0100, Tim Terlegård wrote:
> I actually tried SSD yesterday. Queries which need to go to disk are
> much faster now. I did expect that warmup for sort fields would be
> much quicker as well, but that seems to be cpu bound.

That, and bulk I/O. The sorter loads the terms into RAM by iterating
over them, which means that the I/O access for this step is sequential.
Most modern SSDs are faster than conventional hard disks at this, but
not by much.
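To make the "iterate the terms into RAM" step concrete, here is a simplified Python model of a string sort cache, loosely in the spirit of Lucene's FieldCache string index (a doc-to-ord array plus an ord-to-term lookup). It is a sketch under stated assumptions, not Lucene's code: the `postings` dict is a hypothetical stand-in for the on-disk term dictionary, and `build_sort_cache`/`sort_docs` are illustrative names. It shows why the work is both sequential I/O (one pass over the terms) and CPU-heavy (one assignment per document per field).

```python
from typing import Dict, List, Optional, Tuple


def build_sort_cache(
    postings: Dict[str, List[int]], max_doc: int
) -> Tuple[List[int], List[Optional[str]]]:
    """Build a doc -> ord array and an ord -> term lookup for one sort field.

    `postings` is a hypothetical stand-in for the field's term dictionary:
    it maps each term to the ids of the documents containing it.
    """
    order = [0] * max_doc                  # ord 0 means "no value for this field"
    lookup: List[Optional[str]] = [None]   # lookup[ord] -> term; slot 0 is unused
    # A real index stores its terms already sorted, so this loop corresponds to
    # one sequential scan; sorted() only emulates that ordering here.
    for term in sorted(postings):
        ord_ = len(lookup)
        lookup.append(term)
        for doc in postings[term]:         # per-document work: the CPU-bound part
            order[doc] = ord_
    return order, lookup


def sort_docs(order: List[int]) -> List[int]:
    """Return doc ids ordered by field value; docs without a value come first."""
    # Python's sort is stable, so docs with equal ords keep their id order.
    return sorted(range(len(order)), key=order.__getitem__)
```

For example, `build_sort_cache({"b": [0, 2], "a": [1]}, 4)` gives `order == [2, 1, 2, 0]` and `lookup == [None, "a", "b"]`, and `sort_docs(order)` then returns `[3, 1, 0, 2]`. With six sort fields and 40 million documents, the inner loop runs hundreds of millions of times, which fits the observation that warmup stays CPU bound even on an SSD.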

> It still takes a minute to cache the six sort fields of the 40 million 
> document index.

I am not aware of any solutions to this, besides beefing up bulk read
speed and processor speed (as far as I remember, the sorter is not
threaded). It is technically possible to move this step to the indexer,
but the only win would be for setups with few index builders and many
searchers, since the work would then be done once per build instead of
once per searcher.
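Moving the step to the indexer amounts to computing the doc-to-ord array once at build time and shipping it as a file that each searcher loads with a single bulk read, instead of re-walking the terms. A minimal sketch, assuming a hypothetical side file of native C ints stored next to the index; `save_ords` and `load_ords` are illustrative names, not a Lucene or Solr API. It ignores the ord-to-term lookup, which would need its own file, and the file would have to be regenerated on every index update.

```python
import os
import tempfile
from array import array
from typing import List


def save_ords(order: List[int], path: str) -> None:
    """Indexer side: write a precomputed doc -> ord array as native C ints."""
    with open(path, "wb") as f:
        array("i", order).tofile(f)


def load_ords(path: str) -> array:
    """Searcher side: read the whole array back in one bulk read."""
    ords = array("i")
    count = os.path.getsize(path) // ords.itemsize
    with open(path, "rb") as f:
        ords.fromfile(f, count)
    return ords


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".ords")  # hypothetical side file
    os.close(fd)
    try:
        save_ords([2, 1, 2, 0], path)
        print(list(load_ords(path)))  # [2, 1, 2, 0]
    finally:
        os.remove(path)
```

The trade-off matches the one above: the searcher's warmup becomes a pure bulk read, but every builder pays the cost, so it only pays off when searchers far outnumber builders.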

> Are there any differences among SSD disks. Why is Intel X25-M your favourite?

A soft reason is that I have faith in support from Intel: there have been
problems with earlier versions of the drive (content being wiped in some
edge cases, and performance degradation, which affects all SSDs), and
Intel responded well by acknowledging the problems and resolving them.
That's very subjective though, and I'm sure that some would turn it
around and say that Intel delivered crap in the first place.

On the harder side, the Intel drive is surprisingly cheap and provides
random I/O performance ahead of most competitors, especially for random
writes, which are normally the weak point of SSDs. Some graphs can be
found at Anandtech:
http://anandtech.com/storage/showdoc.aspx?i=3631&p=22
Anandtech is, by the way, a very good starting point on SSDs, as they go
into details that too many reviewers skip over.

To be honest, though, standard index building and searching with Lucene
requires three things from the I/O system: bulk writes, bulk reads
(mainly for sorting) and random reads. The Intel drive is not stellar
for bulk writes, and being superior for random writes makes no
difference for Lucene/SOLR. If we're only talking search, pick whatever
SSD you can get your hands on: they are all fine for random reads, and
the CPU will probably be the bottleneck.
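To see the difference between two of these access patterns on a given drive, a rough micro-benchmark can read the same 4 KiB blocks of a file once in order (like the sorter's bulk read) and once shuffled (like random lookups during search). This is a sketch under stated assumptions: the block size, file size and function names are illustrative, bulk writes are not measured, and Python call overhead blurs small differences.

```python
import os
import random
import tempfile
import time
from typing import List, Tuple

BLOCK = 4096  # bytes per read; an assumed, typical page size


def make_test_file(size_bytes: int) -> str:
    """Write size_bytes of random data to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            chunk = min(remaining, 1 << 20)
            f.write(os.urandom(chunk))
            remaining -= chunk
    return path


def time_reads(path: str, offsets: List[int]) -> float:
    """Read one BLOCK at each offset, in the given order; return elapsed seconds."""
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
        return time.perf_counter() - start


def compare(path: str, blocks: int) -> Tuple[float, float]:
    """Time the same set of blocks read sequentially and in shuffled order."""
    sequential = [i * BLOCK for i in range(blocks)]
    shuffled = sequential[:]
    random.shuffle(shuffled)
    return time_reads(path, sequential), time_reads(path, shuffled)


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # 64 MiB; see the caching caveat below
    path = make_test_file(size)
    try:
        seq_s, rnd_s = compare(path, size // BLOCK)
        print(f"sequential: {seq_s:.2f}s  random: {rnd_s:.2f}s")
    finally:
        os.remove(path)
```

A freshly written file usually sits in the OS page cache, so for honest numbers either use a file larger than RAM or drop the caches between writing and reading (on Linux, as root: `echo 3 > /proc/sys/vm/drop_caches`). On a hard disk the shuffled pass should be far slower than the sequential one; on an SSD the gap shrinks considerably.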

However, random write speed is a bonus that might show up indirectly:
untarring a million small files, updating a database and similar tasks
are often part of the workflow around search.


Back in 2007 we were fortunate enough to get a test machine with two
types of SSD, two 10,000 RPM hard disks and two 15,000 RPM hard disks.
Some quick notes can be found at http://wiki.statsbiblioteket.dk/summa/Hardware

The world has moved on since then, but that has only widened the gap
between SSDs and hard disks.

Regards,
Toke Eskildsen
