On Tue, 2010-02-16 at 10:35 +0100, Tim Terlegård wrote:
> I actually tried SSD yesterday. Queries which need to go to disk are
> much faster now. I did expect that warmup for sort fields would be
> much quicker as well, but that seems to be cpu bound.

That and bulk I/O. The sorter imports the Terms into RAM by iterating, which means that the IO-access for this is sequential. Most modern SSDs are faster than conventional harddisks for this, but not by much.

> It still takes a minute to cache the six sort fields of the 40 million
> document index.

I am not aware of any solutions to this, besides beefing up the hardware: faster bulk reads and processor speed (the sorter is not threaded as far as I remember). It is technically possible to move this step to the indexer, but the only win would be for setups with few builders and many searchers.

> Are there any differences among SSD disks. Why is Intel X25-M your favourite?

A soft reason is that I have faith in support from Intel: There have been problems with earlier versions of the drive (nuking content in some edge-cases and performance degradation (which hits all SSDs)) and Intel has responded well by acknowledging the problems and resolving them. That's very subjective though and I'm sure that some would turn that around and say that Intel delivered crap in the first place.

On the harder side, the Intel drive is surprisingly cheap and provides random IO performance ahead of most competitors, especially for random writes, which are normally the weak point for SSDs. Some graphs can be found at Anandtech: http://anandtech.com/storage/showdoc.aspx?i=3631&p=22

Anandtech is BTW a very fine starting point on SSDs as they go into details that too many reviewers skip over.

To be truthful here, standard index building and searching with Lucene requires three things from the IO-system: Bulk writes, bulk reads (mainly for sorting) and random reads. The Intel drive is not stellar for bulk writes and being superior for random writes does not make a difference for Lucene/SOLR. If we're only talking search: Pick whatever SSD you can get your hands on. They are all fine for random reads and the CPU will probably be the bottleneck.

However, random write speed is a bonus that might show indirectly: Untarring a million small files, updating a database and similar is often part of the workflow with search.

Back in 2007 we were fortunate enough to get a test-machine with 2 types of SSD, 2 10,000 RPM harddisks and 2 15,000 RPM harddisks. Some quick notes can be found at http://wiki.statsbiblioteket.dk/summa/Hardware

The world has moved on since then, but that has only widened the gap between SSDs and harddisks.

Regards,
Toke Eskildsen
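
The distinction drawn above between sequential reads (loading terms for sorting) and random reads (ordinary searches) can be checked on a particular disk with a rough timing script. The following is only a sketch and not part of Lucene or Solr: the function names, the 4 KB block size and the 16 MB test file are all assumptions chosen for illustration.

```python
import os
import random
import tempfile
import time


def read_sequential(path, block_size=4096):
    """Read the file front to back in fixed-size blocks; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_random(path, block_size=4096, seed=0):
    """Read every block exactly once, in shuffled order; return bytes read."""
    size = os.path.getsize(path)
    offsets = list(range(0, size, block_size))
    random.Random(seed).shuffle(offsets)
    total = 0
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(block_size))
    return total


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def make_test_file(size_bytes):
    """Create a throwaway file of the given size and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    # Assumed size; far too small to defeat the OS page cache.
    path = make_test_file(16 * 1024 * 1024)
    try:
        for name, fn in (("sequential", read_sequential), ("random", read_random)):
            nbytes, secs = timed(fn, path)
            print(f"{name}: {nbytes} bytes in {secs:.3f}s")
    finally:
        os.remove(path)
```

A freshly written file is normally served from the operating system's page cache, so for meaningful numbers the test file should be considerably larger than RAM, or the cache should be cleared between runs. Both functions read the same bytes; only the access order differs.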