On Wed, 2015-01-21 at 07:56 +0100, Nimrod Cohen wrote:
> RAID [0] configuration
>
> each shard has data on each one of the 8 disks in the RAID, on each
> query to get 1K docs, each shard request to get data from the one RAID
> disk, so we get 8 request to get date from all of the disks and we get
> a queue.

Your RAID setup (whether hardware or software) should use a parallel queue, so that requests to different physical drives are issued in parallel under the hood. But RAID is not that well-defined, so maybe your controller or your software uses a single sequential queue. In that case, the pattern will be as you describe.

Anyway, RAID 0 does not really help for random access when your access pattern is homogeneous across shards. Even if you fix the problem with your current RAID 0 setup, it is unlikely that you would get a noticeable performance advantage over separate drives. It would make it easier to add shards, though, as you would not have to purchase a new drive or unbalance your setup by running multiple shards on some drives.

> Regarding the response time, 2-3 seconds is good for our usage also
> getting better is always better, if we will get better we might run
> the analysis on more than 1K.

Limit the number of fields you request and try experimenting with SolrJ and the binary protocol: I have found that the time spent serializing the result to XML can be quite high for large responses. If only a few fields are needed and their content is small, you could try using faceting with DocValues to get the content.

- Toke Eskildsen, State and University Library, Denmark
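The two suggestions about response size (fewer fields plus the binary protocol, or faceting on DocValues instead of fetching documents) come down to which request parameters are sent. A minimal sketch in plain Java, assuming a hypothetical text field "text" and a hypothetical DocValues-enabled field "category"; the class name is also hypothetical. The parameter names (q, rows, fl, wt, facet, facet.field, facet.limit, facet.mincount) are standard Solr request parameters, and wt=javabin selects Solr's binary response format. The sketch only builds the query strings; sending them is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds Solr query strings, does not send them.
public class SolrQuerySketch {

    // URL-encode a key or value as UTF-8.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Join parameters into a query string, keeping insertion order.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    // Option 1: fetch the documents, but only the listed stored fields,
    // and ask for the binary javabin format instead of XML.
    static String fieldLimitedQuery(String q, int rows, List<String> fields) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("rows", Integer.toString(rows));
        p.put("fl", String.join(",", fields));
        p.put("wt", "javabin");
        return toQueryString(p);
    }

    // Option 2: return no documents (rows=0) and read the values of one
    // DocValues-backed field through faceting instead.
    static String facetQuery(String q, String field) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("rows", "0");
        p.put("facet", "true");
        p.put("facet.field", field);
        p.put("facet.limit", "-1");   // no cap on the number of distinct values
        p.put("facet.mincount", "1"); // only values that occur in the result set
        p.put("wt", "javabin");
        return toQueryString(p);
    }

    public static void main(String[] args) {
        System.out.println(fieldLimitedQuery("text:foo", 1000, List.of("id", "category")));
        System.out.println(facetQuery("text:foo", "category"));
    }
}
```

Note that faceting returns the distinct values with their counts across the whole result set, not the value per document, so it only fits if the analysis works on aggregated values. With SolrJ the same parameters can be set on a SolrQuery, and its HTTP client uses javabin by default.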