On Wed, 2015-01-21 at 07:56 +0100, Nimrod Cohen wrote:
> RAID [0] configuration
> 
> each shard has data on each one of the 8 disks in the RAID, on each
> query to get 1K docs, each shard requests data from the one RAID
> disk, so we get 8 requests to get data from all of the disks and we get
> a queue.

Your RAID setup (whether it is hardware or software) should use a
parallel queue, so that requests to different physical drives are issued
in parallel under the hood. But RAID is not that well-defined, so maybe
your controller or your software uses a single sequential queue. In that
case, the pattern will be as you describe.
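
To make the difference concrete, here is a toy model of the two queueing
behaviours. It is only a sketch: the function name, the fixed 10 ms service
time and the idea that every request costs the same are assumptions made
for illustration, not measurements of any real controller.

```python
def completion_time_ms(requests, service_ms, parallel):
    """Time until the last request finishes.

    requests   -- list of disk indices, one entry per I/O request
    service_ms -- assumed fixed cost of a single request (hypothetical)
    parallel   -- True: one queue per disk, False: one shared queue
    """
    if not parallel:
        # Single sequential queue: every request waits for all earlier ones.
        return len(requests) * service_ms
    # One queue per disk: only requests for the same disk wait on each other.
    per_disk = {}
    for disk in requests:
        per_disk[disk] = per_disk.get(disk, 0) + 1
    return max(per_disk.values(), default=0) * service_ms


# Eight shard requests, each landing on a different disk.
one_per_disk = list(range(8))
print(completion_time_ms(one_per_disk, 10, parallel=False))  # 80
print(completion_time_ms(one_per_disk, 10, parallel=True))   # 10
```

With a shared queue the eight requests cost eight service times; with
per-disk queues they cost one.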

Anyway, RAID 0 does not really help for random access when your access
pattern is homogeneous across shards. Even if you fix the problem with
your current RAID 0 setup, it is unlikely that you would get a
noticeable performance advantage over separate drives. It would make it
easier to add shards though, as you would not have to purchase a new
drive or unbalance your setup by running multiple shards on some drives.

> Regarding the response time, 2-3 seconds is good for our usage also
> getting better is always better, if we will get better we might run
> the analysis on more than 1K.

Limit the number of fields you request and try experimenting with SolrJ
and the binary protocol: I have found that the time for serializing the
result to XML can be quite high for large responses.
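
The field-limiting part can be sketched without SolrJ. The snippet below
only builds the request URL; `fl`, `rows` and `wt` are standard Solr
request parameters, while the base URL, the collection name and the field
names are made up for the example. SolrJ talks javabin by default as far
as I know; plain Python cannot decode that without extra libraries, so
`wt=json` stands in here as a non-XML format.

```python
from urllib.parse import urlencode


def select_url(base_url, query, fields, rows=1000, writer="json"):
    """Build a /select URL that returns only the named fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # field list: skip everything not needed
        "rows": rows,            # 1K docs per request, as in the thread
        "wt": writer,            # response writer; avoids XML serialization
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical host, collection and field names.
url = select_url("http://localhost:8983/solr/collection1", "*:*", ["id", "title"])
print(url)
```

Timing the same query with and without the `fl` restriction should show
how much of the 2-3 seconds is spent on shipping unused fields.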

If the number of fields needed is very low and the content of those
fields is not large, you could try using faceting with DocValues to get
the content.
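
A rough sketch of what such a facet request could look like. The `facet.*`
parameters are standard Solr ones; the base URL and the field name
`category` are placeholders, and the field is assumed to be declared with
docValues="true" in the schema. Note that faceting returns the distinct
values with counts for the whole result set, not the values per document,
so this only fits if that is what the analysis needs.

```python
from urllib.parse import urlencode


def facet_url(base_url, query, field, limit=-1):
    """Build a /select URL that fetches field values through faceting."""
    params = {
        "q": query,
        "rows": 0,            # no documents, only the facet counts
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,  # -1 means no limit on the number of values
        "facet.mincount": 1,   # skip values absent from the matching docs
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical host, collection and field name.
print(facet_url("http://localhost:8983/solr/collection1", "*:*", "category"))
```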


- Toke Eskildsen, State and University Library, Denmark


