On 9/11/2013 1:07 PM, Deepak Konidena wrote:
Are you suggesting a multi-core setup, where all the cores share the same
schema, and the cores lie on different disks?
Basically, I'd like to know if I can distribute shards/segments on a single
machine (with multiple disks) without the use of zookeeper.
Sure, you can do it all manually. At that point you would not be using
SolrCloud at all, because the way to enable SolrCloud is to tell Solr
where zookeeper lives.
Without SolrCloud, there is no cluster automation at all. There is no
"collection" paradigm, you just have cores. You have to send updates to
the correct core; they will not be redirected for you. Similarly, queries
will not be load balanced automatically. For Java clients, the
CloudSolrServer object can work seamlessly when servers go down. If
you're not using SolrCloud, you can't use CloudSolrServer.
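If you go the manual route, you'd need your own scheme for deciding which core gets each update, typically a deterministic hash on the document's unique key. A minimal sketch (the host names, core names, and hashing scheme here are all hypothetical, nothing Solr does for you):

```python
import hashlib

# Hypothetical list of per-shard core URLs on one or more hosts.
SHARD_CORES = [
    "http://host1:8983/solr/shard0",
    "http://host1:8983/solr/shard1",
    "http://host2:8983/solr/shard2",
]

def core_for_doc(doc_id: str) -> str:
    """Pick the shard core for a document from its unique key."""
    # An MD5 digest keeps the assignment stable across runs and
    # machines, unlike Python's hash(), which is salted per process.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return SHARD_CORES[int(digest, 16) % len(SHARD_CORES)]
```

The essential property is that the same id always maps to the same core, so updates and deletes for a document land on the shard that holds it.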
You would be in charge of creating the shards parameter yourself. The
way that I do this on my index is that I have a "broker" core that has
no index of its own, but its solrconfig.xml has the shards and shards.qt
parameters in all the request handler definitions. You can also include
the parameter with the query.
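A broker setup like that might look roughly like this in the broker core's solrconfig.xml (host names, core names, and the handler name are placeholders for illustration):

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- every query to this handler fans out to these shard cores -->
    <str name="shards">host1:8983/solr/shard0,host1:8983/solr/shard1,host2:8983/solr/shard2</str>
    <!-- handler to use on each shard for the sub-requests -->
    <str name="shards.qt">/search</str>
  </lst>
</requestHandler>
```

The query-time alternative is to pass the same value directly, e.g. `q=foo&shards=host1:8983/solr/shard0,host1:8983/solr/shard1`.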
You would also have to handle redundancy yourself, either with
replication or with independently updated indexes. I use the latter
method, because it offers a lot more flexibility than replication.
As mentioned in another reply, setting up RAID with a lot of disks may
be better than trying to split your index up on different filesystems
that each reside on different disks. I would recommend RAID10 for Solr,
and it works best if it's hardware RAID and the controller has
battery-backed (or NVRAM) cache.
Thanks,
Shawn