Re: Custom Sharding on solrcloud

Mark Miller Sun, 11 Mar 2012 10:20:28 -0700

Hmm...let me think. At a minimum we intend to make the hashing mechanism 
pluggable...need to think about whether there is something else you could try now...


On Mar 8, 2012, at 4:28 AM, Phil Hoy wrote:

> Hi,
> 
> If I remove the DistributedUpdateProcessorFactory I will have to manage a 
> master/slave setup myself, updating only the master and replicating to 
> any slaves. Is it possible to have distributed updates that are confined 
> to the subset of cores and replicas within a collection that share the same 
> name?
> 
> Phil
> 
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com] 
> Sent: 08 March 2012 01:02
> To: solr-user@lucene.apache.org
> Subject: Re: Custom Sharding on solrcloud
> 
> Hi Phil - 
> 
> The default update chain now includes the distributed update processor, 
> and in solrcloud mode it will be active.
> 
> Probably, what you want to do is define your own update chain (see the wiki). 
> Then you can add that update chain as the default for your json update 
> handler in solrconfig.xml.
> 
> <!-- referencing it in an update handler -->
> <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">mychain</str>
>   </lst>
> </requestHandler>
> 
> The default chain is: 
> 
>              new LogUpdateProcessorFactory(),
>              new DistributedUpdateProcessorFactory(),
>              new RunUpdateProcessorFactory()
> 
> So just use Log and Run instead to get your old behavior.
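> 
> A minimal sketch of such a chain, assuming it is declared in solrconfig.xml 
> under the name "mychain" referenced above. It lists only the Log and Run 
> processors, so the distributed update processor is left out. Treat it as an 
> illustration to check against the wiki rather than a tested configuration:
> 
> ```xml
> <!-- custom update chain without DistributedUpdateProcessorFactory -->
> <updateRequestProcessorChain name="mychain">
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> ```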
> 
> - Mark
> 
> On Mar 7, 2012, at 1:37 PM, Phil Hoy wrote:
> 
>> Hi,
>> 
>> We have a large index and would like to shard by a particular field value, 
>> in our case surname. This way we can scale out to multiple machines, yet as 
>> most queries filter on surname we can use some application logic to hit just 
>> the one core to get the results we need.
>> 
>> Furthermore, as we anticipate the index will grow over time, it makes sense 
>> (to us) to host a number of shards on a single machine until they get too 
>> big, at which point we can move them to another machine.
>> 
>> We are using solrcloud, set up with one solrcore per shard so that 
>> we can direct both queries and updates to the appropriate core/shard. To do 
>> this our solr.xml looks a bit like this:
>> 
>> <cores defaultCoreName="default" adminPath="/admin/cores"
>>        zkClientTimeout="10000" hostPort="8983">
>>   <core shard="default" name="aaa-ava"
>>         instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
>>   <core shard="aaa-ava" name="aaa-ava"
>>         instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
>>   <core shard="avb-bel" name="avb-bel"
>>         instanceDir="/data/recordsets/shards/avb-bel" collection="recordsets" />
>>   .......
>> 
>> Directed updates via:
>> http://server/solr/aaa-ava/update/json  [{surname:"adams"}]
>> 
>> Directed queries via:
>> http://server/solr/select?surname:adams&shards=aaa-ava
>> 
>> This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13  
>> before the more recent solrcloud changes but now the update is not directed 
>> to the appropriate core. Is there a better way to achieve our needs?
>> 
>> Phil
>> 
> 
> - Mark Miller
> lucidimagination.com

- Mark Miller
lucidimagination.com