Re: Configuring the Distributed

Mark Miller Thu, 01 Dec 2011 16:35:20 -0800

On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:

> I am not familiar with the index splitter that is in contrib, but I'll
> take a look at it soon.  So the process sounds like it would be to run
> this on all of the current shards' indexes based on the hash algorithm.


Not something I've thought deeply about myself yet, but I think the idea would 
be to split as many as you felt you needed to.

If you wanted to keep the full balance always, this would mean splitting every 
shard at once, yes. But this depends on how many boxes (partitions) you are 
willing/able to add at a time.

You might just split one index to start - now its hash range would be handled 
by two shards instead of one (if you have 3 replicas per shard, this would mean 
adding 3 more boxes). When you needed to expand again, you would split another 
index that was still handling its full starting range. As you grow, once you 
split every original index, you'd start again, splitting one of the now half 
ranges.
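
To make that splitting order concrete, here is a minimal sketch. It is not the
SolrCloud implementation: the 32-bit hash space, the half-open ranges, the
CRC32 hash of the document id, and every function name below are assumptions
chosen for illustration. "Split an index that still handles its full starting
range, then start over on the half ranges" is modelled as "always split the
widest remaining range".

```python
import zlib

HASH_SPACE = 2 ** 32  # assumed 32-bit hash space; ranges are half-open [lo, hi)


def initial_ranges(num_shards):
    """Divide the hash space into num_shards contiguous ranges."""
    step = HASH_SPACE // num_shards
    bounds = [i * step for i in range(num_shards)] + [HASH_SPACE]
    return [(bounds[i], bounds[i + 1]) for i in range(num_shards)]


def split_next(ranges):
    """Split the widest range in half (the earliest one on ties)."""
    widest = max(range(len(ranges)), key=lambda i: ranges[i][1] - ranges[i][0])
    lo, hi = ranges[widest]
    mid = lo + (hi - lo) // 2
    return ranges[:widest] + [(lo, mid), (mid, hi)] + ranges[widest + 1:]


def doc_hash(doc_id):
    """Stand-in hash of the document id (CRC32 is an assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF


def shard_for(doc_id, ranges):
    """Return the index of the range that owns doc_id."""
    h = doc_hash(doc_id)
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h < hi:
            return i
    raise ValueError("ranges do not cover the hash space")


if __name__ == "__main__":
    ranges = initial_ranges(2)
    for _ in range(3):
        ranges = split_next(ranges)  # one split per expansion step
        print(ranges)
```

Because a split only halves an existing range, a document never moves between
the original shards; it can only land in one of the two halves of the range
that already owned it, which is what a hash-based index splitter would rely on.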

> Is there also an index merger in contrib which could be used to merge
> indexes?  I'm assuming this would be the process?

You can merge with IndexWriter.addIndexes (Solr also has an admin command that 
can do this). But I'm not sure where this fits in?

- Mark

> 
> On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> Not yet - we don't plan on working on this until a lot of other stuff is
>> working solid at this point. But someone else could jump in!
>> 
>> There are a couple ways to go about it that I know of:
>> 
>> A more long term solution may be to start using micro shards - each index
>> starts as multiple indexes. This makes it pretty fast to move micro shards
>> around as you decide to change partitions. It's also less flexible as you
>> are limited by the number of micro shards you start with.
>> 
>> A simpler and likely first step is to use an index splitter. We
>> already have one in lucene contrib - we would just need to modify it so
>> that it splits based on the hash of the document id. This is super
>> flexible, but splitting will obviously take a little while on a huge index.
>> The current index splitter is a multi pass splitter - good enough to start
>> with, but with most files under codec control these days, we may be able to
>> make a single pass splitter soon as well.
>> 
>> Eventually you could imagine using both options - micro shards that could
>> also be split as needed. Though I still wonder if micro shards will be
>> worth the extra complications myself...
>> 
>> Right now though, the idea is that you should pick a good number of
>> partitions to start given your expected data ;) Adding more replicas is
>> trivial though.
>> 
>> - Mark
>> 
>> On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> 
>>> Another question, is there any support for repartitioning of the index
>>> if a new shard is added?  What is the recommended approach for
>>> handling this?  It seemed that the hashing algorithm (and probably
>>> any) would require the index to be repartitioned should a new shard be
>>> added.
>>> 
>>> On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>> Thanks I will try this first thing in the morning.
>>>> 
>>>> On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <markrmil...@gmail.com>
>>> wrote:
>>>>> On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <jej2...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> I am currently looking at the latest solrcloud branch and was
>>>>>> wondering if there was any documentation on configuring the
>>>>>> DistributedUpdateProcessor?  What specifically in solrconfig.xml needs
>>>>>> to be added/modified to make distributed indexing work?
>>>>>> 
>>>>> 
>>>>> 
>>>>> Hi Jamie - take a look at solrconfig-distrib-update.xml in
>>>>> solr/core/src/test-files
>>>>> 
>>>>> You need to enable the update log, add an empty replication handler def,
>>>>> and an update chain with solr.DistributedUpdateProcessorFactory in it.
>>>>> 
>>>>> --
>>>>> - Mark
>>>>> 
>>>>> http://www.lucidimagination.com
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> - Mark
>> 
>> http://www.lucidimagination.com
>> 

- Mark Miller
lucidimagination.com
