On Sat, Dec 3, 2011 at 1:31 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Again great stuff. Once distributed update/delete works (sounds like it's not far off)

Yeah, I only realized it was not working with the Version code on Friday as I started adding tests for it - the work to fix it is not too difficult.

> I'll have to reevaluate our current stack.
>
> You had mentioned storing the shard hash assignments in ZK, is there a JIRA around this?

I don't think there is specifically one for that yet - though it could just be part of the index splitting JIRA issue, I suppose.

> I'll keep my eyes on the JIRA tickets. Right now the distributed update/delete are big ones for me; the rebalancing of the cluster is a nice to have, but hopefully I wouldn't need that capability anytime in the near future.
>
> One other question....my current setup has replication done on a polling setup, I didn't notice that in the updated solrconfig, how does this work now?

Replication is only used for recovery, because it doesn't work with Near Realtime. We want SolrCloud to work with NRT, so currently the leader versions documents and forwards them to the replicas (this will let us do optimistic locking as well). If you send a doc to a replica instead, it's first forwarded to the leader to get versioned. The SolrCloud solrj client will likely be smart enough to just send to the leader first.

You need the replication handler defined for recovery - when a replica goes down and then comes back up, it starts buffering updates and replicates from the leader - then it applies the buffered updates and ends up current with the leader.

- Mark

On Sat, Dec 3, 2011 at 9:00 AM, Mark Miller <markrmil...@gmail.com> wrote:

bq. A few questions: if a master goes down does a replica get promoted?

Right - if the leader goes down there is a leader election and one of the replicas takes over.

bq. If a new shard needs to be added is it just a matter of starting a new solr instance with a higher numShards?

Eventually, that's the plan.

The idea is, you say something like, I want 3 shards. Now if you start up 9 instances, the first 3 end up as shard leaders - the next 6 evenly come up as replicas for each shard.

To change numShards, we will need some kind of micro shards / splitting / rebalancing.

bq. Last question, how do you change numShards?

I think this is somewhat a work in progress, but I think Sami just made it so that numShards is stored on the collection node in ZK (along with which config set to use). So you would change it there, presumably. Or perhaps just start up a new server with an updated numShards property and then it would realize that it needs to be a new leader - of course then you'd want to rebalance, probably - unless you fired up enough servers to add replicas too...

bq. Is that right?

Yup, sounds about right.
On Fri, Dec 2, 2011 at 10:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:

So I just tried this out, and it seems like it does the things I asked about.

Really, really cool stuff - it's progressed quite a bit in the time since I took a snapshot of the branch.

Last question, how do you change numShards? Is there a command you can use to do this now? I understand there will be implications for the hashing algorithm, but once the hash ranges are stored in ZK (is there a separate JIRA for this or does this fall under 2358?) I assume that it would be a relatively simple index split (JIRA 2595?) plus updating the hash ranges in Solr, essentially splitting the range between the new and existing shard. Is that right?

On Fri, Dec 2, 2011 at 10:08 PM, Jamie Johnson <jej2...@gmail.com> wrote:

I think I see it.....so if I understand this correctly you specify numShards as a system property, and as new nodes come up they check ZK to see if they should be a new shard or a replica based on whether numShards is met. A few questions: if a master goes down does a replica get promoted? If a new shard needs to be added is it just a matter of starting a new solr instance with a higher numShards? (Understanding that index rebalancing does not happen automatically now, but presumably it could.)

On Fri, Dec 2, 2011 at 9:56 PM, Jamie Johnson <jej2...@gmail.com> wrote:

How does it determine the number of shards to create? How many replicas to create?

On Fri, Dec 2, 2011 at 4:30 PM, Mark Miller <markrmil...@gmail.com> wrote:

Ah, okay - you are setting the shards in solr.xml - that's still an option to force a node to a particular shard - but if you take that out, shards will be auto assigned.

By the way, because of the version code, distrib deletes don't work at the moment - will get to that next week.

- Mark

On Fri, Dec 2, 2011 at 1:16 PM, Jamie Johnson <jej2...@gmail.com> wrote:

So I'm a fool. I did set numShards; the issue was so trivial it's embarrassing. I did indeed have it set up as a replica - the shard names in solr.xml were both shard1. This now works as I expected.

On Fri, Dec 2, 2011 at 1:02 PM, Mark Miller <markrmil...@gmail.com> wrote:

They are unused params, so removing them wouldn't help anything.

You might just want to wait till we are further along before playing with it.

Or if you submit your full self-contained test, I can see what's going on (e.g. it's still unclear if you have started setting numShards?).

I can do a similar set of actions in my tests and it works fine. The only reason I could see things working like this is if it thinks you have one shard - a leader and a replica.

- Mark

On Dec 2, 2011, at 12:41 PM, Jamie Johnson wrote:

Glad to hear I don't need to set shards/self, but removing them didn't seem to change what I'm seeing. Doing this still results in 2 documents - one on 8983 and one on 7574:
    String key = "1";

    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.setField("key", key);
    solrDoc.addField("content_mvtxt", "initial value");

    SolrServer server = servers.get("http://localhost:8983/solr/collection1");

    UpdateRequest ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    ureq.add(solrDoc);
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    server.commit();

    solrDoc = new SolrInputDocument();
    solrDoc.addField("key", key);
    solrDoc.addField("content_mvtxt", "updated value");

    server = servers.get("http://localhost:7574/solr/collection1");

    ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    ureq.add(solrDoc);
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    server.commit();

    server = servers.get("http://localhost:8983/solr/collection1");
    server.commit();
    System.out.println("done");

On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller <markrmil...@gmail.com> wrote:

So I dunno. You are running a zk server and running in zk mode, right?

You don't need to / shouldn't set a shards or self param. The shards are figured out from ZooKeeper.

You always want to use the distrib-update-chain. Eventually it will probably be part of the default chain and turn on automatically in zk mode.

If you are running in zk mode attached to a zk server, this should work no problem. You can add docs to any server and they will be forwarded to the correct shard leader and then versioned and forwarded to replicas.

You can also use the CloudSolrServer solrj client - that way you don't even have to choose a server to send docs to (where, if it went down, you would have to choose another manually) - CloudSolrServer automatically finds one that is up through ZooKeeper. Eventually it will also be smart and do the hashing itself so that it can send directly to the shard leader that the doc would be forwarded to anyway.

- Mark
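As an illustration of the CloudSolrServer option mentioned above - a minimal sketch, not code from the branch, assuming the API roughly as it later shipped (a constructor taking the ZooKeeper address plus setDefaultCollection). The ZooKeeper port is a placeholder; the collection, chain, and field names follow the test code in this thread.

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudClientSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper address - point this at the ensemble the cluster registered with.
            CloudSolrServer cloud = new CloudSolrServer("localhost:9983");
            cloud.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("key", "1");
            doc.addField("content_mvtxt", "initial value");

            // Same distributed chain as in the tests above; the client picks a live node via ZooKeeper.
            UpdateRequest ureq = new UpdateRequest();
            ureq.setParam("update.chain", "distrib-update-chain");
            ureq.add(doc);
            ureq.setAction(ACTION.COMMIT, true, true);
            cloud.request(ureq);

            cloud.shutdown();
        }
    }

Until the client does the hashing itself, the node that receives the request may still forward the document on to the right leader, as Mark describes.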
On Fri, Dec 2, 2011 at 12:09 AM, Jamie Johnson <jej2...@gmail.com> wrote:

Really just trying to do a simple add and update test; the missing chain is just proof of my not understanding exactly how this is supposed to work. I modified the code to this:

    String key = "1";

    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.setField("key", key);
    solrDoc.addField("content_mvtxt", "initial value");

    SolrServer server = servers.get("http://localhost:8983/solr/collection1");

    UpdateRequest ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    ureq.add(solrDoc);
    ureq.setParam("shards",
            "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
    ureq.setParam("self", "foo");
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    server.commit();

    solrDoc = new SolrInputDocument();
    solrDoc.addField("key", key);
    solrDoc.addField("content_mvtxt", "updated value");

    server = servers.get("http://localhost:7574/solr/collection1");

    ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    // ureq.deleteById("8060a9eb-9546-43ee-95bb-d18ea26a6285");
    ureq.add(solrDoc);
    ureq.setParam("shards",
            "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
    ureq.setParam("self", "foo");
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    // server.add(solrDoc);
    server.commit();

    server = servers.get("http://localhost:8983/solr/collection1");
    server.commit();
    System.out.println("done");

but I'm still seeing the doc appear on both shards. After the first commit I see the doc on 8983 with "initial value". After the second commit I see the updated value on 7574 and the old one on 8983. After the final commit the doc on 8983 gets updated.

Is there something wrong with my test?

On Thu, Dec 1, 2011 at 11:17 PM, Mark Miller <markrmil...@gmail.com> wrote:

Getting late - didn't really pay attention to your code I guess - why are you adding the first doc without specifying the distrib update chain? This is not really supported. It's going to just go to the server you specified - even with everything set up right, the update might then go to that same server or the other one depending on how it hashes. You really want to just always use the distrib update chain. I guess I don't yet understand what you are trying to test.

Sent from my iPad

On Dec 1, 2011, at 10:57 PM, Mark Miller <markrmil...@gmail.com> wrote:

Not sure offhand - but things will be funky if you don't specify the correct numShards.
The instance to shard assignment should be using numShards to assign. But the hash to shard mapping actually goes on the number of shards it finds registered in ZK (it doesn't have to, but really these should be equal).

So basically you are saying, I want 3 partitions, but you are only starting up 2 nodes, and the code is just not happy about that, I'd guess. For the system to work properly, you have to fire up at least as many servers as numShards.

What are you trying to do? 2 partitions with no replicas, or one partition with one replica?

In either case, I think you will have better luck if you fire up at least as many servers as the numShards setting. Or lower the numShards setting.

This is all a work in progress by the way - what you are trying to test should work if things are set up right though.

- Mark

On Dec 1, 2011, at 10:40 PM, Jamie Johnson wrote:

Thanks for the quick response. With that change (have not done numShards yet) shard1 got updated. But now when executing the following queries I get information back from both, which doesn't seem right:

    http://localhost:7574/solr/select/?q=*:*
    <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>

    http://localhost:8983/solr/select?q=*:*
    <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>

On Thu, Dec 1, 2011 at 10:21 PM, Mark Miller <markrmil...@gmail.com> wrote:

Hmm...sorry bout that - so my first guess is that right now we are not distributing a commit (easy to add, just have not done it).

Right now I explicitly commit on each server for tests.

Can you try explicitly committing on server1 after updating the doc on server2?

I can start distributing commits tomorrow - been meaning to do it for my own convenience anyhow.

Also, you want to pass the sys property numShards=1 on startup. I think it defaults to 3. That will give you one leader and one replica.

- Mark
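Since commits are not distributed yet, Mark's workaround amounts to committing on each node explicitly after the update. A minimal sketch only: the class shown is the plain HTTP SolrJ client under its later name HttpSolrServer (on the 2011 branch the equivalent was CommonsHttpSolrServer), and the URLs match the two-node setup used in the tests above.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitEachNodeSketch {
        public static void main(String[] args) throws Exception {
            // One plain HTTP client per node.
            SolrServer node1 = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrServer node2 = new HttpSolrServer("http://localhost:7574/solr/collection1");

            // ... send the update to either node via the distrib-update-chain ...

            // Commits are not forwarded yet, so open a new searcher on both nodes explicitly.
            node1.commit();
            node2.commit();
        }
    }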
On Dec 1, 2011, at 9:56 PM, Jamie Johnson wrote:

So I couldn't resist, I attempted to do this tonight. I used the solrconfig you mentioned (as is, no modifications), set up a 2 shard cluster in collection1, sent 1 doc to one of the shards, then updated it and sent the update to the other. I don't see the modifications though, I only see the original document. The following is the test:

    public void update() throws Exception {

        String key = "1";

        SolrInputDocument solrDoc = new SolrInputDocument();
        solrDoc.setField("key", key);
        solrDoc.addField("content", "initial value");

        SolrServer server = servers.get("http://localhost:8983/solr/collection1");
        server.add(solrDoc);
        server.commit();

        solrDoc = new SolrInputDocument();
        solrDoc.addField("key", key);
        solrDoc.addField("content", "updated value");

        server = servers.get("http://localhost:7574/solr/collection1");

        UpdateRequest ureq = new UpdateRequest();
        ureq.setParam("update.chain", "distrib-update-chain");
        ureq.add(solrDoc);
        ureq.setParam("shards",
                "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
        ureq.setParam("self", "foo");
        ureq.setAction(ACTION.COMMIT, true, true);
        server.request(ureq);
        System.out.println("done");
    }

key is my unique field in schema.xml.

What am I doing wrong?

On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Yes, the ZK method seems much more flexible. Adding a new shard would simply be updating the range assignments in ZK. Where is this currently on the list of things to accomplish? I don't have time to work on this now, but if you (or anyone) could provide direction I'd be willing to work on this when I had spare time. I guess a JIRA detailing where/how to do this could help. Not sure if the design has been thought out that far though.

On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller <markrmil...@gmail.com> wrote:

Right now let's say you have one shard - everything there hashes to range X.

Now you want to split that shard with an Index Splitter.

You divide range X in two - giving you two ranges - then you start splitting. This is where the current Splitter needs a little modification. You decide which doc should go into which new index by rehashing each doc id in the index you are splitting - if its hash is greater than X/2, it goes into index1 - if it's less, index2.
I think there are a couple of current Splitter impls, but one of them does something like: give me an id - now if the ids in the index are above that id, go to index1, if below, index2. We need to instead do a quick hash rather than a simple id compare.

Why do you need to do this on every shard?

The other part we need that we don't have is to store hash range assignments in ZooKeeper - we don't do that yet because it's not needed yet. Instead we currently just calculate that on the fly (too often at the moment - on every request :) I intend to fix that of course).

At the start, ZK would say: for range X, go to this shard. After the split, it would say: for range less than X/2 go to the old node, for range greater than X/2 go to the new node.

- Mark
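The split rule described above - rehash each document id and compare it to the midpoint of the shard's hash range - is easy to sketch. This is only an illustration: the hash function below (String.hashCode) is a stand-in, not the hash used on the branch, and the range bounds are made up.

    public class HashSplitSketch {

        /** Stand-in for hashing the unique key; a real splitter would reuse Solr's hash function. */
        static int hash(String id) {
            return id.hashCode();
        }

        /** The rule above: a doc whose hash is above the midpoint of the range goes to index1, otherwise index2. */
        static String targetIndex(String id, int rangeStart, int rangeEnd) {
            long mid = ((long) rangeStart + (long) rangeEnd) / 2; // the "X/2" midpoint, computed without int overflow
            return hash(id) > mid ? "index1" : "index2";
        }

        public static void main(String[] args) {
            // Example: splitting a shard that currently owns the full int hash range.
            int start = Integer.MIN_VALUE, end = Integer.MAX_VALUE;
            for (String id : new String[] { "1", "doc-42", "another-key" }) {
                System.out.println(id + " -> " + targetIndex(id, start, end));
            }
        }
    }

After such a split, the ZooKeeper range assignments would be updated as described: the lower half of the range stays with the old node, the upper half goes to the new one.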
On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote:

hmmm.....This doesn't sound like the hashing algorithm that's on the branch, right? The algorithm you're mentioning sounds like there is some logic which is able to tell that a particular range should be distributed between 2 shards instead of 1. So it seems like a trade off between repartitioning the entire index (on every shard) and having a custom hashing algorithm which is able to handle the situation where 2 or more shards map to a particular range.

On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller <markrmil...@gmail.com> wrote:

On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:

> I am not familiar with the index splitter that is in contrib, but I'll take a look at it soon. So the process sounds like it would be to run this on all of the current shards' indexes based on the hash algorithm.

Not something I've thought deeply about myself yet, but I think the idea would be to split as many as you felt you needed to.

If you wanted to keep the full balance always, this would mean splitting every shard at once, yes. But this depends on how many boxes (partitions) you are willing/able to add at a time.

You might just split one index to start - now its hash range would be handled by two shards instead of one (if you have 3 replicas per shard, this would mean adding 3 more boxes). When you needed to expand again, you would split another index that was still handling its full starting range. As you grow, once you split every original index, you'd start again, splitting one of the now half ranges.

> Is there also an index merger in contrib which could be used to merge indexes? I'm assuming this would be the process?

You can merge with IndexWriter.addIndexes (Solr also has an admin command that can do this). But I'm not sure where this fits in?

- Mark
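For the merge direction, IndexWriter.addIndexes is a plain Lucene call. A minimal sketch, not code from the thread - the paths are placeholders and the Version constant should match whatever Lucene release is in use:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeIndexesSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder paths: two source indexes are folded into the target index.
            Directory target = FSDirectory.open(new File("/path/to/merged-index"));
            Directory sourceA = FSDirectory.open(new File("/path/to/split-index-a"));
            Directory sourceB = FSDirectory.open(new File("/path/to/split-index-b"));

            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
                    new StandardAnalyzer(Version.LUCENE_35));
            IndexWriter writer = new IndexWriter(target, config);
            writer.addIndexes(sourceA, sourceB); // copies the source segments into the target
            writer.close();
        }
    }

As noted above, Solr also exposes an admin command for merging; the raw Lucene call here is just the underlying operation.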
On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:

Not yet - we don't plan on working on this until a lot of other stuff is working solid at this point. But someone else could jump in!

There are a couple of ways to go about it that I know of:

A more long term solution may be to start using micro shards - each index starts as multiple indexes. This makes it pretty fast to move micro shards around as you decide to change partitions. It's also less flexible, as you are limited by the number of micro shards you start with.

A more simple and likely first step is to use an index splitter. We already have one in lucene contrib - we would just need to modify it so that it splits based on the hash of the document id. This is super flexible, but splitting will obviously take a little while on a huge index. The current index splitter is a multi pass splitter - good enough to start with, but with most files under codec control these days, we may be able to make a single pass splitter soon as well.

Eventually you could imagine using both options - micro shards that could also be split as needed. Though I still wonder if micro shards will be worth the extra complications myself...

Right now though, the idea is that you should pick a good number of partitions to start given your expected data ;) Adding more replicas is trivial though.

- Mark

On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Another question: is there any support for repartitioning of the index if a new shard is added? What is the recommended approach for handling this? It seemed that the hashing algorithm (and probably any) would require the index to be repartitioned should a new shard be added.

On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Thanks, I will try this first thing in the morning.

On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <markrmil...@gmail.com> wrote:

On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> I am currently looking at the latest solrcloud branch and was wondering if there was any documentation on configuring the DistributedUpdateProcessor? What specifically in solrconfig.xml needs to be added/modified to make distributed indexing work?

Hi Jamie - take a look at solrconfig-distrib-update.xml in solr/core/src/test-files.

You need to enable the update log, add an empty replication handler def, and an update chain with solr.DistributedUpdateProcessorFactory in it.
--
- Mark

http://www.lucidimagination.com