So I'm a fool. I did set numShards; the issue was so trivial it's embarrassing. I did indeed have it set up as a replica: the shard names in solr.xml were both shard1. With that fixed, this now works as I expected.
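For anyone tripping over the same thing, here is a minimal sketch of the solr.xml difference (instanceDir values are hypothetical, and the exact attribute names should be checked against the branch). With numShards passed as a startup system property and both cores explicitly naming the same shard, the two nodes register as leader and replica of one shard; distinct shard names (or leaving the shard attribute off and letting numShards drive the assignment) give two partitions instead:

    <!-- Sketch only: one core entry per Solr instance's solr.xml. -->

    <!-- Instance on :8983 -->
    <core name="collection1" instanceDir="." shard="shard1" collection="collection1"/>

    <!-- Instance on :7574 -- same shard name = replica of shard1.
         Use shard="shard2" here (or rely on numShards=2) to get a second partition. -->
    <core name="collection1" instanceDir="." shard="shard1" collection="collection1"/>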
On Fri, Dec 2, 2011 at 1:02 PM, Mark Miller <markrmil...@gmail.com> wrote:

They are unused params, so removing them wouldn't help anything.

You might just want to wait till we are further along before playing with it.

Or if you submit your full self-contained test, I can see what's going on (e.g. it's still unclear if you have started setting numShards?).

I can do a similar set of actions in my tests and it works fine. The only reason I could see things working like this is if it thinks you have one shard - a leader and a replica.

- Mark

On Dec 2, 2011, at 12:41 PM, Jamie Johnson wrote:

Glad to hear I don't need to set shards/self, but removing them didn't seem to change what I'm seeing. Doing this still results in 2 documents: 1 on 8983 and 1 on 7574.

    String key = "1";

    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.setField("key", key);
    solrDoc.addField("content_mvtxt", "initial value");

    SolrServer server = servers.get("http://localhost:8983/solr/collection1");

    UpdateRequest ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    ureq.add(solrDoc);
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    server.commit();

    solrDoc = new SolrInputDocument();
    solrDoc.addField("key", key);
    solrDoc.addField("content_mvtxt", "updated value");

    server = servers.get("http://localhost:7574/solr/collection1");

    ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    ureq.add(solrDoc);
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    server.commit();

    server = servers.get("http://localhost:8983/solr/collection1");
    server.commit();
    System.out.println("done");

On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller <markrmil...@gmail.com> wrote:

So I dunno. You are running a zk server and running in zk mode, right?

You don't need to / shouldn't set a shards or self param. The shards are figured out from ZooKeeper.

You always want to use the distrib-update-chain. Eventually it will probably be part of the default chain and turn on automatically in zk mode.

If you are running in zk mode attached to a zk server, this should work no problem. You can add docs to any server and they will be forwarded to the correct shard leader and then versioned and forwarded to replicas.

You can also use the CloudSolrServer solrj client - that way you don't even have to choose a server to send docs to (in which case, if it went down, you would have to choose another manually) - CloudSolrServer automatically finds one that is up through ZooKeeper. Eventually it will also be smart and do the hashing itself so that it can send directly to the shard leader that the doc would be forwarded to anyway.

- Mark
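To illustrate the CloudSolrServer route Mark describes, here is a rough SolrJ sketch; the ZooKeeper address and collection name are made up, and the exact client API on the branch may differ slightly:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudClientSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper address; use whatever the cluster was started with.
            CloudSolrServer cloud = new CloudSolrServer("localhost:9983");
            cloud.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("key", "1");
            doc.addField("content_mvtxt", "initial value");

            // No need to pick 8983 or 7574 by hand: the client finds a live node
            // through ZooKeeper, and the distributed update chain forwards the doc
            // to the correct shard leader, which then forwards to replicas.
            cloud.add(doc);
            cloud.commit();
        }
    }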
On Fri, Dec 2, 2011 at 12:09 AM, Jamie Johnson <jej2...@gmail.com> wrote:

Really just trying to do a simple add and update test; the missing chain is just proof of my not understanding exactly how this is supposed to work. I modified the code to this:

    String key = "1";

    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.setField("key", key);
    solrDoc.addField("content_mvtxt", "initial value");

    SolrServer server = servers.get("http://localhost:8983/solr/collection1");

    UpdateRequest ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    ureq.add(solrDoc);
    ureq.setParam("shards", "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
    ureq.setParam("self", "foo");
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    server.commit();

    solrDoc = new SolrInputDocument();
    solrDoc.addField("key", key);
    solrDoc.addField("content_mvtxt", "updated value");

    server = servers.get("http://localhost:7574/solr/collection1");

    ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    // ureq.deleteById("8060a9eb-9546-43ee-95bb-d18ea26a6285");
    ureq.add(solrDoc);
    ureq.setParam("shards", "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
    ureq.setParam("self", "foo");
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    // server.add(solrDoc);
    server.commit();

    server = servers.get("http://localhost:8983/solr/collection1");
    server.commit();
    System.out.println("done");

but I'm still seeing the doc appear on both shards. After the first commit I see the doc on 8983 with "initial value". After the second commit I see the updated value on 7574 and the old one on 8983. After the final commit the doc on 8983 gets updated.

Is there something wrong with my test?

On Thu, Dec 1, 2011 at 11:17 PM, Mark Miller <markrmil...@gmail.com> wrote:

Getting late - didn't really pay attention to your code I guess - why are you adding the first doc without specifying the distrib update chain? This is not really supported. It's going to just go to the server you specified - even with everything set up right, the update might then go to that same server or the other one depending on how it hashes. You really want to just always use the distrib update chain. I guess I don't yet understand what you are trying to test.

Sent from my iPad

On Dec 1, 2011, at 10:57 PM, Mark Miller <markrmil...@gmail.com> wrote:

Not sure offhand - but things will be funky if you don't specify the correct numShards.

The instance-to-shard assignment should be using numShards to assign. But then the hash-to-shard mapping actually goes on the number of shards it finds registered in ZK (it doesn't have to, but really these should be equal).

So basically you are saying "I want 3 partitions", but you are only starting up 2 nodes, and the code is just not happy about that, I'd guess. For the system to work properly, you have to fire up at least as many servers as numShards.

What are you trying to do? 2 partitions with no replicas, or one partition with one replica?

In either case, I think you will have better luck if you fire up at least as many servers as the numShards setting. Or lower the numShards setting.

This is all a work in progress by the way - what you are trying to test should work if things are set up right though.

- Mark
On Dec 1, 2011, at 10:40 PM, Jamie Johnson wrote:

Thanks for the quick response. With that change (have not done numShards yet) shard1 got updated. But now when executing the following queries I get information back from both, which doesn't seem right:

    http://localhost:7574/solr/select/?q=*:*
    <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>

    http://localhost:8983/solr/select?q=*:*
    <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>

On Thu, Dec 1, 2011 at 10:21 PM, Mark Miller <markrmil...@gmail.com> wrote:

Hmm... sorry about that - so my first guess is that right now we are not distributing a commit (easy to add, just have not done it).

Right now I explicitly commit on each server for tests.

Can you try explicitly committing on server1 after updating the doc on server2?

I can start distributing commits tomorrow - been meaning to do it for my own convenience anyhow.

Also, you want to pass the sys property numShards=1 on startup. I think it defaults to 3. That will give you one leader and one replica.

- Mark

On Dec 1, 2011, at 9:56 PM, Jamie Johnson wrote:

So I couldn't resist, I attempted to do this tonight. I used the solrconfig you mentioned (as is, no modifications), I set up a 2-shard cluster in collection1, I sent 1 doc to 1 of the shards, updated it and sent the update to the other. I don't see the modifications though; I only see the original document. The following is the test:

    public void update() throws Exception {
        String key = "1";

        SolrInputDocument solrDoc = new SolrInputDocument();
        solrDoc.setField("key", key);
        solrDoc.addField("content", "initial value");

        SolrServer server = servers.get("http://localhost:8983/solr/collection1");
        server.add(solrDoc);
        server.commit();

        solrDoc = new SolrInputDocument();
        solrDoc.addField("key", key);
        solrDoc.addField("content", "updated value");

        server = servers.get("http://localhost:7574/solr/collection1");

        UpdateRequest ureq = new UpdateRequest();
        ureq.setParam("update.chain", "distrib-update-chain");
        ureq.add(solrDoc);
        ureq.setParam("shards", "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
        ureq.setParam("self", "foo");
        ureq.setAction(ACTION.COMMIT, true, true);
        server.request(ureq);
        System.out.println("done");
    }

key is my unique field in schema.xml.

What am I doing wrong?

On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Yes, the ZK method seems much more flexible. Adding a new shard would simply be a matter of updating the range assignments in ZK. Where is this currently on the list of things to accomplish?
I don't have time to work on this now, but if you (or anyone) could provide direction I'd be willing to work on this when I had spare time. I guess a JIRA detailing where/how to do this could help. Not sure if the design has been thought out that far though.

On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller <markrmil...@gmail.com> wrote:

Right now let's say you have one shard - everything there hashes to range X.

Now you want to split that shard with an index splitter.

You divide range X in two - giving you two ranges - then you start splitting. This is where the current splitter needs a little modification. You decide which doc should go into which new index by rehashing each doc id in the index you are splitting - if its hash is greater than X/2, it goes into index1; if it's less, index2. I think there are a couple of current splitter impls, but one of them does something like: give me an id - now if the ids in the index are above that id, go to index1; if below, index2. We need to instead do a quick hash rather than a simple id compare.

Why do you need to do this on every shard?

The other part we need that we don't have is to store hash range assignments in ZooKeeper - we don't do that yet because it's not needed yet. Instead we currently just calculate that on the fly (too often at the moment - on every request :) I intend to fix that of course).

At the start, zk would say: for range X, go to this shard. After the split, it would say: for range less than X/2 go to the old node, for range greater than X/2 go to the new node.

- Mark
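A rough Java sketch of the split decision Mark describes - purely illustrative, not the actual contrib splitter; the hash function and range bounds here are assumptions:

    // Illustrative sketch only: the per-document decision a hash-based splitter
    // would make. The hash function and range endpoints are stand-ins, not the
    // actual SolrCloud hashing code.
    public final class HashSplitSketch {

        // Stand-in for whatever hash SolrCloud applies to the unique key.
        static int hash(String id) {
            return id.hashCode();
        }

        /**
         * Given the shard's current hash range [rangeStart, rangeEnd], return 1 if the
         * doc belongs in the new upper-half index, 0 if it stays in the lower-half index.
         */
        static int targetIndex(String id, int rangeStart, int rangeEnd) {
            int mid = rangeStart + (rangeEnd - rangeStart) / 2; // the "X/2" in the description
            return hash(id) > mid ? 1 : 0;
        }
    }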
On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote:

Hmmm... this doesn't sound like the hashing algorithm that's on the branch, right? The algorithm you're mentioning sounds like there is some logic which is able to tell that a particular range should be distributed between 2 shards instead of 1. So it seems like a trade-off between repartitioning the entire index (on every shard) and having a custom hashing algorithm which is able to handle the situation where 2 or more shards map to a particular range.

On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller <markrmil...@gmail.com> wrote:

On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:

> I am not familiar with the index splitter that is in contrib, but I'll take a look at it soon. So the process sounds like it would be to run this on all of the current shards' indexes based on the hash algorithm.

Not something I've thought deeply about myself yet, but I think the idea would be to split as many as you felt you needed to.

If you wanted to keep the full balance always, this would mean splitting every shard at once, yes. But this depends on how many boxes (partitions) you are willing/able to add at a time.

You might just split one index to start - now its hash range would be handled by two shards instead of one (if you have 3 replicas per shard, this would mean adding 3 more boxes). When you needed to expand again, you would split another index that was still handling its full starting range. As you grow, once you split every original index, you'd start again, splitting one of the now-half ranges.

> Is there also an index merger in contrib which could be used to merge indexes? I'm assuming this would be the process?

You can merge with IndexWriter.addIndexes (Solr also has an admin command that can do this). But I'm not sure where this fits in?

- Mark
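For reference, merging with IndexWriter.addIndexes looks roughly like this; the directory paths are hypothetical and the Version constant depends on the Lucene version on the branch:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical paths: merge two source indexes into a target index.
            FSDirectory target = FSDirectory.open(new File("/path/to/merged"));
            FSDirectory src1 = FSDirectory.open(new File("/path/to/index1"));
            FSDirectory src2 = FSDirectory.open(new File("/path/to/index2"));

            IndexWriterConfig cfg =
                new IndexWriterConfig(Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
            IndexWriter writer = new IndexWriter(target, cfg);
            writer.addIndexes(src1, src2); // copies the source segments into the target index
            writer.close();                // commits the merge
        }
    }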
On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:

Not yet - we don't plan on working on this until a lot of other stuff is working solid at this point. But someone else could jump in!

There are a couple of ways to go about it that I know of:

A more long-term solution may be to start using micro shards - each index starts as multiple indexes. This makes it pretty fast to move micro shards around as you decide to change partitions. It's also less flexible, as you are limited by the number of micro shards you start with.

A simpler and likely first step is to use an index splitter. We already have one in lucene contrib - we would just need to modify it so that it splits based on the hash of the document id. This is super flexible, but splitting will obviously take a little while on a huge index. The current index splitter is a multi-pass splitter - good enough to start with, but with most files under codec control these days, we may be able to make a single-pass splitter soon as well.

Eventually you could imagine using both options - micro shards that could also be split as needed. Though I still wonder if micro shards will be worth the extra complications myself...

Right now though, the idea is that you should pick a good number of partitions to start, given your expected data ;) Adding more replicas is trivial though.

- Mark

On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Another question: is there any support for repartitioning of the index if a new shard is added? What is the recommended approach for handling this? It seemed that the hashing algorithm (and probably any) would require the index to be repartitioned should a new shard be added.

On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Thanks, I will try this first thing in the morning.

On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <markrmil...@gmail.com> wrote:

On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> I am currently looking at the latest solrcloud branch and was wondering if there was any documentation on configuring the DistributedUpdateProcessor? What specifically in solrconfig.xml needs to be added/modified to make distributed indexing work?

Hi Jamie - take a look at solrconfig-distrib-update.xml in solr/core/src/test-files.

You need to enable the update log, add an empty replication handler def, and an update chain with solr.DistributedUpdateProcessFactory in it.

- Mark

http://www.lucidimagination.com
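Putting those three pieces together, a solrconfig.xml sketch along the lines Mark describes might look like the following; treat the exact element and factory class names as assumptions and take the authoritative ones from solrconfig-distrib-update.xml on the branch:

    <!-- Sketch only: check solrconfig-distrib-update.xml in solr/core/src/test-files
         for the exact names used on the solrcloud branch. -->

    <!-- 1. Enable the update log. -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.data.dir:}</str>
      </updateLog>
    </updateHandler>

    <!-- 2. An empty replication handler definition. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />

    <!-- 3. An update chain containing the distributed update processor,
            referenced from requests as update.chain=distrib-update-chain. -->
    <updateRequestProcessorChain name="distrib-update-chain">
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.DistributedUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>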