Looking deeper into the ZooKeeper-as-truth mode, I was wrong about existing replicas being re-created once their storage is gone. It seems there is intent for that kind of behavior in existing tickets, so we'll look at creating a patch for this too.
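For reference, here is roughly what I mean, though take it as a sketch based on my current understanding rather than anything authoritative: legacyCloud=false is the cluster property that makes ZooKeeper the source of truth, and autoAddReplicas is the existing mechanism for re-creating replicas after a node is lost, which on the 6.x line only applies to shared storage such as HDFS. The node address, collection name and config name below are placeholders:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

public class ClusterSetupSketch {

    static void call(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            System.out.println(s.hasNext() ? s.next() : "");
        }
    }

    public static void main(String[] args) throws Exception {
        String solr = "http://localhost:8983/solr"; // any live node (placeholder)

        // Treat ZooKeeper as the truth for cluster state instead of core descriptors on disk
        call(solr + "/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=false");

        // Ask Solr to re-create replicas when a node disappears; on 6.x this only
        // takes effect when the index lives on shared storage (e.g. HDFS)
        call(solr + "/admin/collections?action=CREATE&name=mycollection"
                + "&numShards=2&replicationFactor=2&autoAddReplicas=true"
                + "&collection.configName=myconfig");
    }
}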
Steve

On Tue, Jul 5, 2016 at 6:00 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
> The leader will do the replication before responding to the client, so let's say the leader gets to update its local copy but is terminated before sending the request to the replicas: the client should get either an HTTP 500 or no HTTP response. From the client code you can take action (log, retry, etc.).
> The "min_rf" is useful for the case where replicas may be down or not accessible. Again, you can use this for retrying or taking any necessary action on the client side if the desired rf is not achieved.
>
> Tomás
>
> On Tue, Jul 5, 2016 at 11:39 AM, Lorenzo Fundaró <lorenzo.fund...@dawandamail.com> wrote:
> > @Tomas and @Steven
> >
> > I am a bit skeptical about these two statements:
> >
> > > If a node just disappears you should be fine in terms of data availability, since Solr in "SolrCloud" replicates the data as it comes in (before sending the HTTP response)
> >
> > and
> >
> > > You shouldn't "need" to move the storage as SolrCloud will replicate all data to the new node and anything in the transaction log will already be distributed through the rest of the machines.
> >
> > because according to the official documentation here
> > <https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance>
> > (Write Side Fault Tolerance -> Recovery):
> >
> > > If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a synch process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed
> >
> > I think there is a possibility that an update is not sent by the leader but is kept on the local disk, and after the leader comes up again it can sync the unsent data.
> >
> > Furthermore:
> >
> > > Achieved Replication Factor
> > > When using a replication factor greater than one, an update request may succeed on the shard leader but fail on one or more of the replicas. For instance, consider a collection with one shard and replication factor of three. In this case, you have a shard leader and two additional replicas. If an update request succeeds on the leader but fails on both replicas, for whatever reason, the update request is still considered successful from the perspective of the client. The replicas that missed the update will sync with the leader when they recover.
> >
> > They have implemented this parameter called *min_rf* that you can use (client-side) to make sure that your update was replicated to at least one replica (e.g.: min_rf > 1).
> >
> > This is the reason for my concern about moving storage around: I know that when the shard leader comes back, SolrCloud will run the sync process for those documents that couldn't be sent to the replicas.
> >
> > Am I missing something, or have I misunderstood the documentation?
> >
> > Cheers!
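That matches my reading of min_rf: Solr reports the achieved replication factor back to the client (as "rf" in the response header) but does not undo the update, so the retry has to live in client code. A rough SolrJ sketch, with the ZooKeeper address, collection name, target rf and retry policy all placeholders (client construction also differs a bit between SolrJ versions):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class MinRfUpdate {

    public static void main(String[] args) throws Exception {
        // SolrJ 6.x-style construction; newer SolrJ versions use a different Builder signature
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title", "hello");

            UpdateRequest req = new UpdateRequest();
            req.add(doc);
            // min_rf does not make Solr fail the update; it only reports the
            // achieved replication factor back as "rf" in the response header
            req.setParam("min_rf", "2");

            for (int attempt = 1; attempt <= 3; attempt++) {
                try {
                    UpdateResponse rsp = req.process(client, "mycollection");
                    Object rf = rsp.getResponseHeader().get("rf");
                    if (rf != null && Integer.parseInt(rf.toString()) >= 2) {
                        return; // leader plus at least one replica acknowledged the update
                    }
                    // rf too low: log it, back off, and retry (or queue the doc for later)
                } catch (SolrServerException | SolrException | IOException e) {
                    // HTTP 500 or no response from the leader; retrying is safe because
                    // re-adding the same id simply overwrites the document
                }
                Thread.sleep(1000L * attempt); // crude backoff, tune for real use
            }
        }
    }
}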
> > On 5 July 2016 at 19:49, Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov> wrote:
> > > Lorenzo, this probably comes late, but my systems guys just don't want to give me real disk. Although RAID-5 or LVM on top of JBOD may be better than Amazon EBS, Amazon EBS is still much closer to real disk in terms of IOPS and latency than NFS ;) I even ran a mini test (not an official benchmark) and found the response time for random reads to be better.
> > >
> > > If you are a young/smallish company, this may all be in the cloud, but if you are in a large organization like mine, you may also need to allow for other architectures, such as a "virtual" NetApp in the cloud that communicates with a physical NetApp on-premises, and the throughput/latency of that. The most important thing is to actually measure the numbers you are getting, both for search and for simply raw I/O, or to get your systems/storage guys to measure those numbers. If you get your systems/storage guys to just measure storage, you will want to care about three things for indexing primarily:
> > >
> > > Sequential Write Throughput
> > > Random Read Throughput
> > > Random Read Response Time/Latency
> > >
> > > Hope this helps,
> > >
> > > Dan Davis, Systems/Applications Architect (Contractor),
> > > Office of Computer and Communications Systems,
> > > National Library of Medicine, NIH
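For a quick first pass at those three numbers before involving the storage team, a JDK-only probe along these lines gives a rough idea. The path and sizes are placeholders, this is nowhere near an official benchmark, and the random-read figure will be flattered by the OS page cache unless the test file is much larger than RAM; a dedicated tool like fio or iostat gives more trustworthy numbers:

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Random;

public class DiskProbe {

    public static void main(String[] args) throws Exception {
        // Point this at the volume that will hold the Solr data directory
        Path file = Paths.get(args.length > 0 ? args[0] : "/var/solr/data/probe.bin");
        long fileSize = 1L << 30;   // 1 GiB test file
        int writeBlock = 64 * 1024; // 64 KiB sequential writes

        // 1) Sequential write throughput
        ByteBuffer block = ByteBuffer.allocateDirect(writeBlock);
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (long written = 0; written < fileSize; written += writeBlock) {
                block.clear();
                ch.write(block);
            }
            ch.force(true); // flush to the device, not just the page cache
        }
        double writeSecs = (System.nanoTime() - start) / 1e9;
        System.out.printf("sequential write: %.1f MB/s%n", fileSize / 1e6 / writeSecs);

        // 2) Random read throughput and latency (4 KiB reads at aligned random offsets)
        ByteBuffer small = ByteBuffer.allocateDirect(4096);
        Random rnd = new Random();
        int reads = 2000;
        long readStart = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            for (int i = 0; i < reads; i++) {
                long offset = ((rnd.nextLong() >>> 1) % (fileSize - 4096)) & ~4095L;
                small.clear();
                ch.read(small, offset);
            }
        }
        double readSecs = (System.nanoTime() - readStart) / 1e9;
        System.out.printf("random 4k read: %.2f ms avg, %.1f reads/s%n",
                readSecs * 1000 / reads, reads / readSecs);

        Files.deleteIfExists(file);
    }
}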
> > > -----Original Message-----
> > > From: Lorenzo Fundaró [mailto:lorenzo.fund...@dawandamail.com]
> > > Sent: Tuesday, July 05, 2016 3:20 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: deploy solr on cloud providers
> > >
> > > Hi Shawn. Actually, what I'm trying to find out is whether this is the best approach for deploying Solr in the cloud. I believe SolrCloud solves a lot of problems in terms of high availability, but when it comes to storage there seems to be a limitation that can be worked around, of course, but it's a bit cumbersome, and I was wondering if there is a better option or if I'm missing something with the way I'm doing it. I wonder if there is some proven experience about how to solve the storage problem when deploying in the cloud. Any advice or pointer to some enlightening documentation would be appreciated. Thanks.
> > >
> > > On Jul 4, 2016 18:27, "Shawn Heisey" <apa...@elyograg.org> wrote:
> > > > On 7/4/2016 10:18 AM, Lorenzo Fundaró wrote:
> > > > > when deploying Solr (in SolrCloud mode) in the cloud, one has to take care of storage, and as far as I understand it can be a problem because the storage should go wherever the node is created. If we have, for example, a node on EC2 with its own persistent disk, and this node happens to be the leader and at some point crashes before it could replicate the data it has in the transaction log, what do we do in that case? Ideally the new node should use the leftover data that the dead node left, but this is a bit cumbersome in my opinion. What are the best practices for this?
> > > >
> > > > I can't make any sense of this. What is the *exact* problem you need to solve? The details can be very important.
> > > >
> > > > We might be dealing with this:
> > > >
> > > > http://people.apache.org/~hossman/#xyproblem
> > > >
> > > > Thanks,
> > > > Shawn
> >
> > --
> > Lorenzo Fundaro
> > Backend Engineer
> > E-Mail: lorenzo.fund...@dawandamail.com
> >
> > Fax + 49 - (0)30 - 25 76 08 52
> > Tel + 49 - (0)179 - 51 10 982
> >
> > DaWanda GmbH
> > Windscheidstraße 18
> > 10627 Berlin
> >
> > Geschäftsführer: Claudia Helming und Niels Nüssler
> > AG Charlottenburg HRB 104695 B
> > http://www.dawanda.com