Lorenzo, this probably comes late, but my systems guys just don't want to give me real disk. Although RAID-5 or LVM on top of JBOD may be better than Amazon EBS, EBS is still much closer to real disk than NFS in terms of IOPS and latency ;) I even ran a mini test (not an official benchmark) and found the response time for random reads to be better on EBS.
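To make "a mini test" concrete, here is a rough Python sketch that measures sequential write throughput, random read throughput, and random read latency in a given directory. It is not the test mentioned above; the function names, the 4 KiB block size, and the file sizes are all assumptions picked for illustration. Reads will mostly come from the OS page cache unless the test file is much larger than RAM, so treat the output as a sanity check rather than a benchmark.

```python
import os
import random
import sys
import tempfile
import time

BLOCK = 4096  # assumed read size in bytes; tune to your workload


def sequential_write(path, total_bytes, chunk=1 << 20):
    """Write total_bytes sequentially, fsync, and return throughput in MB/s."""
    buf = os.urandom(chunk)
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            n = min(chunk, total_bytes - written)
            f.write(buf[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / elapsed / 1e6


def random_reads(path, count, block=BLOCK, seed=0):
    """Read `count` blocks at random offsets.

    Returns (throughput in MB/s, list of per-read latencies in ms).
    Caveat: this goes through the OS page cache, so a small file that was
    just written will look much faster than the underlying storage.
    """
    size = os.path.getsize(path)
    rng = random.Random(seed)
    latencies = []
    total = 0
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(count):
            offset = rng.randrange(0, max(1, size - block + 1))
            t0 = time.perf_counter()
            f.seek(offset)
            total += len(f.read(block))
            latencies.append((time.perf_counter() - t0) * 1000)
        elapsed = max(time.perf_counter() - start, 1e-9)
    return total / elapsed / 1e6, latencies


if __name__ == "__main__":
    # Directory on the storage under test; defaults to the system temp dir.
    target_dir = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    fd, path = tempfile.mkstemp(dir=target_dir)
    os.close(fd)
    try:
        write_mbps = sequential_write(path, 16 * 1024 * 1024)
        read_mbps, lat = random_reads(path, 2000)
        lat.sort()
        print(f"sequential write: {write_mbps:.1f} MB/s")
        print(f"random read:      {read_mbps:.1f} MB/s")
        print(f"read latency:     median {lat[len(lat) // 2]:.3f} ms, "
              f"p99 {lat[int(len(lat) * 0.99)]:.3f} ms")
    finally:
        os.remove(path)
```

Run it once per storage type, passing a directory on an EBS volume and then one on an NFS mount, and compare the three numbers. For anything you plan to base a decision on, a dedicated I/O benchmarking tool run by your storage team will give far more trustworthy results.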
If you are a young/smallish company, this may be all in the cloud, but if you are in a large organization like mine, you may also need to allow for other architectures, such as a "virtual" Netapp in the cloud that communicates with a physical Netapp on-premises, and for the throughput/latency of that link.

The most important thing is to actually measure the numbers you are getting, both for search and for raw I/O, or to get your systems/storage guys to measure them. If they measure storage only, there are three things you will primarily care about for indexing:

  - Sequential Write Throughput
  - Random Read Throughput
  - Random Read Response Time/Latency

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

-----Original Message-----
From: Lorenzo Fundaró [mailto:lorenzo.fund...@dawandamail.com]
Sent: Tuesday, July 05, 2016 3:20 AM
To: solr-user@lucene.apache.org
Subject: Re: deploy solr on cloud providers

Hi Shawn. Actually, what I'm trying to find out is whether this is the best approach for deploying Solr in the cloud. I believe SolrCloud solves a lot of problems in terms of high availability, but when it comes to storage there seems to be a limitation. It can be worked around, of course, but that is a bit cumbersome, and I was wondering if there is a better option or if I'm missing something in the way I'm doing it. I wonder if there is any proven experience on how to solve the storage problem when deploying in the cloud. Any advice or pointer to some enlightening documentation will be appreciated. Thanks.

On Jul 4, 2016 18:27, "Shawn Heisey" <apa...@elyograg.org> wrote:

> On 7/4/2016 10:18 AM, Lorenzo Fundaró wrote:
> > when deploying solr (in solrcloud mode) in the cloud one has to take
> > care of storage, and as far as I understand it can be a problem
> > because the storage should go wherever the node is created. If we
> > have for example, a node on EC2 with its own persistent disk, this
> > node happens to be the leader and at some point crashes but couldn't
> > make the replication of the data that has in the transaction log,
> > how do we do in that case ? Ideally the new node must use the
> > leftover data that the death node left, but this is a bit cumbersome
> > in my opinion. What are the best practices for this ?
>
> I can't make any sense of this. What is the *exact* problem you need
> to solve? The details can be very important.
>
> We might be dealing with this:
>
> http://people.apache.org/~hossman/#xyproblem
>
> Thanks,
> Shawn