Lorenzo, this probably comes late, but my systems guys just don't want to give
me real disk. Although RAID-5 or LVM on top of JBOD may be better than Amazon
EBS, Amazon EBS is still much closer to real disk in terms of IOPS and latency
than NFS ;) I even ran a mini test (not an official benchmark) and found the
response time for random reads to be better on EBS than on NFS.

If you are a young/smallish company, this may all be in the cloud, but if you
are in a large organization like mine, you may also need to allow for other
architectures, such as a "virtual" NetApp in the cloud that communicates with a
physical NetApp on-premises, and for the throughput/latency between the two. The
most important thing is to actually measure the numbers you are getting, both
for search and for raw I/O, or to get your systems/storage guys to measure
them. If they measure only storage, there are three things you will primarily
care about for indexing:

        Sequential Write Throughput
        Random Read Throughput
        Random Read Response Time/Latency
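
If you want a quick first look at those three numbers yourself, something like
the following rough Python sketch could be run against each candidate volume
(EBS, NFS mount, etc.). It is not a real benchmark: the function name, the
64 MiB test file, the 4 KiB read size and the read count are my own assumptions,
and right after the write the random reads may still be served from the OS page
cache, so treat the results as optimistic. A dedicated tool such as fio will
give more trustworthy numbers.

```python
import os
import random
import statistics
import tempfile
import time

# Assumed parameters, for illustration only; tune them to your index size.
BLOCK_SIZE = 4096                # bytes per random read (assumed 4 KiB)
FILE_SIZE = 64 * 1024 * 1024     # 64 MiB test file (assumed)
NUM_READS = 1000                 # number of random reads (assumed)


def measure_storage(directory, file_size=FILE_SIZE,
                    block_size=BLOCK_SIZE, num_reads=NUM_READS):
    """Hypothetical helper: rough sequential-write and random-read numbers."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of random data, reused per write
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # 1. Sequential write throughput; fsync so the data reaches the device.
        written = 0
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < file_size:
                f.write(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())
            # Ask the kernel to drop cached pages (advisory, Linux/POSIX only).
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        write_secs = time.perf_counter() - start

        # 2 and 3. Random read throughput and per-read latency.
        latencies = []
        max_offset = written - block_size
        with open(path, "rb", buffering=0) as f:
            start = time.perf_counter()
            for _ in range(num_reads):
                # Block-aligned random offset within the file.
                offset = random.randrange(0, max_offset + 1, block_size)
                t0 = time.perf_counter()
                f.seek(offset)
                f.read(block_size)
                latencies.append(time.perf_counter() - t0)
            read_secs = time.perf_counter() - start
    finally:
        os.remove(path)

    return {
        "seq_write_mb_s": written / write_secs / 1e6,
        "rand_read_mb_s": num_reads * block_size / read_secs / 1e6,
        "rand_read_iops": num_reads / read_secs,
        "rand_read_p50_ms": statistics.median(latencies) * 1000,
        # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
        "rand_read_p99_ms": statistics.quantiles(latencies, n=100)[98] * 1000,
    }
```

For example, you could call measure_storage() once with a directory on the EBS
volume and once with a directory on the NFS mount (both paths being whatever
your own setup uses) and compare the results side by side.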

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



-----Original Message-----
From: Lorenzo Fundaró [mailto:lorenzo.fund...@dawandamail.com] 
Sent: Tuesday, July 05, 2016 3:20 AM
To: solr-user@lucene.apache.org
Subject: Re: deploy solr on cloud providers

Hi Shawn. Actually, what I'm trying to find out is whether this is the best
approach for deploying Solr in the cloud. I believe SolrCloud solves a lot of
problems in terms of high availability, but when it comes to storage there seems
to be a limitation that can be worked around, of course, but it's a bit
cumbersome. I was wondering whether there is a better option for this or
whether I'm missing something in the way I'm doing it. Is there some proven
experience with solving the storage problem when deploying in the cloud?
Any advice or pointers to enlightening documentation will be appreciated.
Thanks.
On Jul 4, 2016 18:27, "Shawn Heisey" <apa...@elyograg.org> wrote:

> On 7/4/2016 10:18 AM, Lorenzo Fundaró wrote:
> > When deploying Solr (in SolrCloud mode) in the cloud, one has to take
> > care of storage, and as far as I understand it, this can be a problem
> > because the storage should go wherever the node is created. Suppose,
> > for example, we have a node on EC2 with its own persistent disk, this
> > node happens to be the leader, and at some point it crashes before it
> > could replicate the data in its transaction log. What do we do in that
> > case? Ideally the new node should use the leftover data that the dead
> > node left behind, but this is a bit cumbersome in my opinion. What are
> > the best practices for this?
>
> I can't make any sense of this.  What is the *exact* problem you need 
> to solve?  The details can be very important.
>
> We might be dealing with this:
>
> http://people.apache.org/~hossman/#xyproblem
>
> Thanks,
> Shawn
>
>
