Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS.
On 5/1/17, 7:22 PM, "Will Martin" <[email protected]> wrote:
Ubuntu 16.04 LTS - Xenial (HVM)
Is this your Xenial version?
On 5/1/2017 6:37 PM, Jeff Wartes wrote:
> I tried a few variations of various things before we found and tried that linux/EC2 tuning page, including:
> - EC2 instance type: r4, c4, and i3
> - Ubuntu version: Xenial and Trusty
> - EBS vs local storage
> - Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m aware of the issues with early java8 versions and I’m not using G1)
>
> Most of those attempts were to help reduce differences between the data center and the EC2 cluster. In all cases I re-indexed from scratch, and in all cases I got the same very high system-time symptom. With the linux changes in place, we settled on r4/Xenial/EBS/Stock.
>
> Again, this was a slightly modified Solr 5.4. (I added backup requests, and two memory allocation rate tweaks that have long since been merged into mainline - released in 6.2, I think. I can dig up the jira numbers if anyone’s interested.) I’ve never used Solr 6.x in production, though.
> The only reason I mentioned 6.x at all is that I’m aware ES 5.x is based on Lucene 6.2. I don’t believe my coworker spent any time on tuning his ES setup, although I think he did try G1.
>
> I definitely do want to binary-search those settings until I understand better what exactly did the trick.
> The problem is the long cycle time per test, but hopefully I’ll get to it in the next couple of weeks.
>
>
>
> On 5/1/17, 7:26 AM, "John Bickerstaff" <[email protected]> wrote:
>
> It's also very important to consider the type of EC2 instance you are
> using...
>
> We settled on the R4.2XL... The R series is labeled "High-Memory"
>
> Which instance type did you end up using?
>
> On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey <[email protected]> wrote:
>
> > On 4/28/2017 10:09 AM, Jeff Wartes wrote:
> > > tldr: Recently, I tried moving an existing solrcloud configuration from a local datacenter to EC2. Performance was roughly 1/10th what I’d expected, until I applied a bunch of linux tweaks.
> >
> > How very strange. I knew virtualization would have overhead, possibly even measurable overhead, but that's insane. Running on bare metal is always better if you can do it. I would be curious what would happen on your original install if you applied similar tuning to that. Would you see a speedup there?
> >
> > > Interestingly, a coworker playing with an ElasticSearch (ES 5.x, so a much more recent release) alternate implementation of the same index was not seeing this high-system-time behavior on EC2, and was getting throughput consistent with our general expectations.
> >
> > That's even weirder. ES 5.x will likely be using Points field types for numeric fields, and although those are faster than what Solr currently uses, I doubt it could explain that difference. The implication here is that the ES systems are running with stock EC2 settings, not the tuned settings ... but I'd like you to confirm that. Same Java version as with Solr? IMHO, Java itself is more likely to cause issues like you saw than Solr.
> >
> > > I’m writing this for a few reasons:
> > >
> > > 1. The performance difference was so crazy I really feel like this should be broader knowledge.
> >
> > Definitely agree! I would be very interested in learning which of the tunables you changed were major contributors to the improvement. If it turns out that Solr's code is sub-optimal in some way, maybe we can fix it.
> >
> > > 2. If anyone is aware of anything that changed in Lucene between 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from this? If it’s the clocksource that’s the issue, there’s an implication that Solr was making tons more system calls like gettimeofday, which the EC2 (xen) hypervisor doesn’t allow in userspace.
> >
> > I had not considered the performance regression in 6.4.0 and 6.4.1 that Erick mentioned. Were you still running Solr 5.4, or was it a 6.x version?
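> >
> > The clocksource theory is easy to check outside of Solr, by the way. Here's a minimal Python sketch (not from the tuning page; the sysfs paths are standard Linux, and whether "tsc" is even offered depends on the instance):
> >
> >     # Print the kernel's current and available clocksources. On Xen-based
> >     # EC2 instances the default is often "xen", which makes gettimeofday()
> >     # trap to the hypervisor instead of using the cheap vDSO path.
> >     base = "/sys/devices/system/clocksource/clocksource0/"
> >     for name in ("current_clocksource", "available_clocksource"):
> >         with open(base + name) as f:
> >             print(name, "=", f.read().strip())
> >     # Switching (as root) would be something like:
> >     #   echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
> >
> > If system time drops sharply after switching to tsc, that would be strong evidence for the gettimeofday explanation.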
> >
> > =============
> >
> > Specific thoughts on the tuning:
> >
> > The noatime option is very good to use. I also use nodiratime on my systems. Turning atime updates off can have *massive* impacts on disk performance. If these are the source of the speedup, then the machine doesn't have enough spare memory.
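> >
> > If you apply noatime with a remount, it's worth verifying it actually took effect. An illustrative Python sketch ("/var/solr" is just a hypothetical mount point for the index):
> >
> >     # Return the active mount options for a mount point, straight from
> >     # /proc/mounts; "noatime" should be in the list if the change took.
> >     def mount_opts(mountpoint):
> >         with open("/proc/mounts") as f:
> >             for line in f:
> >                 dev, mnt, fstype, opts = line.split()[:4]
> >                 if mnt == mountpoint:
> >                     return opts.split(",")
> >         return []
> >
> >     print(mount_opts("/var/solr"))  # hypothetical Solr data mount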
> >
> > I'd be wary of the "nobarrier" mount option. If the underlying storage has battery-backed write caches, or is SSD without write caching, it wouldn't be a problem. Here's info about the "discard" mount option; I don't know whether it applies to your amazon storage:
> >
> >        discard/nodiscard
> >               Controls whether ext4 should issue discard/TRIM commands to
> >               the underlying block device when blocks are freed. This is
> >               useful for SSD devices and sparse/thinly-provisioned LUNs,
> >               but it is off by default until sufficient testing has been
> >               done.
> >
> > The network tunables would have more of an effect in a distributed environment like EC2 than they would on a LAN.
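> >
> > When you binary-search the settings, dumping the relevant sysctls before and after makes the comparison easy. A Python sketch (the thread doesn't say which tunables that page changes, so these keys are just common examples):
> >
> >     # Print a few TCP-related sysctls so tuned and untuned instances
> >     # can be diffed; substitute whatever keys the tuning page sets.
> >     for key in ("net/core/somaxconn",
> >                 "net/ipv4/tcp_rmem",
> >                 "net/ipv4/tcp_wmem"):
> >         with open("/proc/sys/" + key) as f:
> >             print(key.replace("/", "."), "=", f.read().strip())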
> >
> > Thanks,
> > Shawn
> >
> >
>
>