Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS.
On 5/1/17, 7:22 PM, "Will Martin" <[email protected]> wrote:
Ubuntu 16.04 LTS - Xenial (HVM)
Is this your Xenial version?
On 5/1/2017 6:37 PM, Jeff Wartes wrote:
> I tried a few variations of various things before we found and tried that linux/EC2 tuning page, including:
> - EC2 instance type: r4, c4, and i3
> - Ubuntu version: Xenial and Trusty
> - EBS vs local storage
> - Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m aware of the issues with early java8 versions and I’m not using G1)
>
> Most of those attempts were to help reduce differences between the data center and the EC2 cluster. In all cases I re-indexed from scratch, and in all cases I got the same very high system-time symptom. With the linux changes in place, we settled on r4/Xenial/EBS/Stock.
>
> Again, this was a slightly modified Solr 5.4. (I added backup requests, and two memory allocation rate tweaks that have long since been merged into mainline - released in 6.2, I think. I can dig up the jira numbers if anyone’s interested.) I’ve never used Solr 6.x in production, though.
> The only reason I mentioned 6.x at all is that I’m aware ES 5.x is based on Lucene 6.2. I don’t believe my coworker spent any time on tuning his ES setup, although I think he did try G1.
>
> I definitely do want to binary-search those settings until I understand better what exactly did the trick.
> The problem is the long cycle time per test, but hopefully I’ll get to it in the next couple of weeks.
>
>
>
> On 5/1/17, 7:26 AM, "John Bickerstaff" <[email protected]> wrote:
>
> It's also very important to consider the type of EC2 instance you are
> using...
>
> We settled on the R4.2XL... The R series is labeled "High-Memory"
>
> Which instance type did you end up using?
>
> On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey <[email protected]> wrote:
>
> > On 4/28/2017 10:09 AM, Jeff Wartes wrote:
> > > tldr: Recently, I tried moving an existing solrcloud configuration from a local datacenter to EC2. Performance was roughly 1/10th what I’d expected, until I applied a bunch of linux tweaks.
> >
> > How very strange. I knew virtualization would have overhead, possibly even measurable overhead, but that's insane. Running on bare metal is always better if you can do it. I would be curious what would happen on your original install if you applied similar tuning to that. Would you see a speedup there?
> >
> > > Interestingly, a coworker playing with an ElasticSearch (ES 5.x, so a much more recent release) alternate implementation of the same index was not seeing this high-system-time behavior on EC2, and was getting throughput consistent with our general expectations.
> >
> > That's even weirder. ES 5.x will likely be using Points field types for numeric fields, and although those are faster than what Solr currently uses, I doubt it could explain that difference. The implication here is that the ES systems are running with stock EC2 settings, not the tuned settings ... but I'd like you to confirm that. Same Java version as with Solr? IMHO, Java itself is more likely to cause issues like you saw than Solr.
> >
> > > I’m writing this for a few reasons:
> > >
> > > 1. The performance difference was so crazy I really feel like this should be broader knowledge.
> >
> > Definitely agree! I would be very interested in learning which of the tunables you changed were major contributors to the improvement. If it turns out that Solr's code is sub-optimal in some way, maybe we can fix it.
> >
> > > 2. If anyone is aware of anything that changed in Lucene between 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from this? If it’s the clocksource that’s the issue, there’s an implication that Solr was making tons more system calls like gettimeofday, which the EC2 (xen) hypervisor doesn’t allow in userspace.
> >
> > I had not considered the performance regression in 6.4.0 and 6.4.1 that Erick mentioned. Were you still running Solr 5.4, or was it a 6.x version?
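> >
> > The clocksource theory is easy to check outside of Solr, by the way. Here's a minimal Python sketch (not from the tuning page; the sysfs paths are standard Linux, and whether "tsc" is even offered depends on the instance):
> >
> >     # Print the kernel's current and available clocksources. On Xen-based
> >     # EC2 instances the default is often "xen", which makes gettimeofday()
> >     # trap to the hypervisor instead of using the cheap vDSO path.
> >     base = "/sys/devices/system/clocksource/clocksource0/"
> >     for name in ("current_clocksource", "available_clocksource"):
> >         with open(base + name) as f:
> >             print(name, "=", f.read().strip())
> >     # Switching (as root) would be something like:
> >     #   echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
> >
> > If system time drops sharply after switching to tsc, that would be strong evidence for the gettimeofday explanation.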
> >
> > =============
> >
> > Specific thoughts on the tuning:
> >
> > The noatime option is very good to use. I also use nodiratime on my systems. Turning atime updates off can have *massive* impacts on disk performance. If these are the source of the speedup, then the machine doesn't have enough spare memory.
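> >
> > If you apply noatime with a remount, it's worth verifying it actually took effect. An illustrative Python sketch ("/var/solr" is just a hypothetical mount point for the index):
> >
> >     # Return the active mount options for a mount point, straight from
> >     # /proc/mounts; "noatime" should be in the list if the change took.
> >     def mount_opts(mountpoint):
> >         with open("/proc/mounts") as f:
> >             for line in f:
> >                 dev, mnt, fstype, opts = line.split()[:4]
> >                 if mnt == mountpoint:
> >                     return opts.split(",")
> >         return []
> >
> >     print(mount_opts("/var/solr"))  # hypothetical Solr data mount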
> >
> > I'd be wary of the "nobarrier" mount option. If the underlying storage has battery-backed write caches, or is SSD without write caching, it wouldn't be a problem. Here's info about the "discard" mount option; I don't know whether it applies to your amazon storage:
> >
> >        discard/nodiscard
> >               Controls whether ext4 should issue discard/TRIM commands to
> >               the underlying block device when blocks are freed. This is
> >               useful for SSD devices and sparse/thinly-provisioned LUNs,
> >               but it is off by default until sufficient testing has been
> >               done.
> >
> > The network tunables would have more of an effect in a distributed environment like EC2 than they would on a LAN.
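> >
> > When you binary-search the settings, dumping the relevant sysctls before and after makes the comparison easy. A Python sketch (the thread doesn't say which tunables that page changes, so these keys are just common examples):
> >
> >     # Print a few TCP-related sysctls so tuned and untuned instances
> >     # can be diffed; substitute whatever keys the tuning page sets.
> >     for key in ("net/core/somaxconn",
> >                 "net/ipv4/tcp_rmem",
> >                 "net/ipv4/tcp_wmem"):
> >         with open("/proc/sys/" + key) as f:
> >             print(key.replace("/", "."), "=", f.read().strip())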
> >
> > Thanks,
> > Shawn
> >
> >
>
>