darinspivey commented on PR #668:
URL: 
https://github.com/apache/pulsar-helm-chart/pull/668#issuecomment-4110748154

   @lhotari thanks for your quick attention here.  I'd be happy to provide more 
visibility as I learn more about Pulsar--this is my first production 
implementation of it, and we're still in the tweaking stage, but loving it so 
far (as compared to Kafka).  Here are the versions I'm currently using:
   ```
   pulsar-helm-chart: 4.5.0
   pulsar-client: ^1.16.0 (nodejs publishing tier)
   pulsar-rs: 6.7.1 (consumption tier)
   ```
   
   I might have generalized too much about the 'bad state' of the system.  What 
I've generally seen is the fire-and-forget upgrade where multiple components 
restart at the same time.  When that happens, it's more of a stampede problem 
when you have thousands of topics and busy producers.  I've seen it blow up ZK 
with a flood of lookups, brokers crashing because they can't handle bundle 
handoffs, and heat on the bookies when the ensembles lose member nodes.  All of 
that together just makes for a bad situation--so far, I've just been able to 
turn off Pulsar to recover from these situations, as I'm still tuning 
production on a trial basis.
   
   > In general, it would be useful to perform upgrades "slowly" so that each 
set of components is handled separately and upgraded before moving on to the 
next ones.
   
   This PR here is a crude attempt at doing this, but perhaps a more elegant 
way of rolling the components out is warranted.  I agree that the 'slow 
upgrade' approach feels better in terms of control, and you've offered up a few 
good suggestions in that area.
   
   > Even without handling restarts separately, it shouldn't result in the 
cluster getting into a bad error state unless the high load causes the system 
to collapse when there are a lot of component restarts at once.
   
   Yes, this is it, primarily.  ZK might also have a preferred restart order 
based on the leader, but I think that's a minor thing compared to the traffic 
stampedes that happen.
   
   > This is just one thought on some solutions. It would be great if you could 
contribute a section to the README.md file about handling upgrades in a 
controlled way and what problems it resolves.
   
   I'm happy to contribute anything I can as I learn more about the system's 
behavior through my own testing.  Would you prefer such a contribution prior to 
any official upgrade controls?  It could be just what's worked for me based on 
this PR's change.
   
   > One known issue with brokers in a full rolling restart is that there's 
also a lot of shuffling due to load balancing.
   
   Yes, 100% true.  At first, I wasn't able to roll brokers without 
experiencing super high latency on my publishing tier, which was disastrous.  
I was able to mitigate this situation by doing a few things:
   
   * First, setting `terminationGracePeriodSeconds: 300` which allows enough 
time for all topics to unload prior to being killed by k8s.  I've found that 
the topics usually unload fine in under a minute, which is only slightly longer 
than the k8s default of (I think) 30s.
   
   * Also, a big win was to set the liveness probe to `initialDelaySeconds: 30` 
with the thought being "let the last broker fully come up before the next one 
rolls--that way the newest broker can be the most appropriate candidate to 
handle the next broker's offloaded topics."
   
   * I also had to make sure that my publishers had an ample 
`operationTimeoutSeconds` of 20-30 seconds.  Initially, I had this set low in a "fail 
fast" mentality, only to learn that it caused more of a flood to the proxies 
and to ZK.  Taking the bundle move hit (latency) up front allowed for a quicker 
recovery when the bundles settle into their new broker.  For the record, I've 
found that MOST of the pulsar defaults have been pretty close to 
production-ready.  The default setting for this (30 seconds?) is another 
example of that--I shouldn't have changed it, lol.
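   
   For reference, here is a rough sketch of where the first two settings end 
up in a rendered broker pod spec.  These are the plain Kubernetes field names, 
not the chart's values.yaml keys (check the chart for how it exposes them), and 
the container name is hypothetical:
   ```yaml
   # Fragment of a broker pod spec: a sketch, not the chart's actual template.
   spec:
     # Give the broker up to 5 minutes to unload its topics after SIGTERM
     # before Kubernetes sends SIGKILL (the Kubernetes default is 30s).
     terminationGracePeriodSeconds: 300
     containers:
       - name: broker  # hypothetical name, for illustration only
         livenessProbe:
           # Wait 30s after the container starts before the first probe.
           initialDelaySeconds: 30
           # Probe handler (httpGet/exec) omitted; keep whatever the chart defines.
   ```
   The publisher-side `operationTimeoutSeconds` is a client option and is set 
in the application, not in the pod spec.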
   
   I'm not sure which (or all) of these helped most, but that fixed my stampede 
problems with rolling brokers.  The last broker to roll is mostly idle when the 
rolling is done, but then the `ThresholdShedder` takes care of balancing it out 
smoothly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
