darinspivey commented on PR #668: URL: https://github.com/apache/pulsar-helm-chart/pull/668#issuecomment-4110748154
@lhotari thanks for your quick attention here. I'd be happy to provide more visibility as I learn more about Pulsar. This is my first production implementation of it, and we're still in the tweaking stage, but I'm loving it so far (compared to Kafka). Here are the versions I'm currently using:

```
pulsar-helm-chart: 4.5.0
pulsar-client: ^1.16.0 (nodejs publishing tier)
pulsar-rs: 6.7.1 (consumption tier)
```

I might have generalized too much about the "bad state" of the system. What I've generally seen is the fire-and-forget upgrade, where multiple components restart at the same time. With thousands of topics and busy producers, that becomes a stampede problem. I've seen it overwhelm ZK with a flood of lookups, crash brokers that can't handle the bundle handoffs, and put heavy load on the bookies when ensembles lose member nodes. All of that together makes for a bad situation. So far, the only way I've been able to recover has been to turn off Pulsar, since I'm still tuning production on a trial basis.

> In general, it would be useful to perform upgrades "slowly" so that each set of components is handled separately and upgraded before moving on to the next ones.

This PR is a crude attempt at doing this, but perhaps a more elegant way of rolling out the components is warranted. I agree that the "slow upgrade" approach feels better in terms of control, and you've offered a few good suggestions in that area.

> Even without handling restarts separately, it shouldn't result in the cluster getting into a bad error state unless the high load causes the system to collapse when there are a lot of component restarts at once.

Yes, this is primarily it. ZK might also have a preferred restart order based on the leader, but I think that's minor compared to the traffic stampedes.

> This is just one thought on some solutions.
> It would be great if you could contribute a section to the README.md file about handling upgrades in a controlled way and what problems it resolves.

I'm happy to contribute anything I can as I learn more about the system's behavior through my own testing. Would you prefer such a contribution ahead of any official upgrade controls? It could simply describe what has worked for me with this PR's change.

> One known issue with brokers in a full rolling restart is that there's also a lot of shuffling due to load balancing.

Yes, 100% true. At first, I couldn't roll brokers without seeing very high latency on my publishing tier, which was disastrous. I mitigated this by doing a few things:

* First, setting `terminationGracePeriodSeconds: 300`, which allows enough time for all topics to unload before k8s kills the pod. I've found that topics usually unload fine in under a minute, which is only slightly longer than the k8s default of 30s.
* Also, a big win was setting the liveness probe to `initialDelaySeconds: 30`, the thought being "let the last broker fully come up before the next one rolls; that way the newest broker is the best candidate to take the next broker's offloaded topics."
* I also had to make sure my publishers had an ample `operationTimeoutSeconds` of 20-30. Initially I had this set low with a "fail fast" mentality, only to learn that it caused more of a flood to the proxies and to ZK. Taking the bundle-move hit (latency) up front allowed a quicker recovery once the bundles settled into their new broker. For the record, I've found that MOST of the Pulsar defaults have been pretty close to production-ready. The default for this setting (30 seconds) is another example of that; I shouldn't have changed it, lol.

I'm not sure which of these helped most (or whether it took all of them), but together they fixed my stampede problems with rolling brokers.
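As a rough back-of-envelope check on the grace-period trade-off, here is a small TypeScript sketch that estimates how long a broker roll takes. It assumes the StatefulSet's rolling update replaces one broker pod at a time and waits for it to be Ready before touching the next, and it ignores scheduling and image-pull time. The function and field names (`estimateBrokerRoll`, `unloadSeconds`, `readySeconds`) are hypothetical, and the numbers (6 brokers, ~60s unload, ~45s to Ready) are illustrative, not measurements.

```typescript
// Hypothetical inputs for a one-pod-at-a-time broker roll (names are illustrative,
// not Helm chart keys). Scheduling and image-pull time are ignored.
interface BrokerRollInputs {
  brokers: number;            // broker replicas in the StatefulSet
  gracePeriodSeconds: number; // terminationGracePeriodSeconds: a ceiling, not a fixed wait
  unloadSeconds: number;      // observed time for a broker to unload its topics and exit
  readySeconds: number;       // time from container start until the new pod reports Ready
}

interface BrokerRollEstimate {
  perBrokerSeconds: number;
  totalSeconds: number;
  killedBeforeUnload: boolean; // true if SIGKILL would arrive while topics are still unloading
}

function estimateBrokerRoll(i: BrokerRollInputs): BrokerRollEstimate {
  // The old pod is gone once the broker exits or the grace period expires,
  // whichever comes first.
  const shutdownSeconds = Math.min(i.unloadSeconds, i.gracePeriodSeconds);
  // The next broker is not touched until the replacement pod is Ready.
  const perBrokerSeconds = shutdownSeconds + i.readySeconds;
  return {
    perBrokerSeconds,
    totalSeconds: i.brokers * perBrokerSeconds,
    killedBeforeUnload: i.unloadSeconds > i.gracePeriodSeconds,
  };
}

// Illustrative comparison: a 300s grace period vs. the 30s k8s default.
const longGrace = estimateBrokerRoll({ brokers: 6, gracePeriodSeconds: 300, unloadSeconds: 60, readySeconds: 45 });
const defaultGrace = estimateBrokerRoll({ brokers: 6, gracePeriodSeconds: 30, unloadSeconds: 60, readySeconds: 45 });

console.log(`grace=300s: ${longGrace.totalSeconds}s total, killedBeforeUnload=${longGrace.killedBeforeUnload}`);
// prints "grace=300s: 630s total, killedBeforeUnload=false"
console.log(`grace=30s: ${defaultGrace.totalSeconds}s total, killedBeforeUnload=${defaultGrace.killedBeforeUnload}`);
// prints "grace=30s: 450s total, killedBeforeUnload=true"
```

Under these assumptions, `terminationGracePeriodSeconds` only costs time when a broker actually needs it, so raising it to 300 adds nothing if unloads finish in about a minute. A grace period shorter than the real unload time makes the roll look faster only because brokers are killed mid-unload.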
The last broker to roll is mostly idle when the rolling is done, but then the `ThresholdShedder` takes care of balancing it out smoothly.
