This patch is for the xenial version of swift.

** Description changed:

  Trusty, Mitaka, Juju 1.25.11

  We have a cloud where swift replicators are constantly falling over on 2 nodes. This occurs whenever rsyslog restarts, as in
  https://bugs.launchpad.net/swift/+bug/1094230
  https://review.openstack.org/#/c/24871
  https://bugs.python.org/issue15179

  rsyslog restarts are unfortunately frequent right now, due to
  https://bugs.launchpad.net/juju-core/+bug/1683075

  Nodes are landscape managed and up to date but still exhibit the failure.

  Not much from running swift in verbose.
  https://pastebin.canonical.com/185609/

  sosreports are uploading to
  https://private-fileshare.canonical.com/~jillr/sf00137831/
+
+ [Impact]
+
+  * Stopping rsyslog causes swift daemons to crash due to overflowing the
+    call stack when attempting to write an entry to the logging subsystem
+    and the attempt to write to /dev/log fails. When rsyslog stops, the
+    /dev/log socket is unavailable and results in an exception. The swift
+    logging code attempts to log the resultant error, which again results in
+    an exception. This continues until the stack is overflowed and the swift
+    daemons crash. When the swift daemons crash, the object, container and
+    account data are not able to be replicated to other storage nodes in the
+    system, which affects the data integrity of the data being written to
+    the system.
+
+  * The patch should be backported to stable releases in order to ensure
+    that the data integrity of objects, accounts, and containers within
+    Swift are not adversely affected due to failed logging subsystems.
+
+  * The uploaded patches fix the bug by only attempting to log an entry
+    to the logging subsystem if the current call stack does not include an
+    attempt to write to the logs. If the current call stack includes an
+    attempt to log to the logging subsystem, the log message is dropped
+    avoiding the recursion.
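
For anyone reading along, the idea in the last [Impact] bullet can be sketched roughly as below. This is not the code from the debdiff or from upstream Swift; it is a minimal illustration that assumes a per-thread flag is enough to detect "already logging further up this call stack". The names ReentrancyGuardHandler, FailingHandler, on_error and dropped are hypothetical and exist only for this example. FailingHandler stands in for a syslog handler whose /dev/log socket has gone away and which reports its own failure back through the logging path.

```python
import logging
import threading


class ReentrancyGuardHandler(logging.Handler):
    """Hypothetical wrapper: forwards records to `inner`, but drops any
    record emitted while this thread is already inside emit()."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner
        self._state = threading.local()  # one "active" flag per thread
        self.dropped = 0

    def emit(self, record):
        if getattr(self._state, "active", False):
            # A log attempt is already on this call stack: drop the
            # message instead of recursing.
            self.dropped += 1
            return
        self._state.active = True
        try:
            self.inner.emit(record)
        finally:
            self._state.active = False


class FailingHandler(logging.Handler):
    """Hypothetical stand-in for a handler whose log socket is unavailable.
    Every write fails, and the failure is reported via `on_error`."""

    def __init__(self):
        super().__init__()
        self.on_error = None
        self.attempts = 0

    def emit(self, record):
        self.attempts += 1
        try:
            raise OSError("log socket unavailable")  # simulated failure
        except OSError:
            # Logging the logging failure is what causes the recursion
            # when nothing guards against it.
            self.on_error(logging.makeLogRecord({"msg": "log write failed"}))


inner = FailingHandler()
guard = ReentrancyGuardHandler(inner)
inner.on_error = guard.emit  # error reports go back through the guard

guard.emit(logging.makeLogRecord({"msg": "replication pass complete"}))
print(inner.attempts, guard.dropped)
```

In this sketch the nested error report is counted and discarded, so the outer call returns normally. If on_error pointed straight back at FailingHandler.emit, every failed write would trigger another failed write until the interpreter hit its recursion limit, which is the crash described above.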
+
+ [Test Case]
+
+  * Install swift storage cluster
+  * Log into one of the swift storage nodes
+  * Ensure the swift-{object,account,container}-replicator processes are running
+  * Stop the rsyslog service
+  * Wait a minute
+  * Observe the swift-{object,account,container}-replicator processes are no longer running
+
+ [Regression Potential]
+
+  * This affects the logging capabilities provided by the Swift code.
+    Possible regressions could occur in almost any subsystem, since the
+    logging is universal throughout the code base and could result in lost
+    log entries in the best regression scenario and possible crashing of
+    swift daemons in the worst case scenario. The regression potential is
+    mitigated by the fact that this patch has already been included upstream
+    for over a year now and no regressions have been reported against this
+    code since.
+
+ [Other Info]
+
+  * /dev/log is not provided by the rsyslog daemon in Xenial, but this
+    patch still applies in that any persistent exception encountered when
+    writing to /dev/log will cause the call stack to overflow and crash the
+    swift daemons.

** Patch added: "xenial patch"
   https://bugs.launchpad.net/charm-swift-storage/+bug/1683076/+attachment/4882208/+files/lp1683076-xenial.debdiff

-- 
https://bugs.launchpad.net/bugs/1683076

Title:
  Swift-storage dies if rsyslog is stopped