This patch is for the xenial version of swift.

** Description changed:

  Trusty, Mitaka, Juju 1.25.11
  
  We have a cloud where swift replicators are constantly falling over on 2
  nodes.  This occurs whenever rsyslog restarts, as in
  
  https://bugs.launchpad.net/swift/+bug/1094230
  https://review.openstack.org/#/c/24871
  https://bugs.python.org/issue15179
  
  rsyslog restarts are unfortunately frequent right now, due to
  https://bugs.launchpad.net/juju-core/+bug/1683075
  
  Nodes are landscape managed and up to date but still exhibit the
  failure.
  
  Not much from running swift in verbose.
  https://pastebin.canonical.com/185609/
  
  sosreports are uploading to
  https://private-fileshare.canonical.com/~jillr/sf00137831/
+ 
+ [Impact]
+ 
+  * Stopping rsyslog causes swift daemons to crash by overflowing the
+ call stack. When rsyslog stops, the /dev/log socket becomes unavailable,
+ so any attempt to write a log entry raises an exception. The swift
+ logging code then attempts to log that error, which raises another
+ exception. This repeats until the call stack overflows and the swift
+ daemons crash. Once the daemons have crashed, object, container and
+ account data can no longer be replicated to other storage nodes, which
+ puts the integrity of data written to the system at risk.
+ 
+  * The patch should be backported to stable releases to ensure that the
+ data integrity of objects, accounts, and containers within Swift is not
+ adversely affected by a failed logging subsystem.
+ 
+  * The uploaded patches fix the bug by writing an entry to the logging
+ subsystem only if the current call stack does not already include an
+ attempt to write to the logs. If it does, the log message is dropped,
+ which avoids the recursion.
+ 
+ [Test Case]
+ 
+  * Install a swift storage cluster
+  * Log into one of the swift storage nodes
+  * Ensure the swift-{object,account,container}-replicator processes are
+ running
+  * Stop the rsyslog service
+  * Wait a minute
+  * Observe that the swift-{object,account,container}-replicator processes
+ are no longer running
+ 
+ [Regression Potential]
+ 
+  * This affects the logging capabilities provided by the Swift code.
+ Because logging is used throughout the code base, a regression could
+ appear in almost any subsystem, ranging from lost log entries in the
+ best case to crashing swift daemons in the worst case. The regression
+ potential is mitigated by the fact that this patch has been included
+ upstream for over a year and no regressions have been reported against
+ this code since.
+ 
+ [Other Info]
+ 
+  * /dev/log is not provided by the rsyslog daemon in Xenial, but this
+ patch still applies: any persistent exception encountered when writing
+ to /dev/log will cause the call stack to overflow and crash the swift
+ daemons.

** Patch added: "xenial patch"
   
https://bugs.launchpad.net/charm-swift-storage/+bug/1683076/+attachment/4882208/+files/lp1683076-xenial.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1683076

Title:
  Swift-storage dies if rsyslog is stopped

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-swift-storage/+bug/1683076/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
