A patched kernel (that I created during the SRU preparation and test compile) is available for further testing here: https://people.canonical.com/~fheimes/lp1887774/
-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1887774 Title: [UBUNTU 20.04] zfcp: Fix panic on ERP timeout for previously dismissed ERP Status in Ubuntu on IBM z Systems: Triaged Status in linux package in Ubuntu: New Status in linux source package in Focal: New Status in linux source package in Groovy: New Bug description: Description: zfcp: Fix panic on ERP timeout for previously dismissed ERP Symptom: Linux kernel panic due to kernel page fault in IRQ context when running zfcp_erp_timeout_handler() calling zfcp_erp_notify(). Problem: Suppose that, for unrelated reasons, FSF requests on behalf of recovery are very slow and can run into the ERP timeout. In the case at hand, we did adapter recovery to a large degree. However due to the slowness a LUN open is pending so the corresponding fc_rport remains blocked. After fast_io_fail_tmo we trigger close physical port recovery for the port under which the LUN should have been opened. The new higher order port recovery dismisses the pending LUN open ERP action and dismisses the pending LUN open FSF request. Such dismissal decouples the ERP action from the pending corresponding FSF request by setting zfcp_fsf_req->erp_action to NULL (among other things) [zfcp_erp_strategy_check_fsfreq()]. If now the ERP timeout for the pending open LUN request runs out, we must not use zfcp_fsf_req->erp_action in the ERP timeout handler. This is a problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use timer_setup()"). Before that we intentionally only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Note: The lifetime of the corresponding zfcp_fsf_req object continues until a (late) response or an (unrelated) adapter recovery. Solution: Just like the regular response path ignores dismissed requests [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the ERP timeout handler now needs to ignore dismissed requests. So simply return early in the ERP timeout handler if the FSF request is marked as dismissed in its status flags. To protect against the race where zfcp_erp_strategy_check_fsfreq() dismisses and sets zfcp_fsf_req->erp_action to NULL after our previous status flag check, return early if zfcp_fsf_req->erp_action is NULL. After all, the former ERP action does not need to be woken up as that was already done as part of the dismissal above [zfcp_erp_action_dismiss()]. Upstream-ID: 936e6b85da0476dd2edac7c51c68072da9fb4ba2 -> kernel 5.8 Will be integrated by kernel 5.8 by groovy. Please check that this also be integrated into 20.04 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887774/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp