Public bug reported:
Hi,
[Impact]
Currently in focal, devlink performs reporter recovery on mlx5 devices even
when the reporter state is healthy.
[test case]
1) Display the devlink health status:
# devlink health show pci/0000:05:00.0 reporter fw_fatal
pci/0000:05:00.0:
reporter fw_fatal
state healthy error 0 recover 0 grace_period 1200000 auto_recover true
2) Perform reporter recovery using devlink:
# devlink health recover pci/0000:05:00.0 reporter fw_fatal
3) See that recovery was performed even though the reporter was healthy:
# dmesg
[776733.438708] mlx5_core 0000:05:00.0: mlx5_health_try_recover:316:(pid 563178): handling bad device here
[776733.438717] mlx5_core 0000:05:00.0: mlx5_handle_bad_state:278:(pid 563178): Expected to see disabled NIC but it is full driver
[776735.591522] mlx5_core 0000:05:00.0: mlx5_health_try_recover:328:(pid 563178): starting health recovery flow
...
# devlink health show pci/0000:05:00.0 reporter fw_fatal
pci/0000:05:00.0:
reporter fw_fatal
state healthy error 0 recover 1 grace_period 1200000 auto_recover true
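When comparing the status before and after step 2, it can help to pull single fields out of the status text instead of reading it by eye. The following is a minimal sketch, assuming the output keeps the "keyword value" layout shown above; `health_field` is a hypothetical helper name, not part of devlink.

```shell
#!/bin/sh
# Hypothetical helper: read `devlink health show` output on stdin and print
# the value that follows the given keyword (e.g. state, error, recover).
# Assumes the "keyword value keyword value ..." layout shown in this report.
health_field() {
    awk -v key="$1" '
        # Walk the fields of each line; print the one right after the keyword.
        { for (i = 1; i < NF; i++) if ($i == key) { print $(i + 1); exit } }
    '
}
```

For example, `devlink health show pci/0000:05:00.0 reporter fw_fatal | health_field recover` would print the recover counter for the reporter used in this test case.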
[fix]
402818205c9e devlink: don't do reporter recovery if the state is healthy
This is an upstream commit from kernel v5.5-rc1 that applies cleanly to the
focal tree.
The commit prevents reporter recovery when the device is in a healthy state.
With the commit applied, issuing
# devlink health recover pci/0000:05:00.0 reporter fw_fatal
on a reporter in healthy state returns successfully, but dmesg stays clean and
the recover counter does not change.
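The expected behavior with the fix can be checked with a small script. This is only a sketch under stated assumptions: `check_noop_recover` is a hypothetical helper name, it relies on the `devlink health show` output format shown in the test case, and it does not inspect dmesg.

```shell
#!/bin/sh
# check_noop_recover <device> <reporter>  (hypothetical helper, not part of devlink)
# Exit status: 0 = recover counter unchanged (expected with the fix applied),
#              1 = counter changed (recovery actually ran),
#              2 = reporter not healthy, or a devlink call failed.
check_noop_recover() {
    # Extract the number after the "recover" keyword from the status text on stdin.
    counter() { sed -n 's/.* recover \([0-9][0-9]*\).*/\1/p'; }

    before=$(devlink health show "$1" reporter "$2") || return 2
    case $before in
        *"state healthy"*) ;;          # the no-op is only expected when healthy
        *) return 2 ;;
    esac
    devlink health recover "$1" reporter "$2" || return 2
    after=$(devlink health show "$1" reporter "$2") || return 2
    [ "$(printf '%s\n' "$before" | counter)" = "$(printf '%s\n' "$after" | counter)" ]
}
```

Run as root, `check_noop_recover pci/0000:05:00.0 fw_fatal` should return 0 on a kernel with the commit and 1 on an unpatched focal kernel, matching the two outcomes described in this report.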
[Regression Potential]
Very small, as the change is minor: it only affects recover requests issued
while the reporter is already healthy.
Thanks,
Amir
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1915403
Title:
devlink: don't do reporter recovery if the state is healthy