On 7/23/2025 7:30 PM, Raag Jadav wrote:
On Tue, Jul 15, 2025 at 04:17:24PM +0530, Riana Tauro wrote:
The patches in these series refactor the boot survivability code to
allow adding runtime survivability
Refactor existing code to separate both the modes

Punctuation, please!

This patch renames the functions and separates init and enable

...

  static ssize_t survivability_mode_show(struct device *dev,
                                       struct device_attribute *attr, char *buff)
  {
@@ -130,6 +138,11 @@ static ssize_t survivability_mode_show(struct device *dev,
        struct xe_survivability_info *info = survivability->info;
        int index = 0, count = 0;
+       count += sysfs_emit_at(buff, count, "Survivability mode type: Boot\n");

Although I'm okay with this, should we make it something more parseable
from userspace?

Suggestions?

All the rest of the information is also in <name>:<value> pairs.
Dumping scratch registers is not useful for runtime survivability, so I added this line instead of leaving the file empty.
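As a rough illustration of why the <name>:<value> layout is already easy to consume from userspace, here is a minimal C sketch (not part of the patch) that looks up one key in text of that shape. The helper name `find_value` is hypothetical; it assumes each line has the form `<name>: <value>` with no colon inside the name, and it works on an in-memory string rather than reading the real sysfs attribute.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find the line "<key>: <value>" in a newline-separated
 * buffer and copy <value> (leading spaces stripped, truncated to fit) into
 * `out`. Returns 0 on success, -1 if no line starts with "<key>:".
 * Assumes outlen > 0.
 */
static int find_value(const char *text, const char *key, char *out, size_t outlen)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* The key must match exactly and be followed by ':' */
		if (len >= keylen + 1 && strncmp(line, key, keylen) == 0 &&
		    line[keylen] == ':') {
			const char *val = line + keylen + 1;
			size_t vlen = len - keylen - 1;

			/* Skip the space(s) after the colon */
			while (vlen > 0 && *val == ' ') {
				val++;
				vlen--;
			}
			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, val, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1; /* advance to the next line */
	}
	return -1;
}
```

For example, calling `find_value(buf, "Survivability mode type", val, sizeof(val))` on a buffer containing the new line would leave `"Boot"` in `val`.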


+       if (!check_boot_failure(xe))
+               return count;
+

...

+int xe_survivability_mode_boot_enable(struct xe_device *xe)
  {
        struct xe_survivability *survivability = &xe->survivability;
-       struct xe_survivability_info *info;
        struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+       int ret;
        if (!xe_survivability_mode_is_requested(xe))
                return 0;
-       survivability->size = MAX_SCRATCH_MMIO;
-
-       info = devm_kcalloc(xe->drm.dev, survivability->size, sizeof(*info),
-                           GFP_KERNEL);
-       if (!info)
-               return -ENOMEM;
-
-       survivability->info = info;
-
-       populate_survivability_info(xe);
+       ret = init_survivability_mode(xe);
+       if (ret)
+               return ret;
-       /* Only log debug information and exit if it is a critical failure */
+       /* Log breadcrumbs but do not enter survivability mode for Critical boot errors */
        if (survivability->boot_status == CRITICAL_FAILURE) {
                log_survivability_info(pdev);

I'm not well informed about the history here, but should we be logging the
scratch registers if we consider them sensitive?

For non-critical failures, survivability mode is enabled and a firmware flash can be triggered to recover. For critical failures, there is no sysfs entry, so the scratch registers are dumped to the log instead. This gives the admin more information about the failure.
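The decision described above can be modelled as a small pure function. This is a simplified, hypothetical sketch, not the driver code: `decide()`, `enum survivability_action` and `NON_CRITICAL_FAILURE` are illustrative names, and only `CRITICAL_FAILURE` comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the boot status; only CRITICAL_FAILURE is from the patch */
enum boot_status {
	NON_CRITICAL_FAILURE,
	CRITICAL_FAILURE,
};

enum survivability_action {
	ACTION_NONE,        /* survivability mode not requested: nothing to do */
	ACTION_LOG_ONLY,    /* critical: dump scratch registers to the log, no sysfs */
	ACTION_ENABLE_MODE, /* non-critical: enable mode so firmware can be flashed */
};

/* Mirrors the control flow in the quoted hunk, without any side effects */
static enum survivability_action decide(bool requested, enum boot_status status)
{
	if (!requested)
		return ACTION_NONE;
	if (status == CRITICAL_FAILURE)
		return ACTION_LOG_ONLY;
	return ACTION_ENABLE_MODE;
}
```

Separating the decision from its side effects (logging, sysfs creation) is just one way to keep the init/enable split easy to reason about; the patch itself does not structure it this way.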

Thanks
Riana


Raag


