Hi Cristian,

Q: So let me get it straight, so you mean that the actual problem would be the 
ACPI from Linux side, as in it cannot interpret (handle) the specified region 
provided by the PCI Serial Adapter’s chip (because it’s an old chip)?

A: Yes, on the ACPI side (Linux). The Serial Adapter chip is simply something 
that "cannot sleep", i.e. it cannot do lots of ACPI magic 
(versus a modern Network Adapter, FC Adapter, etc.).

Older kernels, with an external ACPI driver, used the "SPMI" region of the ACPI 
tables, which may or may not expose certain capabilities like "ability to 
sleep".
An old kernel might also encounter "AE_NOT_EXIST" if a device doesn't support 
ACPI magic.

Newer kernels (with the ACPI module) use the "IPMI" region of the ACPI tables, 
and the presentation of the capabilities might be different when done this way.
Again, the kernel might encounter "AE_NOT_EXIST" if a device doesn't support 
ACPI magic.


Q: I would definitely disable that PCI Serial Adapter if I knew how to.

A: First stop, BIOS. I can't be concrete, but there are often one or two 
options that might allow disabling "COM/Serial".

Second, udev.
I won't go into the details for udev, as it varies a bit between distros.

Here is an older EL6 sample, for disabling a PCI Sound Card by its PCI 
address:

```
cat 100-disable-audio.rules
ACTION=="add", SUBSYSTEM=="sound", KERNEL=="controlC1", RUN+="/bin/sh -c 'echo 1 > /sys/bus/pci/devices/0000\:01\:06.0/remove'"
```

I've never tried removal with udev, so you're on your own....

But it might be effective, and prevent the IPMI code in the kernel from hitting 
the Serial PCI device.
Only you will know, after getting udev tuned.
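
To adapt a rule like the sample above to the serial adapter, its PCI address is 
needed first. Here is a minimal sketch, assuming the adapter reports the 
standard PCI class code for serial controllers (0x0700). The function name 
`list_pci_serial` and its optional directory argument are only illustrative, 
and I have not tried this on a T40:

```shell
# List PCI addresses of devices whose class code starts with 0x0700
# (communication controller / serial). The optional argument is the
# directory to scan; it defaults to the real sysfs location.
list_pci_serial() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries without a readable class file (or an unmatched glob).
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case "$class" in
            0x0700*) echo "${dev##*/}" ;;
        esac
    done
}
```

Calling `list_pci_serial` with no argument prints any matching addresses in the 
same form as the one used in the sample rule. It is worth checking each one 
with `lspci -s <address>` before putting it into a rule.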

See if you can confirm in syslog or dmesg whether "udev" is firing "before" 
the IPMI code fires. Hopefully it is, so that it can effect some change.
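
One way to compare the ordering is to pull the first timestamp of each kind of 
message out of the kernel log. This is a rough sketch, assuming the default 
`[ seconds.micros]` prefix that dmesg prints. The helper name 
`first_match_time` and the search patterns are illustrative, and on some 
distros udev may log to the journal rather than the kernel ring buffer:

```shell
# Print the timestamp of the first line on stdin that matches the given
# extended regex (case-insensitive). Prints nothing if no line matches.
first_match_time() {
    grep -iE -m1 -- "$1" | sed -E 's/^\[ *([0-9]+\.[0-9]+)\].*/\1/'
}

# Possible usage (patterns are guesses, adjust to what your log shows):
#   dmesg | first_match_time 'ACPI Error'
#   dmesg | first_match_time 'udev'
```

If the udev timestamp is the larger of the two, the rule is probably firing too 
late to stop the message from being logged.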

Regards,
Nick


-----Original Message-----
From: Secan Cristian <[email protected]>
Sent: 09 September 2020 17:32
To: Parrott, Nick
Subject: Re: [Linux-PowerEdge] ACPI Error on Dell PowerEdge T40


[EXTERNAL EMAIL]

Hello Nick,

So let me get it straight, so you mean that the actual problem would be the 
ACPI from Linux side, as in it cannot interpret (handle) the specified region 
provided by the PCI Serial Adapter’s chip (because it’s an old chip)?

If this is the case, am I able to actually disable the PCI Serial Adapter from 
being enumerated (let's say via a grub kernel parameter)?

I can confirm that the only boot parameter that made the error message go away 
was acpi=off (which is a last resort and is not recommended).

P.S. I would definitely disable that PCI Serial Adapter if I knew how to.

BR,
Cristian

> On 9 Sep 2020, at 17:26, Parrott, Nick <[email protected]> wrote:
>
> Hi Cristian,
>
> I don't expect a future BIOS firmware to change this behavior.
> This is because the code involved might not be in the BIOS, but rather in
> the Serial Driver-Chip. I can't be sure; it's just a suspicion, given
> how primitive the chip is (PCI Serial Adapter).
>
> * Kernel Changes
>
> From what I've read about the newer kernels (e.g. EL8, and also late EL7), 
> there are some similar dmesg/ACPI errors.
>
> Most issues stem from specific modules like "acpi_power_meter".
> In those cases, disabling "acpi_power_meter" before shutdown, and loading it 
> manually after boot, is considered a possible solution.
> However, in your case I think it's one of the ACPI module primitives
> that is firing, to query the sleep capabilities of each PCI device.
>
> I suspect the only way that will be solved is via a bugfix to the Linux 
> kernel, to change the severity-level, or perhaps an enhancement for excluding 
> PCI devices from ACPI enumeration.
>
> The reality is, you see this on a T40.  But from a linux standpoint, I think 
> it's best that the 'acpi' module of the new Linux kernel has some improved 
> flexibility, e.g. masking which devices to Probe, or Not-Probe.   That means 
> it can be fettled on any device / mainboard etc.
>
> Regards,
> Nick
>
>
> -----Original Message-----
> From: Secan Cristian <[email protected]>
> Sent: 08 September 2020 08:21
> To: Parrott, Nick
> Cc: [email protected]
> Subject: Re: [Linux-PowerEdge] ACPI Error on Dell PowerEdge T40
>
>
> [EXTERNAL EMAIL]
>
> Hello Nick,
>
> Thanks for the quick and very helpful response. I was struggling to 
> understand which part of the motherboard is not recognized by the kernel 
> (although my research considerably improved my knowledge of the way the 
> Linux kernel works).
>
> The serial port won’t be used at all (that’s 99% sure); it’s more of a 
> cosmetic issue, because I like my logfiles “clean” (especially the kernel 
> related ones). On the other hand, I don’t have any serial device to test 
> the actual functionality, but I can confirm that disabling the Serial ports 
> in BIOS/UEFI won’t make the error disappear. During my research I saw the 
> same error message reported by users owning a Dell Precision 3630 (most 
> likely because of the identical form factor and motherboard) on 
> linux-harware.com.
>
> I’m hoping that this issue will be fixed by a BIOS/UEFI update. I don’t have 
> iDRAC, I only have the plain Intel AMT.
>
> Any suggestion, tips and/or tricks to mitigate the error message?
>
> BR,
> Cristian
>
>>>> On 7 Sep 2020, at 17:55, Parrott, Nick <[email protected]> wrote:
>> Hi Cristian,
>> Because ACPI is exposed and currently enabled by the BIOS/EFI layer, linux 
>> is trying to register and set up a handle against a Serial Port.
>> While I acknowledge that dmesg contains the word "error", that doesn't mean 
>> there is a systemic issue, or an issue that would affect your use of the 
>> Serial Port.
>> It just means that the Server isn't capable of "putting the Serial-Port to 
>> sleep", which is largely the role of ACPI.
>> Therefore:
>> - Can you confirm if the Serial Port is going to be used?   It's very rare 
>> that it is needed / used.
>> - If so, have you tested it by either:
>> - - Installing "screen" on Linux, to use the Serial-Port for
>> connection to an outside-device ( e.g. a Network Switch with a Legacy
>> Serial Port )
>> - - OR
>> - - Enabling Console-Redirection in the BMC/iDRAC, to send the
>> Server-Console the opposite way ( e.g. Control the Physical Console
>> from an outside Client )
>> Recommendations:
>> - If you do not plan to use the Serial Port, you could disable it in the 
>> BIOS, or perhaps via Lifecycle-Controller UI.
>> - This would be a good security measure, if the Serial Port is not needed in 
>> production.
>> - This may also remove the "errors" you see in dmesg, which I would consider 
>> "non-critical" - at a glance.
>> Final Words:
>> - If the Serial Port is needed in production, but fails to work... this can 
>> surely be investigated further ( ideally via a Support Ticket )
>> - If the Serial Port does work; I would move on to another challenge. There 
>> is little to be gained from trying to adapt this behavior.    ;-P
>> Regards,
>> Nick
>> -----Original Message-----
>> From: Linux-PowerEdge <[email protected]> On Behalf Of
>> Secan Cristian
>> Sent: 07 September 2020 15:04
>> To: [email protected]
>> Subject: [Linux-PowerEdge] ACPI Error on Dell PowerEdge T40
>>
>> [EXTERNAL EMAIL]
>>
>> Hello,
>>
>> I have a DELL PowerEdge T40 server. Every time I boot Centos 8.x I receive 
>> the following ACPI error messages:
>>
>> dmesg | grep -iE "error|failed"
>> [ 0.491170] ACPI Error: No handler for Region [WST1] (0000000069331107) [GenericSerialBus] (20190703/evregion-132)
>> [ 0.491170] ACPI Error: Region GenericSerialBus (ID=9) has no handler (20190703/exfldio-265)
>> [ 0.491170] ACPI Error: Aborting method \_SB.PCI0.I2C0.PAS1 due to previous error (AE_NOT_EXIST) (20190703/psparse-531)
>> [ 0.491170] ACPI Error: Aborting method \_GPE._L20 due to previous error (AE_NOT_EXIST) (20190703/psparse-531)
>> [ 0.491170] ACPI Error: AE_NOT_EXIST, while evaluating GPE method [_L20] (20190703/evgpe-515)
>> [ 1.026404] pcieport 0000:00:1d.2: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
>> [ 1.091481] ERST: Error Record Serialization Table (ERST) support is initialized.
>> [ 3.733271] [drm] failed to retrieve link info, disabling eDP
>>
>> I can confirm that this issue is present on RHEL 8.x, Fedora 32, Ubuntu 
>> 18.04 LTS, Centos 7.x
>>
>> Can you please assist?
>> _______________________________________________
>> Linux-PowerEdge mailing list
>> [email protected]
>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
