http://lkml.org/lkml/2007/4/3/110Hi, With at least 3 of the following 4 patches, s2ram and s2disk are fixed on at least the Acer Ferrari 1000 notebooks and at least s2disk on the Acer Ferrari 5000 notebooks. The Acer Ferrari 1000 is a 12" Turion 64 X2 notebook with only 1.7 kg weight while the Ferrari 5000 is a 14" AMD Turion notebook and a bit older but still not quite old. Introduction: ------------- The memory interface of AMD K8 CPUs supports "Extended fixed-range MTRR Type-Field Encodings" which allow to specify whether accesses to certain address ranges are executed by accessing RAM thru the AMD Direct Connect Architecture or by executing memory-mapped I/O. The associated CPU feature is called IORR's or I/O Range Registers. This allows e.g. to implement Shadow RAM by copying ROM contents into RAM. The BIOS of these Acer AMD Turion 64 notebooks makes use of fixed-range IORRs to implement shadow RAM and other configurations, and it does this while it transitions the system into ACPI mode, but Linux is not prepared for that and breaks the IORRs/MTRRs while suspending, resulting in errors which look like hardware errors. Symptoms vary from from instant resets and power-offs to SATA link failures. References: http://en.wikipedia.org/wiki/MTRR http://en.wikipedia.org/wiki/Direct_Connect_Architecture http://en.wikipedia.org/wiki/Memory-mapped_IO http://en.wikipedia.org/wiki/Random_access_memory#Shadow_RAM In-depth problem description: ----------------------------- AMD introduced this MTRR extension with the AMD64 CPUs which have integrated memory controllers. In part (fixed-range IORRs for addresses below 1MB), AMD uses the fixed-range MTRR registers which already configure the address range below 1MB to implement corresponding IORR bits and calls the resulting memory access methods in combination with the original Intel-style MTRR bits "Extended fixed-range MTRR Type-Field Encodings". They are documented in section 7.8.1 of the "AMD64 Architecture Programmer's Manual Volume 2: System Programming", starting with page 234: http://amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf The extended fixed-range type-field encodings are enabled using two bits in the AMD64-specific SYSCFG MSR (described in detail on page 59). When the extended fixed-range type-field encodings are enabled, all fixed-range MTRR fields are defined this way: 7 5 4 3 2 0 +--------------------------------------------------------------------+ | reserved | IORR RdMem bit | IORR WrMem bit | Intel-style MTRR bits | +--------------------------------------------------------------------+ Linux MTRR code does not yet support that the BIOS may change fixed-range MTRRs and when suspending, this leads Linux MTRR code to clear the IORR bits for some memory regions. The resulting combination of IORR and Intel-style MTRR bits is not allowed and causes undefined and unpredictable behaviour (see the last paragraph before table 7-10). A possible workaround is to detect the Acer Ferraris through DMI and don't change the fixed-range MTTRs on them. Linux MTRR code has no external interface to change fixed-range MTRRs, so no functionality and no syncronisation (it's broken on the Ferrari 1000 after ACPI is enabled if no real fix is applied) is lost by this patch: --- linux/arch/i386/kernel/cpu/mtrr/generic.c +++ linux/arch/i386/kernel/cpu/mtrr/generic.c @@ -3,6 +3,7 @@ #include <linux/init.h> #include <linux/slab.h> #include <linux/mm.h> +#include <linux/dmi.h> #include <asm/io.h> #include <asm/mtrr.h> #include <asm/msr.h> @@ -149,6 +150,13 @@ static int set_fixed_ranges(mtrr_type * int changed = FALSE; int i; unsigned int lo, hi; + char *vendor = dmi_get_system_info(DMI_SYS_VENDOR); + char *product = dmi_get_system_info(DMI_PRODUCT_NAME); + + if (vendor && product && !strncmp(vendor, "Acer", 4) && + (!strncmp(product, "Ferrari 1000", 12) || + !strncmp(product, "Ferrari 5000", 12))) + return FALSE; rdmsr(MTRRfix64K_00000_MSR, lo, hi); if (p[0] != lo || p[1] != hi) { As DMI is defined "y" on i386 and x86_64 and x86_64 is the only $ARCH which includes $TOPDIR/arch/i386/kernel/cpu/mtrr, DMI checks can be done without #ifdef CONFIG_DMI. It's a possible hack for older kernels. But do we want that workaround also for upstream? The changes which fix the issues themselfes (and properly work around the lazyness or "limits" of the BIOS) are a bit more complicated, and need at least 3 of the following 4 patches. Linux MTRR code is well-trained in correcting broken BIOS MTRR settings, so from that history, I think it is good routine to continue that tradition. In this case, most of what the BIOS does is not broken by design, but just not supported by Linux yet. The only bug is that the BIOS does not initialize the 2nd core with the same MTRRs that the Boot CPU has after it has enabled ACPI mode. Linux MTRR code has already mechanisms in place to fix such BIOS issues, they just need to be extended to also work in circumstances like those found on the Ferraris. State of Linux fixed-range MTRR code and changes done by the patches -------------------------------------------------------------------- Currently, we only get the state of fixed-range MTRRs eary at boot, before ACPI is enabled. This information is outdated later after ACPI mode has been activated in the Ferraris. At least when we suspend the system, we need to have our backup of the fixed-range MTRRs in sync with the fixed-range MTRRs which are currently in force. The first patch provides a function (mtrr_save_fixed_ranges) to save the current state of the fixed-range MTRRs. The second patch calls mtrr_save_fixed_ranges() before the non-boot CPUs are booted. At these times, ACPI is already initialized and the Ferrari BIOSes changed the MTRRs already, so the new CPUs are then always initialized with the current fixed-range MTRR of the boot CPU. Alternatively, or in addition, it might be a good idea to save the current state of the fixed-range MTRRs of the boot CPU before suspending to capture any change of the fixed-range MTRRs which may potentially have occurred after the last secondary CPU has been booted. That is what the third patch does. The fourth patch is AMD64-specific and adds support for keeping the fixed-range MTRRS in sync across CPUs if some fixed-range IORR bits are set on the boot CPU. If no fixed-range IORR bit is set, there is no need to enable the support for "Extended fixed-range type-field encodings" because the access mode only changes if these bits are set, otherwise, we do not need to enable them. Best Regards, Bernhard Kaindl PS: I attached a program which uses msr.ko (CONFIG_X86_MSR) from Processor type and features -> [M/*] /dev/cpu/*/msr - Model-specific register support to dump the contents of the fixed-range MTTRs to stdout. On a two-CPU system, the output format looks like this: MSR 0x250: 1e1e1e1e1e1e1e1e | 0606060606060606 | MSR 0x258: 1e1e1e1e1e1e1e1e | 0606060606060606 | MSR 0x259: 0000000000000000 | 0000000000000000 | MSR 0x268: 1515151515151515 | 0505050505050505 | MSR 0x269: 1515151515151515 | 0505050505050505 | MSR 0x26a: 0000000000000000 | 0000000000000000 | MSR 0x26b: 0000000000000000 | 0000000000000000 | MSR 0x26c: 1515151500000000 | 0505050500000000 | MSR 0x26d: 1515151515151515 | 0505050505050505 | MSR 0x26e: 1515151515151515 | 0505050505050505 | MSR 0x26f: 1515151515151515 | 0505050505050505 | This dump comes from the Ferrari 1000, where the 2nd core uses the same Intel-style MTRR values as the 1st core, but the 1st core has the AMD64 IORR bits set in many places. On suspend, 2.6.20 sets all of them to 0, making the dump look like the dump of the second core, and shortly afterwards, the problems start. To have a sane MTRR setup the MTRRs of the CPUs should always match, but this has so far not lead to problems, it seems. However, Intel and AMD manuals tell us to keep MTRRs in sync for good reasons./* * Test program for reading current MTRR values * * (C) 2007 Bernhard Kaindl * (C) 2000 Dave Jones, Arjan van de Ven. */ #include <stdio.h> #include <unistd.h> /* rdmsr - read MSR, taken from Powertweak Linux * (C) 2000 Dave Jones, Arjan van de Ven. * Licensed under the terms of the GNU GPL License version 2. */ int rdmsr(int cpu, unsigned long msrindex, unsigned long long *val) { char cpuname[36]; int fh, ret; snprintf (cpuname,15, "/dev/cpu/%d/msr", cpu); fh = open (cpuname, 0); if (fh==-1) ret = -1; else { lseek (fh, msrindex, SEEK_SET); ret = (read (fh, val, 8) == 8); close (fh); } return ret; } int print_msr_cpu(int cpu, int reg) { unsigned long long val; int ret; if ((ret = rdmsr(cpu, reg, &val)) > 0) printf("%016llx", val); return ret; } print_msr(int reg) { int i = 1, cpu = 0; printf("MSR 0x%03x: ", reg); if (print_msr_cpu(cpu++, reg) <= 0) printf("read error - Is msr.ko loaded and are the devices created?\n"); while (cpu < 16) { printf(" | "); if (print_msr_cpu(cpu++, reg) <= 0) break; } printf("\n"); } main() { int i; print_msr(0x250); print_msr(0x258); print_msr(0x259); for (i=0x268; i<0x270; i++) print_msr(i); } |
