How Linux suspend and resume works in the ACPI age
Posted 7 Feb 2007 at 11:16 UTC by mjg59 
Back in the APM days, everything was easy. You called an ioctl on
/dev/apm, and the kernel made a BIOS call. After that, it was all up to
the hardware. Sure, it never really worked properly, and it was
basically impossible to debug what the hardware actually did. And then
ACPI came along, and nothing worked at all. Several years later, we're
almost back to where we were with APM. But what's actually happening
when you hit that sleep key?
Without the ability to suspend and resume, laptop users are doomed to
spend several hours of their lives waiting for machines to boot and
shut down. This is, clearly, suboptimal. APM made it fairly easy to
implement this, because almost everything was handled by the BIOS. And
that, in a nutshell, is one of the primary reasons why ACPI ended up in
charge.
The biggest problem with APM is that it left policy in hardware. Don't
want to suspend on lid closure? The OS doesn't get any say in the
matter, though if you're lucky there might be a BIOS option to control
it. Would prefer it if the BIOS didn't scribble all over the contents of
your video registers while it tries to reprogram them (probably back to
the defaults of the Windows drivers...)? Sucks to be you. Want the sleep
button to trigger suspend to disk, not suspend to RAM? A-ha ha ha.
ACPI deals with that problem by moving almost all the useful
functionality out of hardware. The downside of this is that the
functionality needs to be reimplemented in the OS. Which, given that the
ACPI spec is around 600 pages long, has taken a little time.
(Of course, it turns out that most of the ACPI spec is entirely
uninteresting for suspend and resume purposes, but that's not really the
point right now.)
So, firstly, let's have some ACPI jargon. ACPI itself stands for
"Advanced Configuration and Power Interface". It's not just a power
management spec - it provides the OS with a description of all the
built-in hardware in your system, along with a certain degree of
abstraction. It gives you information about interrupt routing, tells you
if someone's just removed a hot-pluggable DVD drive from a laptop and
may even let you control which video output is being used.
This information is provided in a table called the DSDT (Differentiated
System Description Table). The DSDT is in a bytecode called AML (ACPI
Machine Language), compiled from a simple language called ASL (ACPI
Source Language, shockingly enough). At boot time, the system reads the
DSDT,
parses it and executes various methods. These can do pretty much
anything, but on the bright side they're being executed in kernel
context and (in principle) you can filter out anything that you really
don't want to do (such as scribbling all over CMOS or something).
The final relevant piece of ACPI information is something called the
FADT, or Fixed ACPI Description Table. This gives the OS information
about various register addresses. It's a static structure, and doesn't
contain any executable code.
So, how does all of this stuff actually work?
First of all, the user hits the sleep key. This triggers a hardware
interrupt, which is caught by the embedded controller. That pokes a
register in the southbridge, which flags that a general purpose event
has just occurred. The OS notices this, and checks the DSDT for what's
supposed to happen next. Generally, this just calls a notification
event. This is bounced back out to userspace via /proc/acpi/event
(currently, though it's going to be moved to the input layer in future)
and userspace gets to choose what happens next.
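As a rough illustration of the userspace end of this, the sketch below parses one event line and looks up what to do about it. It assumes the four-field line format that acpid-style tools consume (device class, bus ID, then two hexadecimal numbers), and the `POLICY` table and action names are purely hypothetical; real policy lives in whichever daemon your distribution ships.

```python
from typing import NamedTuple


class AcpiEvent(NamedTuple):
    device_class: str   # e.g. "button/sleep"
    bus_id: str         # e.g. "SBTN"
    event_type: int     # notification value
    data: int           # event-specific counter or payload


def parse_event(line: str) -> AcpiEvent:
    """Split one event line into its four fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"unexpected event line: {line!r}")
    device_class, bus_id, event_type, data = fields
    return AcpiEvent(device_class, bus_id, int(event_type, 16), int(data, 16))


# Hypothetical policy: this is exactly the decision APM never let the OS make.
POLICY = {
    "button/sleep": "suspend-to-ram",
    "button/power": "shutdown",
    "button/lid": "ignore",
}


def choose_action(event: AcpiEvent, policy=POLICY) -> str:
    """Return the configured action for an event, defaulting to 'ignore'."""
    return policy.get(event.device_class, "ignore")
```

The point of the design is that the kernel only reports that something happened; swapping the `POLICY` entries is all it takes to make the sleep key hibernate instead.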
Let's concentrate on the common scenario, which is that someone hitting
the sleep button wants to suspend to RAM. Via some abstraction (either
acpid, gnome-power-manager or kpowersave or something), userspace makes
that decision and initiates the suspend to RAM process by either calling
a suspend script directly or bouncing via HAL.
Depending on distribution, this ends up running a shell script or binary
which attempts to prepare the system for suspend. Right now, this tends
to involve a bunch of bandaids around various broken drivers - unloading
modules and reloading them is one of the easiest workarounds for
breakage. Finally, the string "mem" is written to /sys/power/state.
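The last step is small enough to sketch in full. This is a minimal Python version, assuming the state file lists the supported states separated by whitespace; the function name `request_suspend` and its `state_file` parameter are invented here so that the logic can be tried against an ordinary file rather than a live system.

```python
from pathlib import Path


def request_suspend(state: str = "mem",
                    state_file: str = "/sys/power/state") -> None:
    """Ask the kernel to enter a sleep state by writing its name.

    Against the real sysfs file this needs root, and the write does not
    return until the machine has resumed (or the suspend has failed).
    """
    path = Path(state_file)
    supported = path.read_text().split()
    if state not in supported:
        raise RuntimeError(
            f"{state!r} not supported here; kernel offers {supported}")
    path.write_text(state)
```

Everything the distribution scripts do before this point (unloading modules and so on) is workaround; this write is the only part the kernel actually requires.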
This jumps back into the kernel. First, userspace is stopped. This stops
it getting horribly confused when a load of hardware mysteriously stops
working. Then the kernel goes through the device tree and calls suspend
methods on each bound driver. Individual drivers have responsibility for
storing enough state in order to be able to reprogram the device on
resume - ACPI doesn't make guarantees about what the hardware state is
going to be when we come back. Once the kernel-side suspend code has
been run, we execute a couple of ACPI methods - _PTS (Prepare To Sleep)
and _GTS (Going To Sleep). These tend to poke various things that the
kernel knows nothing about, and so a certain amount of magic may be
involved.
At this point, the system should be fairly quiescent. Only two things to
do now. Firstly, the address of the kernel wakeup code is written to an
address contained in the FADT. Secondly, two magic values from the DSDT
are written to registers described in the FADT. This usually causes some
sort of system management trap, which makes sure that the memory is put
in self-refresh mode and actually sequences the machine into suspend.
For the S3 power state, this basically involves shutting the machine
(other than the RAM) down completely.
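For the curious, the "magic values" step is just bit twiddling. The sketch below assumes the PM1 control register layout from the ACPI spec (a 3-bit sleep type field in bits 10 to 12 and a sleep enable flag in bit 13) and shows how one of the values might be combined with the register's current contents. The function name is made up, the sleep type passed in is whatever the DSDT's S3 package says on a given machine, and a real kernel may well split this into more than one register write.

```python
SLP_TYP_SHIFT = 10                    # sleep type field starts at bit 10
SLP_TYP_MASK = 0x7 << SLP_TYP_SHIFT   # three bits wide
SLP_EN = 1 << 13                      # setting this triggers the transition


def pm1_sleep_value(current: int, slp_typ: int) -> int:
    """Return the 16-bit value to write to a PM1 control register.

    current -- the register's present contents (other bits are preserved)
    slp_typ -- machine-specific sleep type, taken from the DSDT
    """
    if not 0 <= slp_typ <= 7:
        raise ValueError("sleep type must fit in three bits")
    preserved = current & ~SLP_TYP_MASK & ~SLP_EN & 0xFFFF
    return preserved | (slp_typ << SLP_TYP_SHIFT) | SLP_EN
```

There are two values because the spec allows for two control register blocks, each with its own sleep type; the same function would simply be applied once per register.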
Time passes.
The user presses the power button. The system switches on, jumps to the
BIOS start address, does a certain amount of setup (programming the
memory controller and so on) and then looks at the ACPI status register.
This tells it that the machine was previously suspended to RAM, so it
then jumps to the wakeup address programmed earlier. This leads it to a
bunch of real-mode x86 code provided by the kernel, which programs the
CPU back into protected mode and restores register state. Suddenly we're
running kernel code again.
From this point onwards, it's much the reverse of the suspend process.
We call the ACPI _WAK method, resume all the drivers and restart
userspace. The shell script suddenly starts running again and cleans up
after itself, reloading any drivers that were unloaded before suspend.
As far as userspace is concerned, the only thing that's happened is that
the clock has jumped forward.
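The whole round trip can be modelled in a few lines. This is a toy, not kernel code: `ToyDevice` and `suspend_resume_cycle` are invented names, register contents are plain dictionaries, and suspending devices in reverse order while resuming them in forward order is a simplifying assumption rather than something the description above spells out. What it does show is why each driver has to save its own state: the hardware comes back blank.

```python
class ToyDevice:
    """A device whose registers are wiped while the machine sleeps."""

    def __init__(self, name, registers):
        self.name = name
        self.registers = dict(registers)
        self._saved = None

    def suspend(self):
        # The driver, not ACPI, is responsible for remembering state.
        self._saved = dict(self.registers)

    def lose_power(self):
        # Nothing is guaranteed about hardware state after S3.
        self.registers = {key: 0 for key in self.registers}

    def resume(self):
        if self._saved is None:
            raise RuntimeError(f"{self.name}: nothing saved before suspend")
        self.registers = dict(self._saved)
        self._saved = None


def suspend_resume_cycle(devices):
    """Run one suspend/resume cycle and return the ordered list of steps."""
    log = ["stop userspace"]
    for dev in reversed(devices):
        dev.suspend()
        log.append(f"suspend {dev.name}")
    log += ["_PTS", "_GTS", "store wakeup address", "write sleep values"]
    for dev in devices:
        dev.lose_power()
    log += ["BIOS jumps to wakeup address", "_WAK"]
    for dev in devices:
        dev.resume()
        log.append(f"resume {dev.name}")
    log.append("restart userspace")
    return log
```

A driver whose `suspend` forgets a register ends up with a zero in it after `resume`, which is more or less the APIC bug described further down.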
So why is this difficult?
In a lot of cases, it's just down to bugs in the drivers. Restoring
hardware state can be hard, especially if you don't actually have all
the documentation for the hardware to start with - traditionally, many
Linux drivers have ended up depending on the BIOS to have programmed the
hardware into a semi-sane state, and there's no guarantee that that will
happen with ACPI. Other cases can just be oversights - for instance, the
bug in the APIC (not to be confused with ACPI) code that meant a single
register wasn't restored, resulting in some machines resuming without
any interrupts being delivered.
The single biggest problem is video hardware. The spec doesn't require
the BIOS to reprogram the video hardware at all, and so often it'll come
back in an entirely unprogrammed state. This is an issue, since we (in
general) have absolutely no idea how to bring a video card up from
scratch. One of the easiest workarounds is to execute code from the
video BIOS in the same way that the system BIOS does on machine startup.
vbetool lets you do this from userspace, and it works a surprisingly
large amount of the time. However, there's no guarantee that it'll be
successful. Vendors often unmap that section of BIOS after the system
has been brought up, since they've got far more BIOS code than will fit
in the BIOS region of the legacy address space. In the long run, the
only solution is drivers that know how to program an entirely
uninitialised chip. The new modesetting branch of the Intel driver aims
to do this, as do the developers of nouveau.
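In a resume script, the vbetool workaround amounts to a single command. Here is a minimal sketch of wrapping it from Python, assuming vbetool is installed and the script runs as root; `repost_video` and its `runner` parameter are names invented for this example (the parameter only exists so the call can be exercised without touching real hardware).

```python
import subprocess


def repost_video(runner=subprocess.run) -> bool:
    """Re-run the video BIOS POST code via vbetool.

    Returns True if vbetool reported success. As noted above, success is
    not guaranteed: the video BIOS may no longer be mapped where vbetool
    expects to find it.
    """
    result = runner(["vbetool", "post"], check=False)
    return result.returncode == 0
```

Treating a failure as non-fatal is deliberate: on machines where this doesn't work there is usually nothing better a script can do than carry on and hope the driver copes.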
Despite all this misery, ACPI support is generally improving. Most
machines can now suspend and resume once more. The next big challenge is
improving run-time power management in order to get battery life to at
least the level it is under Windows, and ideally beyond that.
Thanks, posted 9 Feb 2007 at 20:28 UTC by ncm » (Master)
Thank you, Matthew. I gave up on ACPI-suspend on my Dell D620, and
switched to swsuspend2. It mostly works, modulo the occasional OOPS
while trying to suspend which I am inclined to blame on the bcm43xx
driver. Since the whole machine powers off, the BIOS is obliged to
initialize everything when it comes back up.
CPU speed is another area where ACPI seems to over-complicate
matters. I see hardly any reason to operate the CPUs in any mode
between completely powered-off and as fast as they can go. In
particular, "throttling modes" look completely pointless. Maybe they're
supposed to reduce power usage when people are running screen savers or
other spin-loop processes that shouldn't be running at all?
Probably the only solution, ultimately, to the ACPI/BIOS problem will
be to "turn" a big laptop reseller, and count on them to put the screws
on the manufacturers. In the meantime it's a holding action.
Now, if only I could find time to trace the code that maps key events
to scripts on Debian, or find a page by somebody who did...
we have to deal with this, a lot - nothing to do with ACPI, on the
reverse-engineering project to get linux running on HTC (High Tech
Computer Corporation) handhelds (smartphones and pdas). it's the one
major irritating factor that can stop a device from being useful.
i managed to get linux running entirely on the ipaq hw6915 in just
six weeks (because of the common hardware between the htc universal,
ipaq hx4700 and a couple of others).
however, i spent a very frustrating further two weeks trying
to work out what it was that was stopping the device from being able to
resume: exactly as you say, interrupts weren't being enabled [in the
end i had to give up as i was running out of time]
now the really annoying thing is that it is terribly difficult to
track down why this is.
firstly, you're resuming: you _can't_ do any significant
debugging: it either works, or it doesn't. if you get it right, it
resumes. if you don't get it right, you've no way of communicating
anything to find out why.
secondly, the booting is coming not from startup but from
gnu-haret.exe which is the ARM / wince equivalent of LOADLIN.EXE for
x86 / win32. so, you're booting into linux with most of the hardware
preinitialised. on some devices, it is an absolute _bitch_ to work out
the GPIO pin requirements, some of which require switching to alternate
states and back with special timings in between to let buggy or glitchy
hardware recover, because there's not enough current or something - you
just don't know.
and on one device, 272 GPIO pins (192 on the CPU, 16 on a
chip we've named EGPIO for 'extended gpio', and 64 on a separate custom
I/O chip which we've named ASIC3) ... actually weren't enough, so the
designers had to _borrow_ some of the I/O pins on the GSM radio rom
(another 64 or so gpio pins - it's another ARM processor) and so you
have to, unbelievably, communicate proprietary commands over a _serial_
line to specify which set of speakers are to be used! (yes, this device
i'm describing i think it has 5 speakers 2 of which are stereo speakers
and 3 microphones and a headphone socket _and_ bluetooth audio _and_ a
car-kit)
physics, posted 13 Feb 2007 at 05:10 UTC by ncm » (Master)
Reverse engineering is like physics. There's certainly a design because
it (universe, gadget) exists. There's equally no guarantee the design
can be elucidated from obtainable evidence. The only saving grace for
reverse engineers is that increased complexity drives designers to more
standardized components, because the designers have to be able to
understand it themselves. We just have to hold out until Linux is one
of those components.
Physicists, OTOH, are probably SOL.
yes. the increased commoditisation and success of HTC means that they
are endeavouring to drive down the development costs.
the HTC smartphone/PDAs (HTC designs all the iPAQs) from the past
three or so years have all used the AKM AK4641 sound chip instead of the
rather esoteric philips UDA1380 (which is a pain to program in some of
the hardware configurations it's been used in : the ADC and DAC clocks
can be on separate ... it's a long story :) )
the first HTC device (the wallaby) was an _absolute_ bitch,
and the only saving grace was that the next two revisions kept the
majority of the components and also that one of the chips was also in
the iPAQs, which the HP/Compaq engineers had at least some scant
internal documentation on.
but anyway - this is off-topic for the original article so i'll
shut up now :)