http://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application
HOWTO: Build an RT-application
From RTwiki
This document describes the steps for writing hard real-time Linux programs using the Real-Time Preemption patch. It also describes the pitfalls that destroy real-time responsiveness. It focuses on x86 and ARM, although the concepts are also valid on other architectures, as long as glibc is used. (Some fundamental parts are missing from uClibc, for example PI-mutex support and control of malloc/new behavior, so uClibc is not recommended.)
Latencies
Hardware causes of ISR latency
Good real-time behavior of a system depends heavily on low-latency interrupt handling. The x86 platform is not optimized for RT usage: several mechanisms cause ISR latencies that can run into tens or hundreds of microseconds. Knowing them lets you make the best design choices on this platform and work around the negative impact.
Hints for getting rid of SMI interrupts on x86
1) Use PS/2 mouse and keyboard.
2) Disable USB mouse and keyboard in BIOS.
3) Compile an ACPI-enabled kernel.
4) Disable TCO timer generation of SMIs (TCO_EN bit in the SMI_EN register). The latency should drop to ~10us permanently, at the expense of not being able to use the i8xx_tco watchdog.
One user of RTAI reported: In all cases, do not boot the computer with the USB flash stick plugged in. The latency will rise to 500us if you do so. Connecting and using the USB stick later does no harm, however.
ATTENTION! Do not ever disable the SMI interrupts globally. Disabling SMI may cause serious harm to your computer. On P4 systems you can burn your CPU to death when SMI is disabled. SMIs are also used to fix up chip bugs, so certain components may not work as expected when SMI is disabled. So, be very sure you know what you are doing before disabling any SMI interrupt.
Latencies caused by Page-faults
Whenever the RT process runs into a page fault, the kernel freezes the entire process (with all its threads in it) until the kernel has handled the page fault. There are two types of page faults: minor and major. Minor page faults are handled without IO accesses. Major page faults are handled by means of IO activity. Page faults are therefore dangerous for RT applications and need to be prevented.
If no swap space is used and no other applications stress the memory boundaries, then there is enough free RAM ready for the RT application. In this case the RT application will likely only run into minor page faults, which cause relatively small latencies. But if the RT application is just one of many applications on the system, and swap space is used, then special actions have to be taken to protect the memory of the RT application. If memory has to be retrieved from disk or pushed to disk to handle a page fault, the RT application will experience very large latencies, sometimes more than a second! Notice that page faults of one application cannot interfere with the RT behavior of another application.
During startup an RT application will always experience a lot of page faults. These cannot be prevented. In fact, this startup period must be used to claim and lock enough memory for the RT process in RAM. This must be done in such a way that page faults no longer occur once the application needs to exhibit its RT capabilities. This can be done by taking care of the following during the initial startup phase:
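As an illustration of claiming and locking memory during startup, here is a minimal sketch in C. It assumes glibc on Linux. The function names (rt_lock_memory, rt_prefault_stack, rt_reserve_heap) and the 64 KiB stack size are hypothetical and chosen only for this example. The mallopt() settings are one common way to control malloc behavior under glibc, not necessarily the only one.

```c
#include <malloc.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical size chosen for illustration; tune it per application. */
#define PREFAULT_STACK_BYTES (64 * 1024)

/* Lock all current and future pages of the process in RAM and tell glibc
 * not to hand heap memory back to the kernel. Returns 0 on success, -1 if
 * mlockall() fails (for example without root or with a low RLIMIT_MEMLOCK). */
int rt_lock_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;
    mallopt(M_TRIM_THRESHOLD, -1); /* never trim the top of the heap */
    mallopt(M_MMAP_MAX, 0);        /* serve large requests from the heap, not mmap() */
    return 0;
}

/* Touch one byte per page of a stack buffer so the stack pages are
 * faulted in before the time-critical phase starts. */
void rt_prefault_stack(void)
{
    volatile unsigned char dummy[PREFAULT_STACK_BYTES];
    long page = sysconf(_SC_PAGESIZE);

    for (long i = 0; i < PREFAULT_STACK_BYTES; i += page)
        dummy[i] = 0;
}

/* Allocate, touch and free `bytes` of heap. With the mallopt() settings
 * above the freed memory stays in the process, so later allocations up to
 * this size can be served from already-mapped pages.
 * Returns 0 on success, -1 if the allocation fails. */
int rt_reserve_heap(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf = malloc(bytes);

    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < bytes; i += (size_t)page)
        buf[i] = 0;
    free(buf);
    return 0;
}
```

A program would typically call rt_lock_memory() first thing in main(), then rt_reserve_heap() with its worst-case heap need, and rt_prefault_stack() at the start of each RT thread, all before the time-critical phase begins.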
File handling
File handling is known to generate disastrous page faults. So, if
there is a need for file access from the context of the RT application,
this is best done by splitting the application into an RT part
and a file-handling part. The two parts can communicate through
sockets. I have never seen a page fault caused by socket traffic. Note:
While accessing files, the low-level fopen() call will do an mmap() to
allocate new memory to the process, resulting in a new page fault.
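The split into an RT part and a file-handling part could be sketched as follows. This is only one possible arrangement, assuming Linux with AF_UNIX sockets. The names spawn_file_writer and rt_log are hypothetical, and the choice of socketpair() with fork() is an assumption of this example rather than something the text prescribes.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a non-RT process that appends every message
 * it receives to `path`. Returns the RT side's socket, or -1 on error.
 * Call this during startup, before the time-critical phase. */
int spawn_file_writer(const char *path, pid_t *child)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                 /* file-handling part */
        char buf[256];
        ssize_t n;

        close(sv[0]);
        FILE *f = fopen(path, "a"); /* all file access happens here */
        if (f == NULL)
            _exit(1);
        /* recv() returns 0 once the RT side closes its end. */
        while ((n = recv(sv[1], buf, sizeof buf, 0)) > 0)
            fwrite(buf, 1, (size_t)n, f);
        fclose(f);
        _exit(0);
    }
    close(sv[1]);                   /* RT part keeps sv[0] */
    *child = pid;
    return sv[0];
}

/* RT side: hand one record to the writer without blocking. Returns 0 on
 * success, -1 if the socket buffer is full or on any other error, so the
 * caller can drop or count the record instead of waiting. */
int rt_log(int sock, const void *rec, size_t len)
{
    ssize_t sent = send(sock, rec, len, MSG_DONTWAIT | MSG_NOSIGNAL);

    return sent == (ssize_t)len ? 0 : -1;
}
```

The RT thread only ever calls rt_log(). Closing the socket at shutdown lets the helper flush the file and exit, after which the parent can collect it with waitpid().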
Global variables and arrays
Global variables and arrays are not part of the binary, but are
allocated by the OS at process startup. The virtual memory pages
associated with this data are not immediately mapped to physical pages of
RAM, meaning that page faults occur on access. It turns out that the
mlockall() call forces all global variables and arrays into RAM,
meaning that subsequent access to this memory does not result in page
faults. As such, using global variables and arrays does not introduce
any additional problems for real-time applications. You can verify this
behavior with a small test program (run as 'root' to allow the
mlockall() operation).
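A check of this kind could look like the sketch below. It is not the original author's program. The names (table, page_faults_so_far, faults_touching_globals, lock_all_memory) and the 1 MiB array size are assumptions made for this example. It counts page faults with getrusage() while touching every page of a global array.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical size chosen for illustration. */
#define TABLE_BYTES (1024 * 1024)

/* Uninitialized global array: mapped on demand unless memory is locked.
 * volatile keeps the compiler from dropping the writes below. */
static volatile unsigned char table[TABLE_BYTES];

/* Total (minor + major) page faults taken by this process so far. */
static long page_faults_so_far(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt + ru.ru_majflt;
}

/* Lock all current and future pages in RAM.
 * Returns 0 on success, -1 on failure (for example when not run as root). */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Touch one byte in every page of the global array and return the number
 * of page faults that the accesses caused. */
long faults_touching_globals(void)
{
    long page = sysconf(_SC_PAGESIZE);
    long before = page_faults_so_far();

    for (size_t i = 0; i < TABLE_BYTES; i += (size_t)page)
        table[i] = 1;
    return page_faults_so_far() - before;
}
```

Calling faults_touching_globals() right after a successful lock_all_memory() should report no faults if the behavior is as described. Running it without the lock typically reports a fault for each page touched for the first time.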
Priority Inheritance Mutex support
A real-time system cannot be real-time if there is no
solution for priority inversion: priority inversion causes
undesired latencies and even deadlocks. (see [2])
Errata for ARM:
On ARM the slow-path for PI-futexes is first integrated in the RT-patch
2.6.23.rc4-rt1. The patch is however easily back-portable to older
kernels (>= 2.6.18) without breaking things. (Just check the file
'include/asm/futex.h' in the kernel code.) The futex slowpath on ARM
requires the memory locking scheme as described above. The futex
administration is never allowed to be paged out to disk, because the
futex-administration memory is accessed with interrupts disabled. This
was necessary because the ARM9 v4 and v5 cores do not have the required
test-and-set atomic instructions to do it nicely. This erratum is not
relevant to x86, because x86 supports the required atomic assembler
instructions to do it properly without interrupt locking.
The impact of the Big Kernel Lock
The Big Kernel Lock (BKL) is preemptible on Preempt-RT. In -rt the BKL is backed by a mutex (rtmutex) instead of a regular spinlock. The BKL is a special-case lock that is released at schedule() and reacquired when the thread is woken up. It is a coarse-grained lock that is used to protect the kernel in places that are not thread safe. It has special rules regarding its use and was designed to handle the cases where an IO call is blocked on a wait queue versus blocking as a result of contention on a sleepable semaphore. Significant parts of the kernel still use the BKL, notably the POSIX flock code, as well as other places.
If an RT thread uses a system call that locks the BKL, it can experience unbounded latencies when the BKL is held by another thread. Any call into the kernel from a real-time capable thread (SCHED_FIFO) must take this into account, otherwise priority inversions can take place. Just about every system call in the Linux kernel acquires a lock of some sort and can result in difficult-to-predict latencies. This is especially the case because of the wide use of the BKL in non-thread-safe places in the kernel.
One problematic place is the ioctl() handler in the device driver layer. The BKL is normally acquired on syscall entry and released when returning to userspace. However, there is a non-BKL-acquiring variant of this handler that can be used instead, provided that the handler function is MP/thread safe:
static struct file_operations my_fops = {
    /* Use only one of the two entries below. */
    .ioctl = my_ioctl,          /* BKL-locked variant: int my_ioctl(struct inode *, struct file *, unsigned int, unsigned long) */
    .unlocked_ioctl = my_ioctl, /* Does not take the BKL. Note the different argument list: long my_ioctl(struct file *, unsigned int, unsigned long) */
};
Building Device Drivers
(This chapter is under construction)
Interrupt Handling
The RT kernel handles all interrupt handlers in thread context. However, the real hardware interrupt context is still available. This context can be recognised by the IRQF_NODELAY flag that is assigned to a certain interrupt handler during request_irq() or setup_irq(). Within this context only a much more limited kernel API may be used.
Things you should not do in IRQF_NODELAY context
Author/Maintainer
Remy Bohmer
