http://www.ibm.com/developerworks/library/l-devdebug.html

Debugging simulated hardware on Linux, Part 1: Device driver debugging
Test your driver's entire code flow

Summary: This two-part series is geared toward easing device driver development. This first part illustrates proven methods you can use to test the complete code flow of a device driver during the design, development, and debugging stages.

Date: 02 Nov 2005
Imagine that you're asked to develop a device driver from scratch, but the target hardware is not available to you while you develop the software. (Perhaps the hardware and software development are occurring simultaneously.) How would you go about designing, developing, testing, and debugging your driver?

Furthermore, suppose your device driver is very complex, having multiple threads, accessing hardware registers, and using advanced programming techniques such as DMA in user land. Or, what if your driver's Interrupt Service Routine (ISR) is too complex, having multiple interrupts in a setup where every interrupt has to be treated differently? Alternatively, what if you are asked to write a device driver for an embedded system that does not have a very sophisticated debugging environment? You may also be asked to write software to test the hardware itself.

This article explains the methods I follow during the design, development, and debugging stages of a driver. These methods are helpful when you test the complete code flow of a driver under one or all of the developmental stages. I consider this method a development strategy rather than a debugging technique. For a more detailed discussion of the problem and implementation details, please read Part 2 of this two-part series, "Debugging simulated hardware on Linux, Part 2: Interrupts and Interrupt Service Routine."

The testing described in this section exercises the flow of the various low-level helper functions and entry points in the device driver. This source code testing covers all the logic and control statements used in the driver. In this method, all the entry points and helper functions are either replaced by dummy functions or mapped to dummy macros. To do this, you can use either of these alternative techniques:
In both of these techniques, the dummy functions may return success and log the appropriate messages when they are called.

These dummy or self-testing functions may do very minimal tasks in order to keep the entire system in a safe state, such as setting some device- or driver-related global flags and manipulating global structures. The rest of the driver operations may need these flag settings and structure manipulations. You can do this testing without the actual hardware, so these functions comprise a pseudo driver.
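
As a rough illustration of the idea, here is a minimal user-space sketch in C, not the article's own code. It assumes a compile-time flag named SIMULATE_HW and hypothetical names (hw_reg_write, dummy_reg_write, real_reg_write, my_dev_init, and the register offsets); none of these come from a real driver. The dummy logs each call, touches no hardware, returns success, and the init routine sets a global flag that later code could depend on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical build flag: compile with -DSIMULATE_HW=0 for real hardware. */
#ifndef SIMULATE_HW
#define SIMULATE_HW 1
#endif

/* Driver-related global state that the rest of the driver may rely on. */
int dev_initialized;   /* set once the init path has completed */
int dummy_calls;       /* number of simulated hardware accesses so far */

#if SIMULATE_HW
/* Dummy register write: log the call, touch no hardware, report success. */
int dummy_reg_write(uint32_t offset, uint32_t value)
{
    dummy_calls++;
    fprintf(stderr, "[sim] reg_write(offset=0x%x, value=0x%x)\n",
            (unsigned)offset, (unsigned)value);
    return 0;
}
#define hw_reg_write(off, val) dummy_reg_write((off), (val))
#else
/* Hypothetical real accessor, supplied elsewhere by the actual driver. */
int real_reg_write(uint32_t offset, uint32_t value);
#define hw_reg_write(off, val) real_reg_write((off), (val))
#endif

/* Init routine: its control flow is identical in both builds. */
int my_dev_init(void)
{
    assert(!dev_initialized);            /* init must run only once */
    if (hw_reg_write(0x00, 0x1) != 0)    /* hypothetical reset register */
        return -1;
    if (hw_reg_write(0x04, 0xff) != 0)   /* hypothetical IRQ mask register */
        return -1;
    dev_initialized = 1;                 /* flag later operations check */
    return 0;
}
```

Calling my_dev_init() in the simulated build walks through both register writes and the error checks without any device present. Switching the flag later swaps in the real accessor without touching the init logic.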
Say, for instance, in the
Instead of calling
To put it simply, you can map the
The basic guideline is that any call that directly accesses the device
(such as

Similarly, you can map and test the code flow of other driver entry points and their helper functions.

You might have a big initialization routine of more than 100 lines of code, and it might have four or five direct hardware access calls, possibly within a loop. The remaining portion of the code could be driver- and device-specific details and the logic associated with them. In this case, you do not need to replace the entire function with a dummy/stub function. Instead, you map only the hardware access calls so that the entire code flow is covered during debugging and testing.

You could adopt the same method for non-PCI-based device drivers as well. I used the same technique while developing a USB gadget driver without any trouble.

As you know, interrupts are asynchronous and need to be serviced carefully. The ISR runs in a special context, and you should take care not to make the regular kernel process race with the ISR. So, what is the process to test the code flow in the ISR? The following sections give you a simple roadmap.

By using the polling method, you can trace the complete code path in the ISR. You can schedule a tasklet to act as the actual ISR. The tasklet context is close to the interrupt context; this is why tasklets, like ISRs, have restrictions on using certain blocking calls.

You can disable the interrupt and write a simple kernel thread that executes the ISR programmatically, at some regular interval or in some particular sequence or settings.

To test your ISR running in ISR context only, you have soft IRQs and software-generated interrupts. For instance, on an x86 platform, you could use the

Employ a special ioctl function
A sophisticated way to have more control over raising interrupts and testing the ISR is to follow a two-tier architecture, having a special

I prefer to load the software in the debugger and run each individual path/line of code at least once and test all the branching statements. If you want to try that, the ideal way is to disable the interrupts and perform polling while you are still in the debugger. In the debugger, you can then set different values for the variables at runtime and check their effect. However, you must be very sure about the time-critical nature of any particular function that you are debugging.

I once had a nightmarish experience while debugging a 400MB VSS source that was developed a decade ago with multiple layers and modules. Every time I entered the debugger, the TTL (Time to Live) factor would disconnect the network communication I had established before I started the debugging session. If you are developing a connection-oriented networking driver, you should take this into account.

In the ultimate environment, you can run your driver in the debugger and debug the driver while interrupts are enabled and processed. You need some level of expertise with the debugger and interrupts to achieve this.

If you've come this far, you are now ready to test your driver on the actual hardware. After testing all the code paths, carefully remove all the wrappers and dummy/stub functions and give the driver access to the hardware.

A natural question to ask at this point is: Why would anyone want to do all the simulation, polling, etc., when the hardware is available? Here are some reasons:
Testing and debugging the device driver in the simulated environment is obviously easier than testing it on the actual hardware. In the simulated environment, you have complete control over raising interrupts, and you can step through the source code. In the actual target device environment, interrupt generation could be more asynchronous. You also may not be able to step through some portions of the code in the target hardware environment if the environment is similar to the connection-oriented network traffic mentioned in the previous section.

Another reason to go through the trouble of simulation when the hardware is available is that sometimes the kernel and/or the hardware will malfunction. It would be difficult to debug untested code on a new kernel or new hardware. As a developer, you get used to testing your code before it is tested on the hardware. If more than one odd issue showed up at the same time, each would be hard to isolate.

When testing on the hardware, first enable all the hardware access calls in the initialization routine and see if everything works properly.

One very basic test to conduct on any hardware is the access test: You should be able to read and write to the hardware. This basic test must be conducted before you go further with any other control logic in the driver. Most devices have some well-known registers or memory locations that are pre-initialized (that is, filled with well-known values) during power-up. You should be able to read those registers or memory locations to confirm that basic hardware read access works properly.

Once these lowest-level functions are tested, you can replace all dummy functions with the actual entry points and helper functions, one by one and level by level. I use these approaches every time I do a native migration or develop a fresh driver.

Incremental testing and development is always better than testing all the untested code in a single shot (brute-force testing). It is always better to test the software as you develop, and implement new features one by one.

The following test cases provide step-by-step instructions on what to check for.

Check whether all the locking primitives (spinlock, semaphore, mutex, read-write locks) are released in all possible code flows. This helps distinguish whether you are in an infinite loop or in a deadlock situation. If you ever end up in a system lockup and suspect any of the locking primitives, log a message in all the places where these locking primitives are used.

I also suggest having a logging or debug trace facility in the driver that can be enabled or disabled and can distinguish different levels of debugging. If you were to place a debug trace in an infinite loop, you would definitely get a console full of messages.

If the driver is a module, check whether loading and unloading the driver succeeds. Check whether automatic loading and automatic unloading work properly without any warnings or messages from the kernel.
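
The debug trace facility suggested above could look something like the following user-space C sketch. It is only an illustration under stated assumptions: the names dbg_log, dbg_threshold, dbg_emitted, TRACE_LOCK, and the level constants are all hypothetical, and a real kernel driver would send its messages through printk rather than fprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug levels, from least to most verbose. */
enum dbg_level { DBG_OFF = 0, DBG_ERR, DBG_INFO, DBG_TRACE };

int dbg_threshold = DBG_ERR;   /* runtime switch; DBG_OFF silences everything */
unsigned long dbg_emitted;     /* count of messages actually written */

/* Write the message only if its level is enabled.
 * Returns 1 when the message was emitted, 0 when it was filtered out. */
int dbg_log(int level, const char *fmt, ...)
{
    va_list ap;

    assert(level > DBG_OFF && level <= DBG_TRACE);
    if (level > dbg_threshold)
        return 0;

    dbg_emitted++;
    fprintf(stderr, "[drv:%d] ", level);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}

/* Place next to every lock/unlock call when a deadlock is suspected. */
#define TRACE_LOCK(name) \
    dbg_log(DBG_TRACE, "lock %s at %s:%d\n", (name), __FILE__, __LINE__)
```

Keeping the threshold in a variable rather than a compile-time constant means tracing can be turned up only while reproducing a hang, which limits the flood of console messages that a trace inside a tight loop would otherwise cause.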
Check whether proper shutdown happens after the driver is loaded and unloaded a few times. Any memory leaks or invalid memory accesses that did not surface during driver access would most likely surface during shutdown.

Check whether open, close; open, read, close; and open, write, close succeed. Check all possible combinations at least once.
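
The open/read/write/close combinations above can be driven from a small user-space test program. The sketch below is one possible shape, assuming the driver exposes a character device node; the function names run_sequence and run_all_sequences are hypothetical, and the device path is whatever node your driver creates.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Run one sequence: open, optional write, optional read, close.
 * Returns 0 when every step succeeds, -1 otherwise. */
int run_sequence(const char *path, int do_write, int do_read)
{
    char buf[16];
    int ok = 1;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (do_write && write(fd, "ping", 4) != 4) {
        perror("write");
        ok = 0;
    }
    if (do_read && read(fd, buf, sizeof buf) < 0) {
        perror("read");
        ok = 0;
    }
    if (close(fd) != 0) {
        perror("close");
        ok = 0;
    }
    return ok ? 0 : -1;
}

/* Try every combination of the optional steps once.
 * Returns the number of sequences that failed. */
int run_all_sequences(const char *path)
{
    int failures = 0;

    for (int w = 0; w <= 1; w++)
        for (int r = 0; r <= 1; r++)
            if (run_sequence(path, w, r) != 0)
                failures++;
    return failures;
}
```

Pointing run_all_sequences() at the driver's device node covers open/close, open/read/close, open/write/close, and the combined write-then-read case. Running it in a loop is a simple way to repeat the checks across several load and unload cycles.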
Also check to see that all

Check for memory leaks with the appropriate tools. Since kernel modules have access to the entire memory space, it may be difficult to distinguish between legal and illegal memory access. Very small memory leaks in the driver may not be easy to catch.

You might want to automate the testing if you're going to run your tests a number of times. The best approach is to go through the entire source and come up with test cases that cover all possible code paths (for example, a
Arun Prasad Velu holds a Master of Computer Applications degree from Madras University. He has more than six years of experience developing device drivers for various operating systems, including Linux, FreeBSD, OS/2, Windows NT/2000, and VxWorks. Linux and open source systems are his primary areas of interest. Currently, he is a Technical Manager with Aspire Communications, a company that focuses on embedded hardware design, product re-engineering, device driver development, OS porting, and application software development.
