Linux Loadable Module

A Linux Loadable Module is a feature of the Linux kernel that allows users to add functionality to the system at runtime without requiring a reboot. It is also known as an LKM, Linux Module, Loadable Module, Kernel Module, or simply Module.

Overview
A Linux Loadable Module is an ELF (Executable and Linkable Format) object file. The insmod utility passes this file to the kernel through the init_module system call. Each module designates two functions to be called when the module is loaded (init) and removed (exit) respectively. In the init function the module performs any initialization it needs, such as allocating memory, registering a device driver or a file system, or hooking kernel functions such as system calls. The exit function performs the opposite of the init function and frees all allocated objects.

Event Driven VS Sequential
While most small and medium-sized applications perform a single task from beginning to end, every kernel module just registers itself in order to serve future requests, and its initialization function terminates immediately. It then stays resident in kernel memory, waiting for its functions to be called (events).

Aggressive VS Lazy Resources Cleaning
Whereas an application that terminates can be lazy in releasing resources or skip cleanup altogether, the exit function of a module must carefully undo everything the init function built up, or the pieces remain around until the system is rebooted.

Kernel Symbol Table VS LibC
User applications can use LibC while modules can only use symbols exported by the kernel or by other loaded modules.

Oops VS Signals
Whereas a segmentation fault (signal) is harmless during application development and a debugger can always be used to trace the error to the problem in the source code, a kernel fault kills the current process at least, if not the whole system, in what is called an Oops.

User Modules VS Kernel Modules
Advantages of user-space drivers
- The full C library can be linked in.
- The programmer can run a conventional debugger.
- If a user-space driver hangs, you can simply kill it.
- User memory is swappable, unlike kernel memory.
- A well-designed driver program can still, like kernel-space drivers, allow concurrent access to a device.
- You are allowed to keep the driver closed-source.
- You don't have to worry about kernel versions.

Disadvantages of user-space drivers
- Interrupts are not available in user space.
- Direct access to memory is possible only by mmapping /dev/mem, and only a privileged user can do that.
- Access to I/O ports is available only after calling ioperm or iopl. Not all platforms support these calls, and access to /dev/port can be too slow.
- Response time is slower, because a context switch is required.
- If the driver has been swapped to disk, response time is unacceptably long. Using the mlock system call might help, but usually you'll need to lock many memory pages, because a user-space program depends on a lot of library code. mlock, too, is limited to privileged users.
- The most important devices can't be handled in user space, including, but not limited to, network interfaces and block devices.

Getting Started
Step 1: Setting Up The Environment

First you need to install a Linux source tree. You can either obtain one from an online mirror or use the one from the application repository of your Linux distribution.

If you are using YUM (Yellowdog Updater, Modified), install, as root, the package that provides the module development files; on most YUM-based distributions it is named kernel-devel.

For the following steps, assume you have one under /usr/src/kernels/2.6.X/.

Step 2: Writing The Code

A skeleton of a Linux module
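
The original listing is not available here, so the following is only a minimal sketch of what such a skeleton usually looks like. The names hello_init and hello_exit and the message texts are chosen for illustration; the file has to be built against a kernel source tree and cannot be compiled as an ordinary user-space program.

```c
#include <linux/init.h>    /* __init, __exit, module_init, module_exit */
#include <linux/kernel.h>  /* printk and the KERN_* loglevels */
#include <linux/module.h>  /* MODULE_LICENSE and core module support */

MODULE_LICENSE("GPL");

/* Called once when the module is loaded. A non-zero return value
 * makes the load fail. */
static int __init hello_init(void)
{
    printk(KERN_INFO "hello: module loaded\n");
    return 0;
}

/* Called once when the module is removed; it must undo whatever
 * hello_init set up (nothing in this sketch). */
static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: module unloaded\n");
}

/* Tell the kernel which functions are the init and exit entry points. */
module_init(hello_init);
module_exit(hello_exit);
```

Saved as hello.c, this matches the file name assumed in the build step.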

Step 3: Compiling and Building

To compile kernel modules in Linux 2.6 and later, you use the kernel build process. The build process is a collection of scripts, objects, and make files that uses your own make file to build your modules. Assume your module source file is called hello.c and that the module also includes util1.c and util2.c.

Make file

To use this file, cd to its directory (same directory as hello.c) and execute the following command

Note that the kernel build system looks for the make file under a fixed name, normally "Makefile" (or "Kbuild"). The result of compilation, among many other files, is the module file hello.ko.

The files found in the Documentation/kbuild directory in the kernel source are required reading for anybody wanting to understand all that is really going on beneath the surface.

Step 4: Loading and Unloading

Load the module with insmod (as root), for example: insmod ./hello.ko. Unload it with rmmod (as root), for example: rmmod hello. Notice the missing path and file extension in the remove command.

Printing and Viewing Message from Kernel Space
The printk function behaves similarly to the standard C library function printf. The place where the messages show up depends on the priority of the message, the kernel version you are running, the version of the klogd daemon, your configuration of syslogd, and the type of terminal you are logged at. The printk function writes messages into a circular buffer that is __LOG_BUF_LEN bytes long: a value from 4 KB to 1 MB chosen while configuring the kernel. The function then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg. These two interfaces to the logging engine are almost equivalent, but note that reading from /proc/kmsg consumes the data from the log buffer, whereas the syslog system call can optionally return log data while leaving it for other processes as well. In general, reading the /proc file is easier and is the default behavior for klogd. The dmesg command can be used to look at the content of the buffer without flushing it; actually, the command returns to stdout the whole content of the buffer, whether or not it has already been read. If the circular buffer fills up, printk wraps around and starts adding new data to the beginning of the buffer, overwriting the oldest data. Therefore, the logging process loses the oldest data.
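
The wrap-around behaviour described above can be illustrated with a small user-space model. This is only a sketch, not the kernel's implementation: the names ring_log, ring_write and ring_snapshot and the 8-byte RING_LEN are invented for the example.

```c
#include <stddef.h>

#define RING_LEN 8  /* hypothetical size; the kernel uses __LOG_BUF_LEN */

struct ring_log {
    char   buf[RING_LEN];
    size_t head;        /* total number of bytes ever written */
};

/* Append a message byte by byte. Once the buffer is full, each new
 * byte overwrites the oldest one. */
static void ring_write(struct ring_log *r, const char *msg)
{
    for (; *msg; msg++) {
        r->buf[r->head % RING_LEN] = *msg;
        r->head++;
    }
}

/* Copy the surviving bytes, oldest first, into out (which must hold
 * at least RING_LEN + 1 bytes) without consuming them, in the spirit
 * of dmesg. Returns the number of bytes copied. */
static size_t ring_snapshot(const struct ring_log *r, char *out)
{
    size_t len = r->head < RING_LEN ? r->head : RING_LEN;
    size_t start = r->head - len;
    size_t i;

    for (i = 0; i < len; i++)
        out[i] = r->buf[(start + i) % RING_LEN];
    out[len] = '\0';
    return len;
}
```

Writing "abcde" and then "fghij" into an empty ring_log leaves "cdefghij" in the snapshot: the two oldest bytes have been overwritten.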

Priority

printk lets you classify messages according to their severity by associating different loglevels with them. For example, printk(KERN_INFO "hello: module loaded\n") tags the message as informational.

The loglevel macro expands to a string, which is concatenated to the message text at compile time. There are eight possible loglevel strings, defined in the header <linux/kernel.h>; here they are listed in order of decreasing severity:
- KERN_EMERG: Used for emergency messages, usually those that precede a crash.
- KERN_ALERT: A situation requiring immediate action.
- KERN_CRIT: Critical conditions, often related to serious hardware or software failures.
- KERN_ERR: Used to report error conditions; device drivers often use KERN_ERR to report hardware difficulties.
- KERN_WARNING: Warnings about problematic situations that do not, in themselves, create serious problems with the system.
- KERN_NOTICE: Situations that are normal, but still worthy of note. A number of security-related conditions are reported at this level.
- KERN_INFO: Informational messages. Many drivers print information about the hardware they find at startup time at this level.
- KERN_DEBUG: Used for debugging messages.

A printk statement with no specified priority defaults to DEFAULT_MESSAGE_LOGLEVEL, specified in kernel/printk.c as an integer. In the 2.6.10 kernel, DEFAULT_MESSAGE_LOGLEVEL is KERN_WARNING, but that has been known to change in the past.
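
The compile-time concatenation can be tried out in user space. The sketch below assumes the 2.6-era encoding, in which each loglevel is a short string such as "<4>"; newer kernels encode the prefix differently. The DEMO_ macros and demo_loglevel are stand-ins invented for the example, not kernel symbols.

```c
/* Stand-ins mirroring the 2.6-era definitions (an assumption; check
 * the headers of the kernel you are building against). */
#define DEMO_KERN_WARNING "<4>"
#define DEMO_KERN_DEBUG   "<7>"

/* Adjacent string literals are merged by the compiler, so this is the
 * single literal "<7>probe finished\n". */
static const char demo_msg[] = DEMO_KERN_DEBUG "probe finished\n";

/* Recover the numeric level from the prefix, or return -1 when the
 * message carries no "<n>" prefix. */
static int demo_loglevel(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return -1;
}
```

Because the prefix is part of the string itself, a message that omits the macro simply has no prefix, which is the case in which the default loglevel applies.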

Terminal Type and Configuration

If the priority is less than the integer variable console_loglevel, the message is delivered to the console one line at a time (nothing is sent unless a trailing newline is provided). The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL and can be modified through the sys_syslog system call. One way to change it is by specifying the –c switch when invoking klogd, as specified in the klogd manpage. It is also possible to read and modify the console loglevel using the text file /proc/sys/kernel/printk. The file hosts four integer values: the current loglevel, the default level for messages that lack an explicit loglevel, the minimum allowed loglevel, and the boot-time default loglevel. Writing a single value to this file changes the current loglevel to that value; thus, for example, you can cause all kernel messages to appear at the console by writing 8 to it as root: echo 8 > /proc/sys/kernel/printk

Note that the kernel can only log messages to the console, which can only point at a virtual terminal. So, if you are logged in at a pseudo terminal (e.g. xterm) on an X server, you will not be able to see any messages.

klogd and syslogd

If both klogd and syslogd are running on the system, kernel messages are appended to /var/log/messages (or otherwise treated depending on your syslogd configuration), independent of console_loglevel. If klogd is not running, the message won't reach user space unless you read /proc/kmsg (which is often most easily done with the dmesg command). When using klogd, you should remember that it doesn't save consecutive identical lines; it only saves the first such line and, at a later time, the number of repetitions it received. If the klogd process is running, it retrieves kernel messages and dispatches them to syslogd, which in turn checks /etc/syslog.conf to find out how to deal with them. syslogd differentiates between messages according to a facility and a priority; allowable values for both the facility and the priority are defined in <sys/syslog.h>. Kernel messages are logged by the LOG_KERN facility at a priority corresponding to the one used in printk (for example, LOG_ERR is used for KERN_ERR messages). If klogd isn't running, data remains in the circular buffer until someone reads it or the buffer overflows.

Access The Calling Process (Current process)
Kernel code can refer to the current process by accessing the global item current, defined in <asm/current.h>, which yields a pointer to struct task_struct, defined by <linux/sched.h>. Actually, current is not truly a global variable. The need to support SMP systems forced the kernel developers to develop a mechanism that finds the current process on the relevant CPU. This mechanism must also be fast, since references to current happen frequently. The result is an architecture-dependent mechanism that, usually, hides a pointer to the task_struct structure on the kernel stack. The details of the implementation remain hidden to other kernel subsystems though, and a device driver can just include <linux/sched.h> and refer to the current process.
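
As a rough illustration, a module can print the name and process ID of whichever process is loading or unloading it. The module name whoami and its function names are hypothetical, and the code must be built against a kernel source tree.

```c
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>   /* current and struct task_struct */

MODULE_LICENSE("GPL");

static int __init whoami_init(void)
{
    /* While the module is being loaded, current is the process that
     * asked for the load (normally insmod). */
    printk(KERN_INFO "whoami: loaded by \"%s\" (pid %i)\n",
           current->comm, current->pid);
    return 0;
}

static void __exit whoami_exit(void)
{
    /* Likewise, at removal time current is normally rmmod. */
    printk(KERN_INFO "whoami: unloaded by \"%s\" (pid %i)\n",
           current->comm, current->pid);
}

module_init(whoami_init);
module_exit(whoami_exit);
```

The comm field holds the base name of the program file and pid holds the process ID, both read from the task_struct that current points to.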

Dividing Module Responsibility (Module Stacking)
Exporting Symbols
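
A minimal sketch of a module that offers one function to modules stacked on top of it. The function demo_util_add is invented for the example; EXPORT_SYMBOL is the kernel macro that places a symbol in the kernel symbol table. The code must be built against a kernel source tree.

```c
#include <linux/module.h>

MODULE_LICENSE("GPL");

/* Hypothetical helper made available to other modules. It is not
 * static, because it must be visible outside this file. */
int demo_util_add(int a, int b)
{
    return a + b;
}
EXPORT_SYMBOL(demo_util_add);
```

A module that uses the helper declares it (typically through a shared header) as extern int demo_util_add(int a, int b); and can only be loaded after the exporting module is in place.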

Handling Errors During Initialization
Using The goto Statement
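
The pattern is sketched below in user-space C so that it can be compiled and tried outside the kernel. demo_register, demo_unregister and the fail_at switch are stand-ins invented for the example; in a real module they would be kernel registration calls returning negative error codes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel registration calls. Each slot
 * records whether its resource is currently registered. */
static bool registered[3];
static int  fail_at = -1;       /* index that should fail, or -1 */

static int demo_register(int i)
{
    if (i == fail_at)
        return -1;              /* a real call would return e.g. -ENOMEM */
    registered[i] = true;
    return 0;
}

static void demo_unregister(int i)
{
    registered[i] = false;
}

/* Initialization in the goto style: each failure jumps to the label
 * that undoes only what has already succeeded, in reverse order. */
static int demo_init(void)
{
    int err;

    err = demo_register(0);
    if (err)
        goto fail_0;
    err = demo_register(1);
    if (err)
        goto fail_1;
    err = demo_register(2);
    if (err)
        goto fail_2;
    return 0;                   /* everything registered */

fail_2:
    demo_unregister(1);
fail_1:
    demo_unregister(0);
fail_0:
    return err;                 /* propagate the error code */
}
```

The labels fall through on purpose: a failure at the third step undoes the second and then the first, so no partially initialized state is left behind.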

Using The Cleanup Function
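
An alternative is to let the initialization function call the module's own cleanup function on any failure. The cleanup function then has to check the state of each item before undoing it. The sketch below models this in user-space C; item_a, item_b, the fail_second switch and the use of malloc are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static void *item_a;        /* NULL means "not allocated" */
static void *item_b;
static bool  fail_second;   /* forces the second allocation to fail */

/* Undo whatever exists, in reverse order. Every item is checked
 * first, because this may run after a partial initialization. */
static void demo_cleanup(void)
{
    if (item_b) {
        free(item_b);
        item_b = NULL;
    }
    if (item_a) {
        free(item_a);
        item_a = NULL;
    }
}

static int demo_init(void)
{
    item_a = malloc(16);
    if (!item_a)
        goto fail;
    item_b = fail_second ? NULL : malloc(16);
    if (!item_b)
        goto fail;
    return 0;

fail:
    demo_cleanup();         /* one exit path for every failure */
    return -1;              /* a real module would return e.g. -ENOMEM */
}
```

In a real module, a cleanup function that is called from the init function cannot be marked __exit, since it is then used by non-exit code.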

Configurable Kernel Debugging Options
CONFIG_DEBUG_KERNEL

CONFIG_DEBUG_SLAB

CONFIG_DEBUG_PAGEALLOC

CONFIG_DEBUG_SPINLOCK

CONFIG_DEBUG_SPINLOCK_SLEEP

CONFIG_INIT_DEBUG

CONFIG_DEBUG_INFO

CONFIG_MAGIC_SYSRQ

CONFIG_DEBUG_STACKOVERFLOW and CONFIG_DEBUG_STACK_USAGE

CONFIG_KALLSYMS

CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC

CONFIG_ACPI_DEBUG

CONFIG_DEBUG_DRIVER

CONFIG_SCSI_CONSTANTS

CONFIG_INPUT_EVBUG

CONFIG_PROFILING

Unnecessary Exit Memory
Prevent Allocating Exit Functions In Kernels That Disallow Unloading

Limited Stack
The kernel has a very small stack; it can be as small as a single, 4096-byte page. Your functions must share that stack with the entire kernel-space call chain. Thus, it is never a good idea to declare large automatic variables; if you need larger structures, you should allocate them dynamically at call time.

No Floating Point Arithmetic
Kernel code cannot do floating point arithmetic. Enabling floating point would require that the kernel save and restore the floating point processor’s state on each entry to, and exit from, kernel space—at least, on some architectures. Given that there really is no need for floating point in kernel code, the extra overhead is not worthwhile.

Concurrency
Module code becomes part of the kernel, so it is subject to the same concurrency issues. Beyond the general sources of concurrency, Linux has two others:
- Several software abstractions (e.g. kernel timers) run asynchronously.
- Linux 2.6.X kernels are preemptible.

The Sleep Assumption in Linux

A common mistake made by driver programmers is to assume that concurrency is not a problem as long as a particular segment of code does not go to sleep (or “block”). Even in previous kernels (which were not preemptive), this assumption was not valid on multiprocessor systems. In 2.6, kernel code can (almost) never assume that it can hold the processor over a given stretch of code.

Kernel Debugging Options
Enabling kernel debugging configuration options slows the system considerably and is not recommended for production kernels.

Contending for printk Messages
The /proc/kmsg file is a FIFO. You can read it by hand. Obviously, you can't read messages this way if klogd or another process is already reading the same data, because you'll contend for it.

Clobbering The System With printk Messages
If you want to avoid clobbering your system log with the monitoring messages from your driver, you can either specify the –f (file) option to klogd to instruct it to save messages to a specific file, or customize /etc/syslog.conf to suit your needs. Yet another possibility is to take the brute-force approach: kill klogd and verbosely print messages on an unused virtual terminal, or issue the command cat /proc/kmsg from an unused xterm.

When using a slow console device (e.g., a serial port), an excessive message rate can also slow down the system or just make it unresponsive. The kernel provides a function, printk_ratelimit, that can be helpful in such cases.

This function should be called before you consider printing a possibly repeating message. Only if it returns a non-zero value should you go ahead and print the message.

The behaviour of this function can be changed using /proc/sys/kernel/printk_ratelimit (the number of seconds to wait before re-enabling logging) and /proc/sys/kernel/printk_ratelimit_burst (the number of allowed messages before rate limiting).
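
A typical call site might look like the following sketch. The function demo_report_fault and the message text are invented for the example; printk_ratelimit is the kernel function discussed above. The fragment is meant to be part of a module built against a kernel source tree.

```c
#include <linux/kernel.h>  /* printk, printk_ratelimit */

/* Hypothetical helper that a driver calls every time its device
 * signals the same fault. */
static void demo_report_fault(void)
{
    /* Print only while the rate limit allows it; otherwise the
     * message is dropped without being logged. */
    if (printk_ratelimit())
        printk(KERN_NOTICE "demo: device is still reporting a fault\n");
}
```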