The K-Zone: Understanding the Linux boot process

This document explains in moderate detail what happens when a Linux system starts up. As far as possible, I have tried to separate features which are specific to the various Linux distributions from those that are generic. Where this isn't possible -- because the explanation would be too convoluted -- I have used the RedHat set-up as an example. In addition, I have tended to focus on the Intel/PC platform, for the same reason.

To make the process easier to follow, I have broken it into four stages: the `firmware' stage, the `bootloader' stage, the `kernel' stage, and the `init' stage. These are my names, and they aren't necessarily used by other Linux users. Moreover, it isn't always easy to separate the `firmware' stage from the initial operations of the bootloader. On the PC platform, the firmware is so unintelligent that a separate (software) bootloader is required. On other platforms, notably Sparc machines, the firmware is quite sophisticated, and may be able to load a kernel directly.

Stage 1 (firmware stage)

The purpose of a bootloader is to get at least part of the operating system kernel into memory and running. After that, the kernel can take over the process. However, unless the bootloader is in firmware, to run the bootloader we must first retrieve it, from disk or wherever else it is stored. The purpose of the firmware stage, therefore, is to get a bootloader into memory and run it.

On the Intel/PC platform, the firmware stage (which does not depend on the operating system) is governed by the BIOS. Most modern PCs (and other types of computer, of course) can boot from floppy disk, hard disk, or CD-ROM. It is common for Sparc-based systems to have built-in network bootloaders in firmware but, at present, this is unusual in the PC world. The BIOS typically provides a mechanism by which the operator can choose the devices that will be used to boot, and it will probably be prepared to try more than one if necessary. The process is slightly different for the different media types.

Bootloader on floppy disk or hard disk

This is usually the simplest situation. On a floppy disk, the first sector is reserved as the boot sector. It must contain executable program code. The BIOS loads the boot sector into memory and then runs it. This process is largely the same whatever the hardware platform.

The situation is similar for PC hard disks, except that it is conventional to divide the hard disk into partitions, and to provide a boot sector for each partition. In the world of DOS, the boot sector was, and remains, combined with the partition table; the partition table controls how much space is allocated to each partition. In addition to the partition boot sectors there is an overall boot sector/partition table called the `master boot record' (MBR). When booting from a hard disk formatted this way, the PC BIOS loads the MBR and executes it as a boot sector; the code in the MBR will then find which partition to boot from, and load and run the boot sector from that partition.
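As a rough illustration of what the MBR code has to do, here is a minimal Python sketch that reads a 512-byte sector, checks the boot signature, and lists the partition table entries. It assumes the classic PC layout (boot code, then four 16-byte entries at offset 446, then the bytes 0x55 0xAA); the function names and the sample values are invented for the example.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # the partition table follows the boot code
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80   # conventional marker for the partition to boot from


def parse_mbr(sector: bytes):
    """Return (index, active, type_id, start_lba, num_sectors) per used entry."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR: missing 0x55 0xAA signature")
    entries = []
    for i in range(4):
        offset = TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector[offset:offset + ENTRY_SIZE]
        # status(1) CHS start(3) type(1) CHS end(3) start LBA(4) sector count(4)
        status, type_id, start_lba, count = struct.unpack("<B3xB3xII", raw)
        if type_id != 0:  # type 0 marks an unused slot
            entries.append((i, status == ACTIVE_FLAG, type_id, start_lba, count))
    return entries


def sample_mbr() -> bytes:
    """Build a hypothetical MBR with one active Linux (type 0x83) partition."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3xB3xII", ACTIVE_FLAG, 0x83, 63, 204800)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)
```

A real MBR would then load the first sector of the active partition and jump to it; the sketch stops at finding that partition.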

Linux has no need to follow the convention of partitioning that is meaningful to DOS/Windows, but if the hard disk is to be used with more than one operating system then it is a good idea to do so.

So, when booting from a hard disk the Linux bootloader can be placed in the MBR, or in a partition boot sector. In the latter case, it won't be the BIOS that loads the Linux bootloader; it will be the bootloader in the master boot record.

Whether the boot disk is a hard disk or a floppy disk, the first stage of the boot process finds a boot sector, which will contain the Linux bootloader, and runs it.

Bootloader on CD-ROM

The ability to boot from a CDROM has been commonplace on most platforms for some years. On some platforms a bootable CDROM has the same structure as a bootable hard disk: a boot sector followed by a load of data. A structure like this is unworkable for PCs, owing to limitations in the BIOS specification. Most modern PCs are, however, able to boot from a CDROM formatted according to the El Torito specification. This process is far more complex than it ought to be. Because the BIOS can't cope with a full-sized bootable Linux filesystem on a CDROM, El Torito requires that the CDROM be provided with an additional bootable filesystem. This filesystem is considered to be `outside' the normal data area of the CDROM, and won't be visible if the CDROM is mounted as a filesystem in the usual way. In fact, although the CDROM itself will normally be formatted with an ISO9660 filesystem, the El Torito bootable image can be of any filesystem type. In practice, the bootable image will be formatted as a floppy disk: a boot sector followed by a filesystem. When booting from the CDROM, the BIOS finds the bootable filesystem image, loads the boot sector, and makes the rest of the image available through BIOS calls just as it does for a floppy disk. As far as the bootloader is concerned, therefore, the BIOS treats a bootable CDROM as an ordinary CDROM with an `embedded' bootable floppy disk. Booting from CDROM is therefore just like booting from a floppy disk in practice. With Linux, this embedded floppy disk is usually formatted with an ext2 filesystem. As with a floppy disk, this filesystem will either become the root filesystem for the next phase of the boot process, or will supply a new, compressed filesystem which will be loaded into memory as a `ramdisk' (see below).
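To make the `finds the bootable filesystem image' step a little more concrete, here is a hedged Python sketch of the first thing the firmware has to look up. As I understand the El Torito specification, sector 17 of the disc holds a boot record volume descriptor whose system identifier is `EL TORITO SPECIFICATION', and which stores, at byte offset 0x47, the sector number of a boot catalog that in turn describes the boot image. The function names and the sample image are invented for the example, and the sketch does not go on to parse the catalog.

```python
import struct

CD_SECTOR = 2048
BOOT_RECORD_SECTOR = 17
EL_TORITO_ID = b"EL TORITO SPECIFICATION"


def boot_catalog_sector(image: bytes):
    """Return the boot catalog's sector number, or None if not El Torito."""
    start = BOOT_RECORD_SECTOR * CD_SECTOR
    desc = image[start:start + CD_SECTOR]
    if len(desc) < CD_SECTOR:
        return None
    # Byte 0 is the descriptor type (0 = boot record), bytes 1-5 are "CD001".
    is_boot_record = desc[0] == 0 and desc[1:6] == b"CD001"
    system_id = desc[7:39].rstrip(b"\x00")
    if not is_boot_record or system_id != EL_TORITO_ID:
        return None
    (catalog,) = struct.unpack_from("<I", desc, 0x47)
    return catalog


def make_sample_image(catalog_sector: int) -> bytes:
    """Build a hypothetical, otherwise empty image with a boot record."""
    image = bytearray((BOOT_RECORD_SECTOR + 1) * CD_SECTOR)
    start = BOOT_RECORD_SECTOR * CD_SECTOR
    image[start + 1:start + 6] = b"CD001"
    image[start + 6] = 1  # descriptor version
    image[start + 7:start + 7 + len(EL_TORITO_ID)] = EL_TORITO_ID
    struct.pack_into("<I", image, start + 0x47, catalog_sector)
    return bytes(image)
```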

The diagram below shows the structure of a typical Linux bootable CD-ROM (but this isn't the only way to do it). The areas aren't to scale, of course: the volume descriptors, etc., are only one sector in length, but the filesystems will be many thousands of sectors. Notice that there is a complete ext2 filesystem in the boot filesystem image, along with the boot sector. The boot sector will normally contain LILO code (see below). The filesystem contains the kernel and the initial ramdisk (see below), and the initial ramdisk in turn contains an ext2 filesystem which will become the root filesystem.

Bootloader retrieved from network

The problem with booting from a network is that the functionality must be supplied in firmware, because if there is no hard disk, there is no practical place to load network-boot software from. Most PCs do not contain firmware this sophisticated, although some network adaptors have this functionality. Sparc-based workstations generally do have network boot functionality, in the OpenBoot firmware, and it is quite comprehensive. Note that there is nothing to stop a PC getting a bootloader with network capabilities from, say, a hard disk or CDROM and then using this to complete the boot process over the network. However, this is not network booting in the sense I am describing here.

To get a bootloader via the network, the workstation must first of all decide where to get it from. This may be configurable at the firmware level or, more often, the workstation will issue a broadcast, and then select a boot server from the replies. Sun Sparc systems typically make a RARP request, broadcasting their hardware MAC address (`Ethernet address'). The reply from the server will contain the IP address assigned to the workstation, and that of the server itself. The workstation then uses the server's IP address as the target for a TFTP download. Whether this download retrieves a network-aware bootloader, or a whole kernel, varies from one system to another. Some Sparc systems are able to TFTP a Linux kernel and load it; others require the retrieval of a network-aware bootloader which then retrieves the kernel (this is how Linux can be made to run on the Sun Javastation network appliance, which has somewhat stunted firmware).
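TFTP is simple enough to fit in firmware, which is largely why it is used here. The following Python sketch shows the two packets that matter for a download, following the layout in RFC 1350: the read request sent to the server (UDP port 69), and the data packets that come back. The filename is a placeholder; the name a real workstation asks for depends on its firmware.

```python
import struct

TFTP_PORT = 69
OP_RRQ = 1    # read request
OP_DATA = 3   # data packet
BLOCK_SIZE = 512


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Opcode 1, then the filename and transfer mode, each NUL-terminated."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def parse_data(packet: bytes):
    """Return (block_number, payload, is_last) for a DATA packet."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != OP_DATA:
        raise ValueError("not a TFTP DATA packet")
    payload = packet[4:]
    # A payload shorter than 512 bytes signals the end of the transfer.
    return block, payload, len(payload) < BLOCK_SIZE
```

The client acknowledges each block before the server sends the next one, so the firmware never needs to buffer more than one packet at a time.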

Stage 2 (bootloader stage)

So we've got a bootloader into memory, from disk or network, and it can be executed. Its job will be to get the kernel into memory, again either from disk or network, and execute it. The bootloader will have to supply various vital pieces of information to the kernel, crucially the location of its root filesystem.

There are a number of bootloaders available for Linux: on the Intel/PC platform we have LILO and GRUB; on Sparc we have SILO. LILO is probably the best known, and has existed since the earliest days of Linux. SILO is essentially the Sparc port of LILO. GRUB is a much more sophisticated proposition.

LILO

LILO is a very rudimentary, single-stage bootloader. It has little or no knowledge of Linux, and does not understand the structure of any filesystem. Instead, it reads from the disk using BIOS calls, supplying numerical values for the locations on disk of the files it needs. Where does it get these values from? It has no way to figure them out at run-time, so the LILO installer has to supply them in the form of a `map' file. The LILO installer is a utility called lilo; this utility reads a configuration file and builds the map file from it. The location of the map file is then supplied to the boot sector that lilo installs.

The bootloading process with LILO thus looks something like this.

A problem with LILO is that it can be quite tricky to use it for creating a boot sector for a system different to the one running the LILO installer (lilo). The LILO configuration file (usually /etc/lilo.conf) takes the names of files and devices as its inputs, but these names are never passed through to the boot sector being created. The files and devices referenced are simply analysed for their numerical offsets. For example, if lilo.conf contains the line
root=/dev/cdrom
and /dev/cdrom is a symbolic link to the real device file (perhaps /dev/hdc), it is important to understand that all lilo will store is the major and minor device identifiers of /dev/hdc. It is easy to imagine that if the bootable filesystem you are building contains a file called /dev/cdrom, and that is a link to, say, /dev/hdd, then the root filesystem will be found on /dev/hdd. But it won't; LILO does not understand filesystems, and the names in the configuration file are simply rendered down to device IDs and file sector locations.
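The rendering-down of a name to device IDs can be imitated in a few lines of Python. This is only a sketch of the idea, not of lilo's actual code: it resolves any symbolic links on the system where it is run, then reports the major and minor numbers of the device file it ends up at. The function names are my own.

```python
import os
import stat


def split_rdev(rdev: int):
    """Split a raw device number into (major, minor)."""
    return os.major(rdev), os.minor(rdev)


def device_ids(path: str):
    """Resolve links on the *running* system and return (real_path, major, minor)."""
    real = os.path.realpath(path)
    st = os.stat(real)
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{real} is not a device file")
    major, minor = split_rdev(st.st_rdev)
    return real, major, minor
```

Calling device_ids("/dev/cdrom") on the build machine returns whatever that machine's link points to, and only the two numbers would survive into a boot sector. Any /dev/cdrom link inside the filesystem being built is never consulted.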

GRUB

GRUB is a very different bootloader from LILO. It has a two-stage or three-stage operation, and has network boot capabilities (of course, the network boot facilities don't give you a way to get GRUB itself loaded: you'll still need network boot firmware).

The additional sophistication of GRUB means that it can't easily fit into a single boot sector. It therefore uses a multiple-stage process to load successively larger amounts into memory. In so doing it becomes able to understand filesystems, so the kernel itself, and the other files GRUB uses, can be specified dynamically at boot time; there is no need for explicit numerical maps such as the ones that LILO uses.

In brief, the GRUB boot process looks like this.

The functionality offered by GRUB is quite similar to the OpenBoot firmware in Sun workstations, and includes the ability to retrieve kernels from a server using TFTP.

Multiple-boot machines

Because Linux was designed to be able to co-exist with other operating systems, the bootloader should be able to boot other operating systems on a hard disk as well as Linux. In practice this is relatively straightforward, as each of the other operating systems will have its own boot sector. All the Linux boot loader has to do is to locate the appropriate boot sector, and execute it. After that, the process will be under the control of the other system's bootloader. LILO, GRUB, and SILO all have this functionality.

Stage 3 (kernel stage)

By the time this stage begins, the bootloader will have loaded the kernel into memory, configured it with the location of its root filesystem, and loaded the initial ramdisk, if supplied. How we proceed from here depends to a large extent on whether we are using an initial ramdisk or not.

So why is an initial ramdisk such a big deal? Well, the concept arose from attempts to solve the problem of fitting a fully bootable Linux system onto a single floppy disk. The problem is that a Linux system that will boot as far as giving a shell, and offering a few basic utilities, needs about 8 Mb -- far too much to fit onto a floppy. However, such a system will in practice compress down to about 2 Mb using gzip compression, so if the root filesystem could be compressed, we could get a working system onto two standard floppies, or a single 2.88 Mb floppy.

Another problem that had to be solved was that of booting from a floppy disk and then mounting a root filesystem from a device other than an IDE drive. SCSI drives were particularly problematic: if the kernel was compiled to include all the necessary drivers, it would not fit onto a floppy disk. However, the initial ramdisk technique allows the drivers to be supplied as loadable modules, which can be compressed.

In outline, an initial ramdisk is a root filesystem that is unpacked from a compressed file. The boot loader will load the compressed version into memory, then the kernel uncompresses it and mounts it as the root filesystem. In this way we can get an 8 Mb root filesystem onto a 2.88 Mb file. Initial ramdisks are also useful on bootable CDROMs, because the bootable part of the CDROM is typically implemented as an `embedded' floppy disk.
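The compress-then-unpack idea, and the floppy arithmetic behind it, can be sketched in Python as follows. This only illustrates the principle: the real unpacking is done inside the kernel, the function names are invented, and I have assumed that a compressed image is recognized by the two-byte gzip signature and that a standard floppy holds 1440 Kb.

```python
import gzip
import math

GZIP_MAGIC = b"\x1f\x8b"
FLOPPY_BYTES = 1440 * 1024   # a standard "1.44 Mb" floppy


def pack_ramdisk(image: bytes) -> bytes:
    """Compress a raw filesystem image, as done when building the boot media."""
    return gzip.compress(image, compresslevel=9)


def unpack_ramdisk(blob: bytes) -> bytes:
    """Return the raw filesystem image, decompressing it if it looks gzipped."""
    if blob[:2] != GZIP_MAGIC:
        return blob  # assume the image was stored uncompressed
    return gzip.decompress(blob)


def floppies_needed(size_bytes: int) -> int:
    """How many standard floppies a file of this size would occupy."""
    return math.ceil(size_bytes / FLOPPY_BYTES)
```

With these figures an 8 Mb image needs six standard floppies uncompressed, but only two once it has shrunk to about 2 Mb.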

Stage 3a (common kernel stage)

Whether or not we are using an initial ramdisk, the kernel will begin initializing itself and the hardware devices for which support is compiled in. The process will typically include the following steps.

If we aren't using an initial ramdisk, then the next step is to mount the root filesystem. The kernel can then run the first true process from the root filesystem (strictly speaking, kswapd and its associates are not processes, they are kernel threads). Conventionally this process is /sbin/init, although the choice can be overridden by supplying the init= parameter to the kernel at boot time. The init process runs with uid zero (i.e., as root) and will be the parent of all other processes.
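The choice of first process can be pictured as a scan of the kernel command line. The Python sketch below assumes the override is spelled init=, as it is in the kernel's own parameter documentation, and that /sbin/init is the only default. The function name and the sample command lines are invented.

```python
DEFAULT_INIT = "/sbin/init"


def first_process(cmdline: str) -> str:
    """Pick the program to run as process 1 from a kernel command line."""
    chosen = DEFAULT_INIT
    for word in cmdline.split():
        if word.startswith("init="):
            chosen = word[len("init="):]  # a later init= overrides an earlier one
    return chosen
```

For example, first_process("root=/dev/hda1 ro") gives /sbin/init, whereas adding init=/bin/sh to the same command line gives /bin/sh, which is a common way to get a shell on a system whose start-up scripts are broken.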

Note that kswapd and the other kernel threads have process IDs but, even though they start before init, init still has process ID 1. This is to maintain the Unix convention that init is the first process.

Stage 3b (ramdisk kernel stage)

This stage is only relevant if we are using an initial ramdisk. In this case, the kernel won't invoke init, but will proceed as follows.

/linuxrc need not mount a new root filesystem over the top of the ramdisk root, nor need it load init. These activities are simply conventions. For example, in order to boot a full Linux system from a CDROM, a workable proposition is to retain the initial ramdisk as the root filesystem, and have /linuxrc mount the CDROM at, say, /usr. This allows the root filesystem to be read-write; if we mounted the CDROM at /, the root filesystem would be read-only, and we would have to create a separate ramdisk and have a bunch of symbolic links from the CDROM to parts of that ramdisk.

Similarly, a `rescue' disk -- floppy or CDROM -- would probably not want to invoke init, but simply put up a root shell.

If we are using /linuxrc to prepare a root filesystem, it is a good idea to minimize the amount of initialization code in it. This is not because it won't work, but because the correct place for initialization is in the start-up script spawned by init. Doing initialization there, and not in /linuxrc, enables us to ensure that the same initialization code is available whether or not an initial ramdisk is in use.

Stage 4 (init stage)

By now the kernel is loaded, memory management is running, some hardware is initialized, and the root filesystem is in place. All subsequent operations are invoked -- directly or indirectly -- by init. This process takes its instructions -- again by default -- from the file /etc/inittab. inittab specifies at least three important pieces of information.

The order of operations is that the initialization command (rc.sysinit) is run first, then the runlevel scripts. The division of work between rc.sysinit and the runlevel scripts is entirely a convention. If you are building a custom Linux system you don't have to follow this convention. In fact, you don't even have to run init if it doesn't do what you need.

Stage 4a (rc.sysinit)

This script or executable is responsible for all the one-off initialization of the system. Linux distributions differ in the distribution of work between this script and the runlevel scripts but, in general, the following initialization steps are likely to be carried out here.

Stage 4b (runlevel scripts)

Let's assume that we will be entering runlevel 5 which, by convention, gives us a graphical login prompt under the X server. A typical inittab will have entries like this:
l5:5:wait:/etc/rc.d/rc 5
x:5:respawn:/etc/X11/prefdm -nodaemon
The first line says that on entry to runlevel 5, invoke a script called rc, passing the argument `5'. The second line says that on entry to runlevel 5, run the script /etc/X11/prefdm -nodaemon. This latter script is somewhat beyond the scope of this article, being in the realm of X display management. In outline, prefdm is a script inserted by the RedHat installer. It contains code that will launch the X display manager selected by the user, either at install time or using a configuration utility. The reason it works this way is so that configuration utilities don't have to mess about with inittab, which is a bad file to mess up if you want your system to keep working. The X display manager will typically invoke the X server (i.e., the graphical display) on the local machine and give you a login prompt.
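Each inittab entry has four colon-separated fields: an identifier, the runlevels it applies to, an action, and the command to run. The Python sketch below splits entries of this form and picks out those for a given runlevel. It is an illustration of the file format rather than of how init itself is written, and the names are my own.

```python
from typing import NamedTuple


class InittabEntry(NamedTuple):
    ident: str
    runlevels: str
    action: str
    process: str


def parse_inittab(text: str):
    """Parse id:runlevels:action:process lines, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first three colons only; the command may contain more.
        ident, runlevels, action, process = line.split(":", 3)
        entries.append(InittabEntry(ident, runlevels, action, process))
    return entries


def entries_for(entries, runlevel: str):
    """Return the entries whose runlevel field mentions the given runlevel."""
    return [e for e in entries if runlevel in e.runlevels]
```

Fed the two lines above, entries_for(..., "5") returns both: the `wait' entry that runs rc, and the `respawn' entry that init restarts whenever prefdm exits.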

But back to the `real' boot process... The script rc runs the start scripts in a directory for the runlevel given in inittab. Usually, runlevel N will correspond to a directory /etc/rc.d/rcN.d. As we've decided to enter runlevel 5, the relevant directory is /etc/rc.d/rc5.d. This directory will contain a (possibly large) number of scripts with names beginning with `S' or `K' followed by two digits, e.g., S12syslog. The digits denote the order in which the scripts are executed: the `S' scripts are executed in ascending numerical order on entry to the runlevel (i.e., at boot), and the `K' scripts are executed in descending order on exit (usually at shutdown). rc passes the argument `start' to each script at startup, and `stop' at shutdown. As a result, we don't really need both `S' and `K' scripts, because we can use the argument to determine whether we are starting or stopping. Thus it is a convention on Linux systems that the K scripts are simply symbolic links to their corresponding S scripts, and the S scripts do both startup and shutdown operations.
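The start-up half of this can be sketched in a few lines of Python. The sketch covers only the `S' scripts: it takes the names found in a runlevel directory, keeps those of the form S plus two digits plus a name, and lists the commands in the order described above. The real rc is a shell script, and the directory contents in the example are invented.

```python
import re

SCRIPT_NAME = re.compile(r"^([SK])(\d\d)(.+)$")


def start_commands(names):
    """Return the 'Snnname start' commands in ascending numerical order."""
    starts = []
    for name in names:
        match = SCRIPT_NAME.match(name)
        if match and match.group(1) == "S":
            starts.append((int(match.group(2)), name))
    # Sort by the two-digit number; ties fall back to alphabetical order.
    return [f"{name} start" for _, name in sorted(starts)]
```

Given a directory containing S99local, K12syslog, S12syslog, and S10network, this yields S10network, then S12syslog, then S99local, each with the argument `start'.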

So, for example, when entering runlevel 5, somewhere near the beginning of the rc process we will execute

S12syslog start
On shutdown, somewhere towards the end of the shutdown process we will do
K12syslog stop
which is, in fact, an invocation of
S12syslog stop
Inside the script S12syslog -- and most of the other scripts in that directory -- you will find both initialization and finalization code.

So what do these scripts do? Well, this depends on the runlevel, and the distribution, and any customizations you have made. A typical set of operations will include the following:

The very last step in the boot process will be to run a script S99local. This is the conventional place to put machine-specific initialization. It is considered bad manners to customize any of the initialization scripts that are supplied as part of a Linux distribution, simply because other people who may have to manage the system will have expectations about what is in them. Making arbitrary changes here will defeat these expectations. However, everybody expects to see machine-specific configuration in S99local.

Gotchas

It should be clear that the boot process on a fully-featured Linux system is fairly complex. You can simplify it a great deal if you are building a custom Linux system, or if you just want your machine to start up faster. However, there are a few things to watch out for when constructing a custom boot process.
©1994-2006 Kevin Boone, all rights reserved