After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Peren's Open Book Series.

The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available although I obviously encourge you to buy the book from your favourite book store :-) . As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.
next up previous contents index
Next: 4.7 Mapping addresses to Up: 4. Page Table Management Previous: 4.5 Allocating and Freeing   Contents   Index

Subsections


4.6 Kernel Page Tables

When the system first starts, paging is not enabled as page tables do not magically initialise themselves. Each architecture implements this differently so only the x86 case will be discussed. The page table initialisation is divided into two phases. The bootstrap phase sets up page tables for just 8MiB so the paging unit can be enabled. The second phase initialises the rest of the page tables. We discuss both of these phases below.


4.6.1 Bootstrapping

The assembler function startup_32() is responsible for enabling the paging unit in arch/i386/kernel/head.S. While all normal kernel code in vmlinuz is compiled with the base address at PAGE_OFFSET + 1MiB, the kernel is actually loaded beginning at the first megabyte (0x00100000) of memory4.2. The bootstrap code in this file treats 1MiB as its base address by subtracting __PAGE_OFFSET from any address until the paging unit is enabled so before the paging unit is enabled, a page table mapping has to be established which translates the 8MiB of physical memory at the beginning of physical memory to the correct place after PAGE_OFFSET.

Initialisation begins with statically defining at compile time an array called swapper_pg_dir which is placed using linker directives at 0x00101000. It then establishes page table entries for 2 pages, pg0 and pg1. As the Page Size Extension (PSE) bit is set in the cr4 register, pages translated are 4MiB pages, not 4KiB as is the normal case. The first pointers to pg0 and pg1 are placed to cover the region 1-9MiB and the second pointers to pg0 and pg1 are placed at PAGE_OFFSET+1MiB. This means that when paging is enabled, they will map to the correct pages using either physical or virtual addressing.

Once this mapping has been established, the paging unit is turned on by setting a bit in the cr0 register and a jump takes places immediately to ensure the Instruction Pointer (EIP register) is correct.

4.6.2 Finalising

The function responsible for finalising the page tables is called paging_init(). The call graph for this function on the x86 can be seen on Figure 4.4.

Figure 4.4: Call Graph: paging_init()
\includegraphics[width=10cm]{graphs/paging_init.ps}

For each pgd_t used by the kernel, the boot memory allocator (see Chapter 6) is called to allocate a page for the PMD. Similarly, a page will be allocated for each pmd_t allocator. If the CPU has the PSE flag available, it will be set to enable extended paging. This means that each page table entry in the kernel paging tables will be 4MiB instead of 4KiB. If the CPU supports the PGE flag, it also will be set so that the page table entry will be global. Lastly, the page tables from PKMAP_BASE are set up with the function fixrange_init(). Once the page table has been fully setup, the statically defined page table at swapper_pg_dir is loaded again into the cr3 register and the TLB is flushed.



Footnotes

... memory4.2
The first megabyte is used by some devices for communication with the BIOS and is skipped.

next up previous contents index
Next: 4.7 Mapping addresses to Up: 4. Page Table Management Previous: 4.5 Allocating and Freeing   Contents   Index
Mel 2004-02-15