After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens' Open Book Series.

The book is available and is called simply "Understanding The Linux Virtual Memory Manager". It contains a lot of additional material that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications, and a CD with lots of cool stuff on it. This material, although now dated and lacking in comparison to the book, will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens' Open Book Series, it will be available for download 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.
Next: 1.1 General Kernel Literature Up: understand-html Previous: Acknowledgments   Contents   Index


1. Introduction

Linux is a relatively new operating system that has begun to enjoy a lot of attention from the business and academic worlds. As the operating system matures, its feature set, capabilities and performance grow, but so do its size and complexity. Table 1.1 shows the total size of the kernel source code alongside the size, in bytes and in lines of code, of the mm/ part of the kernel tree. This does not include the machine-dependent code or any of the buffer management code, and it does not even pretend to be an accurate metric for complexity, but it still serves as a small indicator.


Table 1.1: Kernel size as an indicator of complexity

Version   Release Date           Total Size   Size of mm/   Line Count
1.0       March 13th, 1992       5.9MiB       96KiB         3109
1.2.13    February 8th, 1995     11MiB        136KiB        4531
2.0.39    January 9th, 2001      35MiB        204KiB        6792
2.2.22    September 16th, 2002   93MiB        292KiB        9554
2.4.20    November 28th, 2002    167MiB       520KiB        15428
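
Figures like the mm/ size and line count in Table 1.1 can be gathered with a few lines of script. The following is only a minimal sketch and is not the method used to produce the table: the function names measure_tree and to_kib are hypothetical, and both the choice to count only .c and .h files and the choice to count newline characters as lines are assumptions, so its results may not match the table exactly.

```python
import os


def measure_tree(root, suffixes=(".c", ".h")):
    """Return (total_bytes, total_lines) for source files under root.

    Assumption: only files ending in one of `suffixes` are counted, and a
    "line" is one newline character. The table's own counting rules are
    not stated in the text.
    """
    total_bytes = 0
    total_lines = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Read as bytes so odd encodings in old sources cannot fail.
            with open(path, "rb") as f:
                data = f.read()
            total_bytes += len(data)
            total_lines += data.count(b"\n")
    return total_bytes, total_lines


def to_kib(n_bytes):
    """Convert a byte count to KiB (1 KiB = 1024 bytes)."""
    return n_bytes / 1024
```

Pointing measure_tree at the mm/ directory of an unpacked kernel tree and passing the byte count through to_kib gives numbers in the same units as the fourth and fifth columns of the table.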


As is the habit of open source project developers in general, new developers asking questions are often told to find their answer directly in the source or are advised to ask on the mailing list for beginner developers (http://www.kernelnewbies.org). With the Linux Virtual Memory (VM) manager, this was a suitable response for earlier kernels, as the time required to understand the VM could be measured in weeks. The books available on the operating system devoted enough time to the memory management chapters to make the relatively small amount of code easy to navigate. This is no longer the case.

The books that describe Linux's internals [bovet00] [bovet03] tend to cover the entire kernel rather than one topic, with the notable exception of device drivers [rubini01]. These books, particularly Understanding the Linux Kernel, provide invaluable insight into kernel internals, but they miss the details which are specific to the VM and not of general interest.

Increasingly, to get a comprehensive view of how the kernel functions, the developer or researcher is required to read through the source code line by line, which requires a large investment of time. This is especially true as the implementations of several VM algorithms diverge considerably from the papers that describe them.

In this thesis, a comprehensive guide to the VM as implemented in the 2.4.20 kernel is presented. In addition to an introduction to the theoretical background and verbal description of the implementation, a companion document called Code Commentary On The Linux Virtual Memory Manager, hereafter referred to as the companion document, provides a line-by-line tour of the code. It is envisioned that with this pair of documents, the time required to have a clear understanding of the VM, even later VMs, will be measured in weeks instead of the estimated 8 months currently required by even an experienced developer.

The VM-specific documentation that exists today is relatively poor. It is not an area of the kernel that many wish to get involved in, for a variety of reasons ranging from the amount of code involved, to the complexity of the subject of memory management, to the difficulty of debugging the kernel with an unstable VM.



Mel 2004-02-15