The field of memory management is large, complex, time consuming to research and difficult to apply to practical implementations. This is partially related to the difficulty of modeling how systems behave in real multi-programmed systems [#!coffman80!#] which has resulted in theoretical examination of virtual memory algorithms often depending on simulations of specific workloads. Simulations are necessary as modeling how scheduling, paging behavior and multiple processes interact presents a considerable challenge. Page replacement policies, a field that has been the focus of considerable amounts of research, is a good example as it is only ever shown to work well for specified workloads. The problem of adjusting algorithms and policies to different workloads is addressed by having administrators tune systems as much as by research and algorithms.
Linux is also large, complex and fully understood by a relatively small core group of people. Its development is the result of contributions of thousands of programmers with a varying range of specialties, backgrounds and spare time. The first implementations are developed based on the all-important foundation that theory provides. Contributors built upon this framework with changes based on real world observations but unfortunately the best available documentation [#!bovet00!#] of the final implementation tries to summarise the entire kernel without giving specific focus to any area.
It has been often asserted on the Linux Memory Management mailing list that the VM is poorly documented and difficult to pick up as ``the implementation is a nightmare to follow''14.1and the lack of documentation on practical VMs is not just confined to Linux. Matt Dillon, one of the principal developers of the FreeBSD VM14.2and considered a ``VM Guru'' stated in an interview14.3 that documentation can be ``hard to come by''. One of the principal difficulties with deciphering the implementation is the fact the developer must have a background in memory management theory to see why implementation decisions were made as a pure understanding of the code is insufficient for any purpose other than micro-optimisations.
This thesis attempts to bridge the gap between memory management theory and the practical implementation in Linux and tie both fields together into a single body of work. I believe this thesis is the most comprehensive documentation of the Linux VM to date and is a rare attempt to bind memory management theory and practice together. Rather than providing a general view of the kernel, it has been presented from the perspective of the VM in a manner that is relatively independent of hardware architecture considerations.
A future direction for this research includes the documentation of the, as yet unreleased, version 2.6 VM and the development of VM Regress as a regression, benchmarking and analysis tool for which a framework is still being developed. With this document as a starting point, it is envisioned that the version 2.6 VM can be documented as a series of addenda documents describing the differences between version 2.4.20 and version 2.6 without the necessity of presenting the entire VM as this document has done. On a personal note, I hope that this document encourages other researches to produce similar documents for other subsystems in the kernel.