After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens' Open Book Series.

The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens' Open Book Series, it will be available 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.
Next: 6.3 Allocating Memory Up: 6. Boot Memory Allocator Previous: 6.1 Representing the Boot   Contents   Index

6.2 Initialising the Boot Memory Allocator

Each architecture is required to supply a setup_arch() function which, among other tasks, is responsible for acquiring the necessary parameters to initialise the boot memory allocator.

Each architecture has its own function to get the necessary parameters. On the x86, it is called setup_memory(), but on other architectures such as MIPS or Sparc, it is called bootmem_init() or, in the case of the PPC, do_init_bootmem(). Regardless of the architecture, the tasks are essentially the same. The parameters it needs to calculate are:

min_low_pfn This is the lowest PFN that is available in the system;

max_low_pfn This is the highest PFN that may be addressed by low memory (ZONE_NORMAL);

highstart_pfn This is the PFN of the beginning of high memory (ZONE_HIGHMEM);

highend_pfn This is the last PFN in high memory;

max_pfn Finally, this is the last PFN available to the system.

6.2.1 Calculating The Size of Zones

Figure 6.1: Call Graph: setup_memory()

The PFN is an offset, counted in pages, within the physical memory map. The first PFN usable by the system, min_low_pfn, is located at the beginning of the first page after _end, which is the end of the loaded kernel image. The value is stored as a file scope variable in mm/bootmem.c for use with the boot memory allocator.
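The rounding described above can be sketched as follows. This is a minimal illustration, not the kernel's code: it assumes 4 KiB pages as on the x86, the PFN_UP macro is written out here for the example, and kernel_end_phys is a hypothetical stand-in for the physical address of _end.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as on the x86. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Round a physical address up to the next page frame number. */
#define PFN_UP(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/*
 * kernel_end_phys is a hypothetical parameter standing in for the
 * physical address of _end. The result is the first page frame that
 * lies entirely after the kernel image.
 */
unsigned long first_usable_pfn(unsigned long kernel_end_phys)
{
    unsigned long pfn = PFN_UP(kernel_end_phys);

    /* The chosen frame must not start before the end of the image. */
    assert((pfn << PAGE_SHIFT) >= kernel_end_phys);
    return pfn;
}
```

If the image ends part of the way through a page, the partially used page is skipped; if it ends exactly on a page boundary, that boundary is the first usable frame.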

How the last page frame in the system, max_pfn, is calculated is quite architecture specific. In the x86 case, the function find_max_pfn() reads through the whole e820 map (see footnote 6.1) for the highest page frame. The value is also stored as a file scope variable in mm/bootmem.c.
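A scan of this kind might look like the sketch below. The struct region layout, the REGION_* constants and the function name highest_ram_pfn() are all hypothetical simplifications made up for this example; they are not the kernel's e820 structures. Note that in this sketch the returned value is the PFN just past the highest usable page.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on the x86. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical, simplified memory map entry (not the kernel's layout). */
enum region_type { REGION_RAM = 1, REGION_RESERVED = 2 };

struct region {
    unsigned long long addr;   /* physical start address */
    unsigned long long size;   /* length in bytes */
    enum region_type type;
};

/* Walk every entry and remember the highest page frame backed by RAM. */
unsigned long highest_ram_pfn(const struct region *map, size_t n)
{
    unsigned long max_pfn = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (map[i].type != REGION_RAM)
            continue;   /* reserved or non-existent memory is ignored */

        /* Only whole pages count: round the start up, the end down. */
        unsigned long long start = (map[i].addr + PAGE_SIZE - 1) >> PAGE_SHIFT;
        unsigned long long end = (map[i].addr + map[i].size) >> PAGE_SHIFT;

        if (start >= end)
            continue;   /* region smaller than one full page */
        if (end > max_pfn)
            max_pfn = (unsigned long)end;
    }
    return max_pfn;
}
```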

The value of max_low_pfn is calculated on the x86 with find_max_low_pfn() and it marks the end of ZONE_NORMAL. This is the physical memory directly accessible by the kernel and is related to the kernel/userspace split in the linear address space marked by PAGE_OFFSET. The value, with the others, is stored in mm/bootmem.c. Note that in low memory machines, the max_pfn will be the same as the max_low_pfn.

With the three variables min_low_pfn, max_low_pfn and max_pfn known, it is straightforward to calculate the start and end of high memory and place them as file scope variables in arch/i386/mm/init.c as highstart_pfn and highend_pfn. The values are used later to initialise the high memory pages for the physical page allocator, as we will see in Section 6.5.
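As a rough sketch of that calculation, assume the highest PFN the kernel can address directly is passed in as lowmem_limit_pfn. That parameter, the struct pfn_limits type and the function split_zones() are hypothetical names invented for this example.

```c
#include <assert.h>

/* Hypothetical container for the limits discussed in this section. */
struct pfn_limits {
    unsigned long max_low_pfn;    /* end of low memory */
    unsigned long highstart_pfn;  /* start of high memory */
    unsigned long highend_pfn;    /* end of high memory */
};

/*
 * lowmem_limit_pfn is an assumed parameter: the highest PFN that the
 * kernel can address directly.
 */
struct pfn_limits split_zones(unsigned long max_pfn,
                              unsigned long lowmem_limit_pfn)
{
    struct pfn_limits l;

    /* Low memory ends at whichever comes first. */
    l.max_low_pfn = max_pfn < lowmem_limit_pfn ? max_pfn : lowmem_limit_pfn;

    /* High memory, if there is any, covers whatever is left over. */
    l.highstart_pfn = l.max_low_pfn;
    l.highend_pfn = max_pfn;

    assert(l.highstart_pfn <= l.highend_pfn);
    return l;
}
```

On a machine where max_pfn is below the limit, highstart_pfn and highend_pfn come out equal, so high memory is empty and max_low_pfn matches max_pfn, in line with the note about low memory machines.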


6.2.2 Initialising bootmem_data

Once the limits of usable physical memory are known, one of two boot memory initialisation functions is selected and provided with the start and end PFN for the node to be initialised. init_bootmem(), which initialises contig_page_data, is used by UMA architectures, while init_bootmem_node() is for NUMA to initialise a specified node. Both functions are trivial and rely on init_bootmem_core() to do the real work.

The first task of the core function is to insert this pg_data_t into the pgdat_list because, by the end of this function, the node is ready for use. It then records the starting and end address for this node in its associated bootmem_data_t and allocates the bitmap representing page allocations. The size in bytes (see footnote 6.2) of the bitmap required is straightforward:


\begin{displaymath}\mathrm{mapsize} = \frac{(\mathrm{end\_pfn} - \mathrm{start\_pfn}) + 7}{8} \end{displaymath}
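The same calculation as a minimal C sketch, with one bit per page frame and the result rounded up to a whole byte. The function name bootmem_mapsize() is made up for this example.

```c
#include <assert.h>

/*
 * Bytes needed for a bitmap holding one bit per page frame in
 * [start_pfn, end_pfn). Adding 7 before dividing by 8 rounds up, so
 * a partially used final byte is still counted.
 */
unsigned long bootmem_mapsize(unsigned long start_pfn, unsigned long end_pfn)
{
    assert(end_pfn >= start_pfn);
    return ((end_pfn - start_pfn) + 7) / 8;
}
```

For example, 32768 page frames need 4096 bytes, while 9 page frames need 2 bytes because the ninth bit spills into a second byte.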

The bitmap is stored at the physical address pointed to by bootmem_data_t->node_boot_start and the virtual address to the map is placed in bootmem_data_t->node_bootmem_map. As there is no architecture independent way to detect "holes" in memory, the entire bitmap is initialised to 1, effectively marking all pages allocated. It is up to the architecture dependent code to set the bits of usable pages to 0. In the case of the x86, the function register_bootmem_low_pages() reads through the e820 map and calls free_bootmem() for each usable page to set the bit to 0 before using reserve_bootmem() to reserve the pages needed by the actual bitmap.
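The "everything allocated first, then free what is usable" pattern can be illustrated with a toy bitmap. The struct toy_bootmem type and the toy_* functions are hypothetical and operate on single PFNs for simplicity; they are not the signatures of free_bootmem() or reserve_bootmem(). The caller is assumed to supply storage large enough for the bitmap.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for bootmem_data_t. */
struct toy_bootmem {
    unsigned char *map;       /* one bit per page, 1 = allocated */
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Set every bit to 1 so all pages start out marked as allocated. */
unsigned long toy_init(struct toy_bootmem *b, unsigned char *storage,
                       unsigned long start_pfn, unsigned long end_pfn)
{
    unsigned long mapsize = ((end_pfn - start_pfn) + 7) / 8;

    b->map = storage;         /* caller provides at least mapsize bytes */
    b->start_pfn = start_pfn;
    b->end_pfn = end_pfn;
    memset(b->map, 0xff, mapsize);
    return mapsize;
}

/* Clear the bit: the page is usable and may be handed out. */
void toy_free(struct toy_bootmem *b, unsigned long pfn)
{
    unsigned long i = pfn - b->start_pfn;

    assert(pfn >= b->start_pfn && pfn < b->end_pfn);
    b->map[i / 8] &= (unsigned char)~(1u << (i % 8));
}

/* Set the bit again: the page is in use, e.g. by the bitmap itself. */
void toy_reserve(struct toy_bootmem *b, unsigned long pfn)
{
    unsigned long i = pfn - b->start_pfn;

    assert(pfn >= b->start_pfn && pfn < b->end_pfn);
    b->map[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Returns 1 if the page is marked allocated, 0 if it is free. */
int toy_is_allocated(const struct toy_bootmem *b, unsigned long pfn)
{
    unsigned long i = pfn - b->start_pfn;

    assert(pfn >= b->start_pfn && pfn < b->end_pfn);
    return (b->map[i / 8] >> (i % 8)) & 1;
}
```

Starting from all ones means that any page the architecture code never mentions, such as a hole, simply stays unavailable.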



Footnotes

6.1 (e820)
e820 is a table provided by the BIOS describing what physical memory is available, reserved or non-existent.
6.2 (bytes)
Hence the division by 8.

Mel 2004-02-15