After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Peren's Open Book Series.

The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available although I obviously encourge you to buy the book from your favourite book store :-) . As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.

Next: 6.4 Freeing Memory Up: 6. Boot Memory Allocator Previous: 6.2 Initialising the Boot Contents Index

6.3 Allocating Memory

The reserve_bootmem() function may be used to reserve pages for use by the caller but is very cumbersome to use for general allocations. There are four functions provided for easy allocations on UMA architectures called alloc_bootmem(), alloc_bootmem_low(), alloc_bootmem_pages() and alloc_bootmem_low_pages() which are fully described in Table 6.1. All of these macros call __alloc_bootmem() with different parameters. See the call graph in Figure 6.2.

**Figure 6.2:** Call Graph: `__alloc_bootmem()`
$\includegraphics[width=17cm]{graphs/__alloc_bootmem.ps}$

Similar functions exist for NUMA which take the node as an additional parameter, as listed in Table 6.2. They are called alloc_bootmem_node(), alloc_bootmem_pages_node() and alloc_bootmem_low_pages_node(). All of these macros call __alloc_bootmem_node() with different parameters.

The parameters to either __alloc_bootmem() and __alloc_bootmem_node() are essentially the same. They are

: pgdat This is the node to allocate from. It is omitted in the UMA case as it is assumed to be contig_page_data;
: size This is the size in bytes of the requested allocation;
: align This is the number of bytes that the request should be aligned to. For small allocations, they are aligned to SMP_CACHE_BYTES, which on the x86 will align to the L1 hardware cache;
: goal This is the preferred starting address to begin allocating from. The ``low'' functions will start from physical address 0 where as the others will begin from MAX_DMA_ADDRESS which is the maximum address DMA transfers may be made from on this architecture.

The core function for all the allocation APIs is __alloc_bootmem_core(). It is a large function but with simple steps that can be broken down. The function linearly scans memory starting from the goal address for a block of memory large enough to satisfy the allocation. With the API, this address will either be 0 for DMA-friendly allocations or MAX_DMA_ADDRESS otherwise.

The clever part, and the main bulk of the function, deals with deciding if this new allocation can be merged with the previous one. It may be merged if the following conditions hold:

The page used for the previous allocation (bootmem_data $\rightarrow$ pos) is adjacent to the page found for this allocation;
The previous page has some free space in it (bootmem_data $\rightarrow$ offset != 0);
The alignment is less than PAGE_SIZE.

Regardless of whether the allocations may be merged or not, the pos and offset fields will be updated to show the last page used for allocating and how much of the last page was used. If the last page was fully used, the offset is 0.

Next: 6.4 Freeing Memory Up: 6. Boot Memory Allocator Previous: 6.2 Initialising the Boot Contents Index

Mel 2004-02-15