After this documentation was released in July 2003, I was approached
by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens' Open Book Series.
The book is available and called simply "Understanding The Linux Virtual
Memory Manager". There is a lot of additional material in the book that is
not available here, including details on later 2.4 kernels, introductions
to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB
management, a lot more code commentary, countless other additions and
clarifications and a CD with lots of cool stuff on it. This material (although
now dated and lacking in comparison to the book) will remain available
although I obviously encourage you to buy the book from your favourite book
store :-) . As the book is under the Bruce Perens Open Book Series, it will
be available 90 days after appearing on the book shelves which means it
is not available right now. When it is available, it will be downloadable
from http://www.phptr.com/perens
so check there for more information.
To be fully clear, this webpage is not the actual book.
The reserve_bootmem() function may be used to reserve
pages for use by the caller but is very cumbersome to use for
general allocations. There are four functions provided for easy
allocations on UMA architectures called alloc_bootmem(),
alloc_bootmem_low(), alloc_bootmem_pages() and
alloc_bootmem_low_pages() which are fully described in Table 6.1.
All of these are macros that call __alloc_bootmem() with different
parameters. See the call graph in Figure 6.2.
Figure 6.2: Call Graph: __alloc_bootmem()
Similar functions exist for NUMA which take the node as an
additional parameter, as listed in Table 6.2. They are called
alloc_bootmem_node(), alloc_bootmem_pages_node()
and alloc_bootmem_low_pages_node(). All of these macros
call __alloc_bootmem_node() with different parameters.
The parameters to __alloc_bootmem() and
__alloc_bootmem_node() are essentially the same. They are:
- pgdat: This is the node to allocate from. It is omitted in the UMA
case as it is assumed to be contig_page_data;
- size: This is the size in bytes of the requested allocation;
- align: This is the number of bytes that the request should be aligned
to. Small allocations are aligned to SMP_CACHE_BYTES,
which on the x86 will align to the L1 hardware cache;
- goal: This is the preferred starting address to begin allocating
from. The "low" functions will start from physical address 0 whereas the
others will begin from MAX_DMA_ADDRESS which is the maximum
address DMA transfers may be made from on this architecture.
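To make the parameter list above concrete, the following is a minimal user-space sketch of how the four UMA wrappers could pass different align and goal values to one common function. Every name prefixed with sketch_ or SKETCH_ is hypothetical, the constant values are illustrative stand-ins, and the choice of page alignment for the "pages" variants is an assumption based on their names. It is not the kernel's actual header.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in values; the real ones are architecture-specific. */
#define SKETCH_PAGE_SIZE        4096UL
#define SKETCH_SMP_CACHE_BYTES  32UL
#define SKETCH_MAX_DMA_ADDRESS  0x1000000UL

/* Records the arguments of the most recent request for inspection. */
struct sketch_request {
    unsigned long size, align, goal;
};
static struct sketch_request last_request;

/* Hypothetical stand-in for __alloc_bootmem(): it only records arguments. */
static void *sketch_alloc_bootmem(unsigned long size, unsigned long align,
                                  unsigned long goal)
{
    assert(size > 0);
    last_request.size = size;
    last_request.align = align;
    last_request.goal = goal;
    return NULL;    /* no memory is really allocated in this sketch */
}

/* The four wrappers differ only in the alignment and goal they pass. */
#define sketch_alloc(x) \
    sketch_alloc_bootmem((x), SKETCH_SMP_CACHE_BYTES, SKETCH_MAX_DMA_ADDRESS)
#define sketch_alloc_low(x) \
    sketch_alloc_bootmem((x), SKETCH_SMP_CACHE_BYTES, 0)
#define sketch_alloc_pages(x) \
    sketch_alloc_bootmem((x), SKETCH_PAGE_SIZE, SKETCH_MAX_DMA_ADDRESS)
#define sketch_alloc_low_pages(x) \
    sketch_alloc_bootmem((x), SKETCH_PAGE_SIZE, 0)
```

Calling sketch_alloc_low(100), for example, leaves last_request holding a size of 100, cache-line alignment and a goal of 0, which mirrors the "low" behaviour described for the goal parameter.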
The core function for all the allocation APIs is
__alloc_bootmem_core(). It is a large function, but it can be broken down
into simple steps. The function linearly scans memory
starting from the goal address for a block of memory large enough
to satisfy the allocation. With the API, this address will either be 0 for
DMA-friendly allocations or MAX_DMA_ADDRESS otherwise.
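The linear scan described above can be illustrated with a small user-space sketch. It assumes a hypothetical byte-per-page map (sketch_map) in place of the allocator's real bitmap, works in page indices rather than addresses, and ignores alignment. All names are invented for illustration and the code is not taken from __alloc_bootmem_core().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NPAGES 16

/* One entry per page: 1 = reserved or allocated, 0 = free. A byte per page
 * is used here instead of a real bitmap purely for readability. */
static unsigned char sketch_map[SKETCH_NPAGES];

/* Scan linearly from goal_page for `count` consecutive free pages.
 * Returns the first page index of the run, or -1 if none is found. */
static long sketch_find_free_run(unsigned long goal_page, unsigned long count)
{
    unsigned long start, i;

    assert(count > 0);
    for (start = goal_page; start + count <= SKETCH_NPAGES; start++) {
        for (i = 0; i < count; i++)
            if (sketch_map[start + i])
                break;
        if (i == count)
            return (long)start;
        start += i;    /* skip past the used page that ended this run */
    }
    return -1;
}
```

With pages 0 to 3 and page 5 marked as used, a request for two pages starting from page 0 skips the single free page 4 and returns page 6.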
The clever part, and the main bulk of the function, deals with deciding
if this new allocation can be merged with the previous one. It may be merged
if the following conditions hold:
- The page used for the previous allocation
(bootmem_data->pos) is adjacent to the page found for this
allocation;
- The previous page has some free space in it
(bootmem_data->offset != 0);
- The alignment is less than PAGE_SIZE.
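The three conditions above can be written as a single predicate. The sketch below is a user-space illustration only: the struct is a hypothetical subset of the allocator state, the names are invented, and "adjacent" is assumed to mean that the new page immediately follows the previous one.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical subset of the allocator state described in the text. */
struct sketch_bootmem {
    unsigned long pos;      /* index of the last page used for allocating */
    unsigned long offset;   /* bytes used in that page; 0 if fully used */
};

/* True when an allocation found at new_page may share the previous page. */
static bool sketch_can_merge(const struct sketch_bootmem *b,
                             unsigned long new_page, unsigned long align)
{
    assert(align > 0);
    return new_page == b->pos + 1       /* pages are adjacent */
        && b->offset != 0               /* previous page has free space */
        && align < SKETCH_PAGE_SIZE;    /* request is not page aligned */
}
```

If any one of the conditions fails, the predicate is false and the new allocation would start on its own page.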
Regardless of whether the allocations may be merged or not, the pos
and offset fields will be updated to show the last page used
for allocating and how much of the last page was used. If the last page was
fully used, the offset is 0.
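The bookkeeping described above amounts to a small piece of arithmetic. The following user-space sketch assumes the allocation is described by a byte offset from the start of the node (start_byte) and a size in bytes. The struct and function names are hypothetical and the page size is an illustrative value.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical subset of the allocator state described in the text. */
struct sketch_bootmem {
    unsigned long pos;      /* index of the last page used for allocating */
    unsigned long offset;   /* bytes used in that page; 0 if fully used */
};

/* Record which page an allocation ended in and how much of it was used. */
static void sketch_record(struct sketch_bootmem *b,
                          unsigned long start_byte, unsigned long size)
{
    unsigned long end = start_byte + size;  /* first byte after the block */

    assert(size > 0);
    b->pos = (end - 1) / SKETCH_PAGE_SIZE;  /* page holding the last byte */
    b->offset = end % SKETCH_PAGE_SIZE;     /* 0 when that page is full */
}
```

For a 100 byte allocation at the start of the node this records page 0 with an offset of 100, while an allocation of exactly one page records page 0 with an offset of 0.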
Mel
2004-02-15