After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens' Open Book Series.

The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens' Open Book Series, it will be available 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.

Subsections

4.7.1 Mapping Physical to Virtual Kernel Addresses
4.7.2 Mapping struct pages to Physical Addresses
4.7.3 Initialising mem_map

4.7 Mapping addresses to struct pages

There is a requirement for Linux to have a fast method of mapping virtual addresses to physical addresses and of mapping struct pages to their physical address. Linux achieves this by knowing where, in both virtual and physical memory, the global mem_map array is, as this array contains the struct pages representing all physical memory in the system. All architectures achieve this with similar mechanisms but, for illustration purposes, we will only examine the x86 carefully. This section will first discuss how physical addresses are mapped to kernel virtual addresses and then what this means for the mem_map array.


4.7.1 Mapping Physical to Virtual Kernel Addresses

As we saw in Section 4.6, Linux sets up a direct mapping from the physical address 0 to the virtual address PAGE_OFFSET at 3GiB on the x86. This means that on the x86, any virtual address within this direct mapping can be translated to its physical address by simply subtracting PAGE_OFFSET, which is essentially what the function virt_to_phys() does via the macro __pa():

/* from <asm-i386/page.h> */
132 #define __pa(x)                 ((unsigned long)(x)-PAGE_OFFSET)
/* from <asm-i386/io.h> */
 76 static inline unsigned long virt_to_phys(volatile void * address)
 77 {
 78         return __pa(address);
 79 }

Obviously the reverse operation involves simply adding PAGE_OFFSET which is carried out by the function phys_to_virt() with the macro __va(). Next we see how this helps the mapping of struct pages to physical addresses.
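
For reference, the counterpart definitions are brief; paraphrased from the same headers (the original source line numbers are omitted), they look roughly like this:

/* from <asm-i386/page.h> (paraphrased) */
#define __va(x)                 ((void *)((unsigned long)(x)+PAGE_OFFSET))

/* from <asm-i386/io.h> (paraphrased) */
static inline void * phys_to_virt(unsigned long address)
{
        return __va(address);
}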

There is one exception where virt_to_phys() cannot be used to convert virtual addresses to physical ones [4.3]. Specifically, on the PPC and ARM architectures, virt_to_phys() cannot be used to convert addresses that have been returned by the function consistent_alloc(). consistent_alloc() is used on PPC and ARM to return non-cached memory for use with DMA.

4.7.2 Mapping struct pages to Physical Addresses

As we saw in Section 4.6.1, the kernel image is located at the physical address 1MiB, which translates to the virtual address PAGE_OFFSET + 0x00100000, and a virtual region totaling about 8MiB is reserved for the image, this being the region that can be addressed by two PGD entries. This would imply that the first available memory for use is located at 0xC0800000, but that is not the case. Linux tries to reserve the first 16MiB of memory for ZONE_DMA, so the first virtual area used for kernel allocations is actually 0xC1000000, which is where the global mem_map is usually located. ZONE_DMA will still get used, but only when absolutely necessary.

Physical addresses are translated to struct pages by treating them as an index into the mem_map array. Shifting a physical address PAGE_SHIFT bits to the right converts it to a PFN counted from physical address 0, which is also an index within the mem_map array. This is exactly what the macro virt_to_page() does, which is declared as follows in <asm-i386/page.h>:

#define virt_to_page(kaddr) (mem_map + (__pa(kaddr) >> PAGE_SHIFT))

virt_to_page() takes the virtual address kaddr, converts it to a physical address with __pa(), converts that into an array index by shifting it right PAGE_SHIFT bits and indexes into mem_map by adding the result to the base of the array. No macro is available for converting struct pages to physical addresses but, at this stage, it should be obvious how it could be calculated.
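
For illustration, a hypothetical helper (it is not taken from the kernel source) could perform the reverse conversion by using the offset of a struct page within mem_map as its PFN:

/* Hypothetical helper, not from the kernel source: the index of a struct
 * page within mem_map is its PFN, so shifting that index left by
 * PAGE_SHIFT recovers the physical address of the frame it describes. */
static inline unsigned long page_to_physical(struct page *page)
{
        return (unsigned long)(page - mem_map) << PAGE_SHIFT;
}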


4.7.3 Initialising mem_map

The mem_map area is created during system startup in one of two fashions. On NUMA systems, the function free_area_init_node() is called for each active node in the system and on UMA systems, free_area_init() is used. Both use the core function free_area_init_core() to perform the actual task of allocating memory for the mem_map portions and initialising the zones. Predictably, UMA calls the core function directly with contig_page_data and the global mem_map as parameters.
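
As a rough sketch of the UMA case (paraphrased from mm/page_alloc.c; the exact parameter list varies between 2.4 releases), free_area_init() simply forwards everything to the core function:

/* Paraphrased from mm/page_alloc.c; parameter details vary by release */
void __init free_area_init(unsigned long *zones_size)
{
        free_area_init_core(0, &contig_page_data, &mem_map, zones_size, 0, 0, 0);
}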

The core function free_area_init_core() allocates a local lmem_map for the node being initialised. The memory for the array is allocated from the boot memory allocator with alloc_bootmem_node() (see Chapter 6). With UMA architectures, this newly allocated memory becomes the global mem_map but it is slightly different for NUMA.
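
Paraphrased from free_area_init_core() (the exact code differs between 2.4 releases), the allocation and assignment look roughly like this:

/* Paraphrased from free_area_init_core(); details vary between releases.
 * One struct page is needed for every page frame in the node. */
map_size = (totalpages + 1) * sizeof(struct page);
if (lmem_map == (struct page *)0) {
        lmem_map = (struct page *)alloc_bootmem_node(pgdat, map_size);
        /* lmem_map is then aligned so that indexing into mem_map works */
}
*gmap = pgdat->node_mem_map = lmem_map;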

NUMA architectures allocate the memory for lmem_map within their own memory node. The global mem_map never gets explicitly allocated but instead is set to PAGE_OFFSET where it is treated as a virtual array. The address of the local map is stored in pg_data_t→node_mem_map, which exists somewhere within the virtual mem_map. For each zone that exists in the node, the address within the virtual mem_map for the zone is stored in zone_t→zone_mem_map. All the rest of the code then treats mem_map as a real array, as only the valid regions within it will be used by nodes.
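
The arithmetic being described can be sketched as follows. This is illustrative only and not lifted from free_area_init_core(); it assumes the 2.4 field names node_start_mapnr and zone_start_mapnr, which record the index of the first page frame of the node and zone respectively:

/* Illustrative sketch only: with mem_map treated as a virtual array based
 * at PAGE_OFFSET, a node's slice begins at the entry for its first page
 * frame and each zone's slice begins at the entry for its first frame. */
struct page *virtual_mem_map = (struct page *)PAGE_OFFSET;

pgdat->node_mem_map = virtual_mem_map + pgdat->node_start_mapnr;
zone->zone_mem_map  = virtual_mem_map + zone->zone_start_mapnr;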


Footnotes

[4.3] This tricky issue was pointed out to me by Jeffrey Haran.
