There is a requirement for Linux to have a fast method of mapping virtual addresses to physical addresses and for mapping struct pages to their physical address. Linux achieves this by knowing where in both virtual and physical memory the global mem_map array is as the global array has pointers to all struct pages representing physical memory in the system. All architectures achieve this with similar mechanisms but for illustration purposes, we will only examine the x86 carefully. This section will first discuss how physical addresses are mapped to kernel virtual addresses and then what this means to the mem_map array.
As we saw in Section 4.6, Linux sets up a direct mapping from the physical address 0 to the virtual address PAGE_OFFSET at 3GiB on the x86. This means that on the x86, any virtual address can be translated to the physical address by simply subtracting PAGE_OFFSET which is essentially what the function virt_to_phys() with the macro __pa() does:
/* from <asm-i386/page.h> */ 132 #define __pa(x) ((unsigned long)(x)-PAGE_OFFSET) /* from <asm-i386/io.h> */ 76 static inline unsigned long virt_to_phys(volatile void * address) 77 { 78 return __pa(address); 79 }
Obviously the reverse operation involves simply adding PAGE_OFFSET which is carried out by the function phys_to_virt() with the macro __va(). Next we see how this helps the mapping of struct pages to physical addresses.
There is one exception where virt_to_phys() cannot be used to convert virtual addresses to physical ones4.3. Specifically, on the PPC and ARM architectures, virt_to_phys() cannot be used to convert address that have been returned by the function consistent_alloc(). consistent_alloc() is used on PPC and ARM architectures to return memory from non-cached for use with DMA.
As we saw in Section 4.6.1, the kernel image is located at the physical address 1MiB, which of course translates to the virtual address PAGE_OFFSET + 0x00100000 and a virtual region totaling about 8MiB is reserved for the image which is the region that can be addressed by two PGDs. This would imply that the first available memory to use is located at 0xC0800000 but that is not the case. Linux tries to reserve the first 16MiB of memory for ZONE_ DMA. This means the first virtual area used for kernel allocations is 0xC1000000 which is where the global mem_map is usually located. ZONE_ DMA will still get used, but only when absolutely necessary.
Physical addresses are translated to struct pages by treating them as an index into the mem_map array. Shifting a physical address PAGE_SHIFT bits to the right will treat it as a PFN from physical address 0 which is also an index within the mem_map array. This is exactly what the macro virt_to_page() does which is declared as follows in asm-i386/page.h:
#define virt_to_page(kaddr) (mem_map + (__pa(kaddr) >> PAGE_SHIFT))
virt_to_page() takes the virtual address kaddr, converts it to the physical address with __pa(), converts it into an array index by bit shifting it right PAGE_SHIFT bits and indexing into the mem_map by simply adding them together. No macro is available for converting struct pages to physical addresses but at this stage, it should be obvious to see how it could be calculated.
The mem_map area is created during system startup in one of two fashions. On NUMA systems, the function free_area_init_node() is called for each active node in the system and on UMA systems, free_area_init() is used. Both use the core function free_area_init_core() to perform the actual task of allocating memory for the mem_map portions and initialising the zones. Predictably, UMA calls the core function directly with contig_page_data and the global mem_map as parameters.
The core function free_area_init_core() allocates a local lmem_map for the node being initialised. The memory for the array is allocated from the boot memory allocator with alloc_bootmem_node() (see Chapter 6). With UMA architectures, this newly allocated memory becomes the global mem_map but it is slightly different for NUMA.
NUMA architectures allocate the memory for lmem_map within
their own memory node. The global mem_map never gets explicitly
allocated but instead is set to PAGE_OFFSET where it is
treated as a virtual array. The address of the local map is stored in
pg_data_tnode_mem_map which exists somewhere within
the virtual mem_map. For each zone that exists in the node, the
address within the virtual mem_map for the zone is stored in
zone_tzone_mem_map. All the rest of the code then
treats mem_map as a real array as only valid regions within it
will be used by nodes.