After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Peren's Open Book Series.

The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available although I obviously encourge you to buy the book from your favourite book store :-) . As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.
next up previous contents index
Next: 4.3 Using Page Table Up: 4. Page Table Management Previous: 4.1 Describing the Page   Contents   Index


4.2 Describing a Page Table Entry

As mentioned, each entry is described by the structures pte_t, pmd_t and pgd_t for PTEs, PMDs and PGDs respectively. Even though these are often just unsigned integers, they are defined as structures for two reasons. The first is for type protection so that they will not be used inappropriately. The second is for features like PAE on the x86 where an additional 4 bits is used for addressing more than 4GiB of memory. To store the protection bits, pgprot_t is defined which holds the relevant flags and is usually stored in the lower bits of a page table entry.

For type casting, 4 macros are provided in asm/page.h, which takes the above types and returns the relevant part of the structures. They are pte_val(), pmd_val(), pgd_val() and pgprot_val(). To reverse the type casting, 4 more macros are provided __pte(), __pmd(), __pgd() and __pgprot().

Where exactly the protection bits are stored is architecture dependent. For illustration purposes, we will examine the case of an x86 architecture without PAE enabled but the same principles apply across architectures. On an x86 with no PAE, the pte_t is simply a 32 bit integer within a struct. Each pte_t points to an address of a page frame and all the addresses pointed to are guaranteed to be page aligned. Therefore, there are PAGE_SHIFT (12) bits in that 32 bit value that are free for status bits of the page table entry. A number of the protection and status bits are listed in Table 4.1 but what bits exist and what they mean varies between architectures.


Table 4.1: Page Table Entry Protection and Status Bits
\begin{table}\begin{center}
\begin{tabularx}{13.5cm}{\vert l\vert X\vert}
\pa...
...page is accessed \\
\par\hline
\par
\end{tabularx}
\end{center} \end{table}


These bits are self-explanatory except for the _PAGE_PROTNONE which we will discuss further. On the x86 with Pentium III and higher, this bit is called the Page Attribute Table (PAT)4.1and is used to indicate the size of the page the PTE is referencing. In a PGD entry, this same bit is the PSE bit so obviously these bits are meant to be used in conjunction.

As Linux does not use the PSE bit, the PAT bit is free in the PTE for other purposes. There is a requirement for having a page resident in memory but inaccessible to the userspace process such as when a region is protected with mprotect() with the PROT_NONE flag. When the region is to be protected, the _PAGE_PRESENT bit is cleared and the _PAGE_PROTNONE bit is set. The macro pte_present() checks if either of these bits are set and so the kernel itself knows the PTE is present, just inaccessible to userspace which is a subtle, but important point. As the hardware bit _PAGE_PRESENT is clear, a page fault will occur if the page is accessed so Linux can enforce the protection while still knowing the page is resident if it needs to swap it out or the process exits.



Footnotes

... (PAT)4.1
With earlier architectures such as the Pentium II, this bit was simply reserved.

next up previous contents index
Next: 4.3 Using Page Table Up: 4. Page Table Management Previous: 4.1 Describing the Page   Contents   Index
Mel 2004-02-15