After this documentation was released in July 2003, I was approached
by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens Open Book Series.
The book is available and called simply "Understanding The Linux Virtual
Memory Manager". There is a lot of additional material in the book that is
not available here, including details on later 2.4 kernels, introductions
to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB
management, a lot more code commentary, countless other additions and
clarifications and a CD with lots of cool stuff on it. This material (although
now dated and lacking in comparison to the book) will remain available,
but I obviously encourage you to buy the book from your favourite book
store :-). As the book is under the Bruce Perens Open Book Series, it will
be available 90 days after appearing on the book shelves, which means it
is not available right now. When it is available, it will be downloadable
from http://www.phptr.com/perens
so check there for more information.
To be fully clear, this webpage is not the actual book.
A number of macros defined in asm/pgtable.h are important for navigating
and examining page table entries. To navigate the page directories, three
macros are provided which break up a linear address into its component
parts. pgd_offset() takes an address and the mm_struct for the process
and returns the PGD entry that covers the requested address. pmd_offset()
takes a PGD entry and an address and returns the relevant PMD.
pte_offset() takes a PMD and an address and returns the relevant PTE.
The remainder of the linear address is the offset within the page. The
relationship between these fields is illustrated in Figure 4.3.
Figure 4.3: Page Table Layout
The second set of macros determines whether the page table entries are
present or may be used.
- pte_none(), pmd_none() and pgd_none()
return 1 if the corresponding entry does not exist;
- pte_present(), pmd_present() and
pgd_present() return 1 if the corresponding page table
entries have the PRESENT bit set;
- pte_clear(), pmd_clear() and pgd_clear()
will clear the corresponding page table entry;
- pmd_bad() and pgd_bad() are used to check entries
when passed as input parameters to functions that may change the
value of the entries. Exactly what they check varies between the few
architectures that define these macros, but on those that do, the two
most important checks are that the entry is marked as present and
accessed.
Many parts of the VM are littered with page table walk code, and it is
important to be able to recognise it. A very simple example of a page table
walk is the function follow_page() in mm/memory.c. The following is an
excerpt from that function, with the parts unrelated to the page table walk
omitted:
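pgd_t *pgd;
pmd_t *pmd;
pte_t *ptep, pte;

/* Find the PGD entry covering address; stop if it is empty or corrupt */
pgd = pgd_offset(mm, address);
if (pgd_none(*pgd) || pgd_bad(*pgd))
        goto out;

/* Descend to the PMD entry within that PGD entry */
pmd = pmd_offset(pgd, address);
if (pmd_none(*pmd) || pmd_bad(*pmd))
        goto out;

/* Find the PTE within the page of PTEs that the PMD points to */
ptep = pte_offset(pmd, address);
if (!ptep)
        goto out;

/* Take a copy of the entry so it can be examined */
pte = *ptep;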
It simply uses the three offset macros to navigate the page tables and the
_none() and _bad() macros to make sure it is looking at
a valid page table.
The third set of macros examines and sets the permissions of an entry.
The permissions determine what a userspace process can and cannot do with
a particular page. For example, the kernel page table entries are never
readable by a userspace process.
- The read permissions for an entry are tested with
pte_read(), set with pte_mkread() and
cleared with pte_rdprotect();
- The write permissions are tested with pte_write(),
set with pte_mkwrite() and cleared with
pte_wrprotect();
- The execute permissions are tested with pte_exec(),
set with pte_mkexec() and cleared with
pte_exprotect(). It is worth noting that with the x86
architecture, there is no means of setting execute permissions on
pages, so these three macros act the same way as the read macros;
- The permissions can be changed to a new value with
pte_modify(), but it is rarely used: its only caller is
the function change_pte_range() in
mm/mprotect.c.
The fourth set of macros examines and sets the state of an entry. There
are only two bits that are important in Linux: the dirty bit and the
accessed bit. To check these bits, the macros pte_dirty()
and pte_young() are used. To set the bits, the macros
pte_mkdirty() and pte_mkyoung() are used. To
clear them, the macros pte_mkclean() and pte_mkold()
are available.
Mel
2004-02-15