After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens Open Book Series.

The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens Open Book Series, it will be available for download 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.


12.4 Swap Cache

Pages that are shared between many processes cannot be easily swapped out because, as mentioned, there is no quick way to map a struct page to every PTE that references it. This leads to a race condition: a page that is still present for one PTE but swapped out for another can be updated without being synced to disk, thereby losing the update.

To address this problem, shared pages that have a reserved slot in backing storage are considered to be part of the swap cache. The swap cache is purely conceptual: there is no dedicated list and no simple way to quickly traverse all the pages on it. Instead, any page in the page cache that has a slot reserved in backing storage is a member of it. This means that anonymous pages are, by default, not part of the swap cache until an attempt is made to swap them out. It also means that, by default, pages that belong to a shared memory region are added to the swap cache when they are first written to.

A page is identified as being part of the swap cache once the page->mapping field has been set to swapper_space, which is the address_space struct managing the swap area. This condition is tested with the PageSwapCache() macro. Linux uses the exact same logic for keeping pages in swap and in memory in sync as it uses for keeping pages belonging to files and memory coherent. The principal difference is that instead of using a struct address_space tied to a filesystem, swapper_space is used, which has registered functions for writing to swap space. The second difference is that page->index, instead of marking an offset within a file, stores the swp_entry_t structure.
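The membership test described above can be sketched in user-space C. This is only a minimal model: the structs are cut down to the two fields discussed here, and the helper names (model_page_swap_cache(), model_set_swap_entry(), model_page_swap_entry()) are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields discussed
 * in the text are modelled. */
struct address_space {
    const char *name;
};

typedef struct {
    unsigned long val;
} swp_entry_t;

struct page {
    struct address_space *mapping; /* NULL, a file mapping, or &swapper_space */
    unsigned long index;           /* file offset, or a swp_entry_t value */
};

/* The single address_space that stands for the swap area in this model. */
static struct address_space swapper_space = { "swap" };

/* Hypothetical equivalent of the PageSwapCache() test: a page is in the
 * swap cache exactly when its mapping is swapper_space. */
static int model_page_swap_cache(const struct page *page)
{
    assert(page != NULL);
    return page->mapping == &swapper_space;
}

/* Record the swap entry in the index field instead of a file offset. */
static void model_set_swap_entry(struct page *page, swp_entry_t entry)
{
    assert(page != NULL);
    page->mapping = &swapper_space;
    page->index = entry.val;
}

/* Recover the swap entry that was stored in the index field. */
static swp_entry_t model_page_swap_entry(const struct page *page)
{
    swp_entry_t entry;

    assert(model_page_swap_cache(page));
    entry.val = page->index;
    return entry;
}
```

The point of the sketch is that no extra flag or list is needed: the mapping pointer alone decides membership, and the index field is reused to carry the swap entry.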

When a page is being added to the swap cache, a slot is allocated with get_swap_page(), the page is added to the page cache with add_to_swap_cache() and it is then marked dirty. When the page is next laundered, it will actually be written to backing storage on disk, just as a normal page cache page would be. This process is illustrated in Figure 12.3 and the call graph is shown in Figure 12.4.
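The three steps (allocate a slot, join the swap cache, mark dirty) can be modelled as follows. This is a sketch under simplifying assumptions: the swap area is a tiny array of use counts, a slot is identified by its array index rather than by a swp_entry_t, and model_get_swap_page() and model_add_to_swap() are hypothetical names that only mimic the roles of get_swap_page() and add_to_swap_cache().

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 4  /* tiny swap area, for illustration only */

/* Per-slot use count; 0 means the slot is free. */
static unsigned short swap_map[NR_SLOTS];

struct page {
    int  in_swap_cache; /* stands in for mapping == &swapper_space */
    int  dirty;         /* page must be written out when laundered */
    long slot;          /* reserved slot, valid only when in_swap_cache */
};

/* Stand-in for get_swap_page(): reserve the first free slot.
 * Returns the slot number, or -1 if the swap area is full. */
static long model_get_swap_page(void)
{
    for (long i = 0; i < NR_SLOTS; i++) {
        if (swap_map[i] == 0) {
            swap_map[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Reserve a slot, make the page a swap cache member and mark it dirty.
 * Returns 0 on success, -1 if no slot could be allocated. */
static int model_add_to_swap(struct page *page)
{
    long slot;

    assert(page != NULL);
    if (page->in_swap_cache)
        return 0;               /* already has a slot reserved */

    slot = model_get_swap_page();
    if (slot < 0)
        return -1;

    page->slot = slot;
    page->in_swap_cache = 1;
    page->dirty = 1;            /* written to disk when next laundered */
    return 0;
}
```

Nothing is written to disk at this point in the model either; marking the page dirty only defers the write to whenever the page is next laundered.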

**Figure 12.3:** Adding a Page to the Swap Cache

**Figure 12.4:** Call Graph: `add_to_swap_cache()`

Subsequent swapping of the page from shared PTEs results in a call to swap_duplicate(), which simply increments the reference count of the slot in the swap_map. If the PTE is marked dirty by the hardware as a result of a write, the bit is cleared and the struct page is marked dirty with set_page_dirty() so that the on-disk copy will be synced before the page is dropped. This ensures that until all references to the page have been dropped, a check will be made to ensure the data on disk matches the data in the page frame.
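A minimal model of this step is given below. It assumes the page already has a slot in the swap cache and uses hypothetical names throughout: struct pte is a cut-down stand-in for a real page table entry, and model_swap_duplicate() and model_unmap_shared_pte() only imitate the behaviour described in the paragraph above.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 8  /* illustrative size */

/* Per-slot use count, one reference per PTE that points at the slot. */
static unsigned short swap_map[NR_SLOTS];

struct page {
    int  dirty; /* set when the in-memory copy is newer than the disk copy */
    long slot;  /* swap slot already reserved for this page */
};

struct pte {
    int  present; /* 1 while the PTE maps the page frame */
    int  dirty;   /* hardware dirty bit */
    long slot;    /* swap slot recorded once the PTE is swapped out */
};

/* Stand-in for swap_duplicate(): one more PTE now refers to the slot. */
static void model_swap_duplicate(long slot)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    swap_map[slot]++;
}

/* Swap out one of several PTEs sharing a swap cache page. */
static void model_unmap_shared_pte(struct pte *pte, struct page *page)
{
    assert(pte != NULL && page != NULL && pte->present);

    /* Move the hardware dirty bit to the page so the write is not lost. */
    if (pte->dirty) {
        pte->dirty = 0;
        page->dirty = 1;
    }

    model_swap_duplicate(page->slot);
    pte->present = 0;
    pte->slot = page->slot;
}
```

Moving the dirty state from the PTE to the page is what keeps the update alive: the page stays dirty until it is laundered, no matter which PTE performed the write.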

When the reference count to the page finally reaches 0, the page is eligible to be dropped from the page cache. By then, the swap_map count for its slot holds the number of PTEs that reference the on-disk slot, so the slot will not be freed prematurely. The page is laundered and finally dropped with the same LRU aging and logic described in Chapter 11.

If, on the other hand, a page fault occurs for a page that is "swapped out", the logic in do_swap_page() will check to see if the page exists in the swap cache by calling lookup_swap_cache(). If it does, the PTE is updated to point to the page frame, the page reference count is incremented and the reference count of the swap slot is decremented with swap_free().
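The swap cache hit on the fault path can be sketched in the same style. The lookup table indexed by slot number is an assumption made to keep the example short and is not how the page cache is organised; model_lookup_swap_cache(), model_swap_free() and model_do_swap_page() are hypothetical names, and the miss case (reading the page back from backing storage) is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 8  /* illustrative size */

struct page {
    int  count; /* reference count of the page frame */
    long slot;  /* swap slot backing this page */
};

struct pte {
    int          present; /* 1 once the PTE maps a page frame */
    struct page *page;    /* mapped page frame when present */
    long         slot;    /* swap slot recorded while swapped out */
};

/* Per-slot use count and a toy lookup table from slot to cached page. */
static unsigned short swap_map[NR_SLOTS];
static struct page *swap_cache[NR_SLOTS];

/* Stand-in for lookup_swap_cache(): NULL means the page is not cached. */
static struct page *model_lookup_swap_cache(long slot)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    return swap_cache[slot];
}

/* Stand-in for swap_free(): drop one reference to the slot. */
static void model_swap_free(long slot)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    if (swap_map[slot] > 0)
        swap_map[slot]--;
}

/* Handle a fault on a swapped-out PTE when the page is still cached.
 * Returns 0 on a swap cache hit, -1 on a miss (not handled here). */
static int model_do_swap_page(struct pte *pte)
{
    struct page *page;
    long slot;

    assert(pte != NULL && !pte->present);
    slot = pte->slot;
    page = model_lookup_swap_cache(slot);
    if (page == NULL)
        return -1;

    page->count++;          /* the PTE now holds a reference to the page */
    pte->page = page;
    pte->present = 1;
    model_swap_free(slot);  /* this PTE no longer refers to the slot */
    return 0;
}
```

On a hit no disk I/O is needed at all, which is the main benefit of keeping the page in the swap cache while some PTEs still reference it.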

**Table 12.1:** Swap Cache API


Mel 2004-02-15