After this documentation was released in July 2003, I was approached
by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens' Open Book Series.
The book is available and called simply "Understanding The Linux Virtual
Memory Manager". There is a lot of additional material in the book that is
not available here, including details on later 2.4 kernels, introductions
to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB
management, a lot more code commentary, countless other additions and
clarifications and a CD with lots of cool stuff on it. This material (although
now dated and lacking in comparison to the book) will remain available,
though I obviously encourage you to buy the book from your favourite book
store :-). As the book is under the Bruce Perens' Open Book Series, it will
be available 90 days after appearing on the book shelves which means it
is not available right now. When it is available, it will be downloadable
from http://www.phptr.com/perens
so check there for more information.
To be fully clear, this webpage is not the actual book.
In comparison to activating a swap area, deactivation is incredibly expensive.
The principal problem is that the area cannot simply be removed: every page
that is swapped out must be swapped back in again. Just as there is no quick
way of mapping a struct page to every PTE that references it,
there is no quick way to map a swap entry to a PTE either. All process page
tables must therefore be traversed to find the PTEs which reference the
swap area being deactivated so that the pages can be swapped in. This of
course means that swap deactivation will fail if there is not enough physical
memory available to hold those pages.
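The cost of the missing reverse mapping can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the toy_pte and toy_mm types and the toy_unuse_slot() function are hypothetical names invented for illustration, and a flat array stands in for a real multi-level page table. It shows that replacing a single swap entry means visiting every PTE of every address space.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not the kernel's data structures). */
enum toy_pte_kind { TOY_NONE, TOY_PRESENT, TOY_SWAPPED };

struct toy_pte {
    enum toy_pte_kind kind;
    unsigned long value;  /* frame number if PRESENT, swap slot if SWAPPED */
    unsigned area;        /* swap area index, meaningful only if SWAPPED */
};

struct toy_mm {
    struct toy_pte *ptes; /* flat array standing in for the page tables */
    size_t nr_ptes;
};

/*
 * Point every PTE that references (area, slot) at the given frame.
 * With no reverse mapping, every PTE of every mm has to be inspected.
 * Returns the number of PTEs that were updated.
 */
size_t toy_unuse_slot(struct toy_mm *mms, size_t nr_mms,
                      unsigned area, unsigned long slot,
                      unsigned long frame)
{
    size_t fixed = 0;

    assert(mms != NULL || nr_mms == 0);

    for (size_t m = 0; m < nr_mms; m++) {
        for (size_t i = 0; i < mms[m].nr_ptes; i++) {
            struct toy_pte *pte = &mms[m].ptes[i];

            if (pte->kind == TOY_SWAPPED &&
                pte->area == area && pte->value == slot) {
                pte->kind = TOY_PRESENT;
                pte->value = frame;
                fixed++;
            }
        }
    }
    return fixed;
}
```

In this model the work for one slot is proportional to the total number of PTEs in the system, and it has to be repeated for every used slot in the area, which is why deactivation is so much more expensive than activation.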
The function responsible for deactivating an area is, predictably enough,
called sys_swapoff(). This function is mainly concerned with
updating the swap_info_struct. The major task of paging in each
paged-out page is the responsibility of try_to_unuse() which is
extremely expensive. For each slot used in the swap_map,
the page tables for processes have to be traversed searching for it. In the
worst case, all page tables belonging to all mm_structs may
have to be traversed. The tasks involved in deactivating an area
are, broadly speaking:
- Call user_path_walk() to acquire the information about the
special file to be deactivated and then take the BKL
- Remove the swap_info_struct from the swap list and update
the global statistics on the number of swap pages available
(nr_swap_pages) and the total number of swap entries
(total_swap_pages). Once this is done, the BKL can be
released again
- Call try_to_unuse() which will page in all pages from the swap
area to be deactivated. This function loops through the swap map using
find_next_to_unuse() to locate the next used swap slot. For
each used slot it finds, it performs the following:
  - Call read_swap_cache_async() to allocate a page
  for the slot saved on disk. Ideally it exists in the swap cache
  already but the page allocator will be called if it is not
  - Wait on the page to be fully paged in and lock it. Once locked,
  call unuse_process() for every process that has a
  PTE referencing the page. This function traverses the page table
  searching for the relevant PTE and then updates it to point to the
  struct page. If the page is a shared memory page with
  no remaining reference, shmem_unuse() is called instead
  - Free all slots that were permanently mapped. It is believed
  that slots will never become permanently reserved so the risk
  is taken
  - Delete the page from the swap cache to prevent
  try_to_swap_out() referencing a page in the event it
  still somehow has a reference in the swap map
- If there was not enough available memory to page in all the entries, the
swap area is reinserted into the running system as it cannot be
simply dropped. If it succeeded, the swap_info_struct
is placed into an uninitialised state and the swap_map
memory freed with vfree()
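The overall control flow of the steps above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: all toy_ names are hypothetical, a counter of free frames stands in for the page allocator, locking and the PTE updates are omitted, and slot 0 is assumed to be reserved so that 0 can signal that no used slot remains.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ENOMEM 12  /* stand-in error code for this sketch */

/* Hypothetical stand-in for swap_info_struct. */
struct toy_swap_area {
    unsigned short *swap_map; /* per-slot use count, 0 means free */
    size_t max;               /* number of slots, slot 0 is reserved */
    int active;               /* 1 while the area is on the swap list */
};

/* Return the next used slot after prev, or 0 if there is none. */
static size_t toy_find_next_to_unuse(const struct toy_swap_area *si,
                                     size_t prev)
{
    for (size_t i = prev + 1; i < si->max; i++)
        if (si->swap_map[i] != 0)
            return i;
    return 0;
}

/* Page in every used slot, consuming one free frame for each. */
static int toy_try_to_unuse(struct toy_swap_area *si, size_t *free_frames)
{
    size_t slot = 0;

    while ((slot = toy_find_next_to_unuse(si, slot)) != 0) {
        if (*free_frames == 0)
            return -TOY_ENOMEM;  /* no memory left to page in */
        (*free_frames)--;        /* "allocate" a page for the slot */
        si->swap_map[slot] = 0;  /* the slot is no longer in use */
    }
    return 0;
}

/* Deactivate the area, reinserting it if paging in fails. */
int toy_swapoff(struct toy_swap_area *si, size_t *free_frames)
{
    int err;

    assert(si != NULL && free_frames != NULL);

    si->active = 0;              /* remove from the swap list */
    err = toy_try_to_unuse(si, free_frames);
    if (err) {
        si->active = 1;          /* cannot be dropped, so reinsert */
        return err;
    }
    /* On success the real code would free the swap_map here. */
    return 0;
}
```

Note that a failed call in this model leaves the already paged-in slots empty and puts the area back in service, so a later attempt only has to deal with the slots that remain.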
Mel
2004-02-15