This section begins with how pages are added to the page cache. It will then cover how pages are moved from the active_list to the inactive_list. Lastly we will cover how pages are reclaimed from the page cache.
Pages which are read from a file or block device are added to the page cache by calling __add_to_page_cache() during generic_file_read().
All filesystems use the high level function generic_file_read() so that operations will take place through the page cache. It calls do_generic_file_read() which first checks if the page exists in the page cache. If it does not, the information is read from disk and added to the cache with __add_to_page_cache().
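The lookup-then-read pattern described above can be sketched with a toy cache. This is only an illustration under stated assumptions: struct toy_cache, toy_find() and toy_read_page() are hypothetical names, the cache is a fixed array rather than the kernel's hash table, and there is no eviction.

```c
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Hypothetical stand-in for the page cache of a single file. */
struct toy_cache {
    long index[TOY_CACHE_SLOTS];  /* page offsets currently cached */
    size_t used;                  /* number of valid entries in index[] */
    unsigned long disk_reads;     /* how many misses went to "disk" */
};

/* Return 1 if the page at this offset is already cached, 0 otherwise. */
static int toy_find(const struct toy_cache *c, long index)
{
    size_t i;

    for (i = 0; i < c->used; i++)
        if (c->index[i] == index)
            return 1;
    return 0;
}

/*
 * Check the cache first; on a miss, count a disk read and add the page.
 * Returns 1 on a cache hit and 0 on a miss. If the toy cache is full,
 * the page is read but not cached, because eviction is not modelled.
 */
int toy_read_page(struct toy_cache *c, long index)
{
    if (toy_find(c, index))
        return 1;

    c->disk_reads++;
    if (c->used < TOY_CACHE_SLOTS)
        c->index[c->used++] = index;
    return 0;
}
```

Reading the same offset twice results in one miss followed by one hit, so disk_reads stays at 1.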
Anonymous pages are added to the page cache the first time they are about to be swapped out and are discussed further in Section 12.4. As far as the page cache is concerned, the only real difference between anonymous pages and file-backed pages is that anonymous pages use swapper_space as the struct address_space.
Shared memory pages are added during one of two cases. The first is during shmem_getpage_locked() which is called when a page has to be either fetched from swap or allocated as it is the first reference. The second is when the swapout code calls shmem_unuse(). This occurs when a swap area is being deactivated and a page, backed by swap space, is found that does not appear to belong to any process. The inodes related to shared memory are exhaustively searched until the correct page is found. In both cases, the page is added with add_to_page_cache().
When caches are being shrunk, pages are moved from the active_list to the inactive_list by the function refill_inactive(). It takes as a parameter the number of pages to move, which is calculated in shrink_caches() as a ratio depending on nr_pages, the number of pages in the active_list and the number of pages in the inactive_list. The number of pages to move is calculated as
    pages = nr_pages * nr_active_pages / (2 * (nr_inactive_pages + 1))
This keeps the active_list at about two thirds of the combined size of the two lists, or roughly twice the size of the inactive_list. The number of pages to move is scaled by how many pages we want to swap out (nr_pages).
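As a small worked sketch of this calculation, the function below computes the number of pages to move, assuming the expression used by shrink_caches() in the 2.4 kernel: nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2). The name refill_ratio() is hypothetical and the counters are passed in as arguments rather than read from kernel globals.

```c
/*
 * Hypothetical helper: how many pages to move from the active_list to
 * the inactive_list. The "+ 1" avoids a division by zero when the
 * inactive_list is empty.
 */
unsigned long refill_ratio(unsigned long nr_pages,
                           unsigned long nr_active_pages,
                           unsigned long nr_inactive_pages)
{
    return nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
}
```

When the active_list is about twice the size of the inactive_list, the result is about nr_pages, so the lists are refilled at the same rate that pages are reclaimed. A larger active_list moves more pages and a smaller one moves fewer.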
Pages are taken from the end of the active_list. If the PG_referenced flag is set, it is cleared and the page is put back at the top of the active_list as it has been recently used and is still ``hot''. If the flag is clear, the page is moved to the inactive_list and the PG_referenced flag is set so that it will be quickly promoted to the active_list if necessary.
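The per-page decision described above can be sketched as follows. This is a minimal model, not the kernel code: struct toy_lru_page, enum toy_list and toy_age_page() are hypothetical names, and the list manipulation itself is left out so that only the flag handling is shown.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only PG_referenced is modelled. */
struct toy_lru_page {
    bool referenced;
};

enum toy_list { TOY_ACTIVE, TOY_INACTIVE };

/*
 * Decide where a page taken from the end of the active_list should go.
 * A referenced page loses its flag and stays active. An unreferenced
 * page moves to the inactive_list with the flag set, so that one more
 * reference is enough to promote it again.
 */
enum toy_list toy_age_page(struct toy_lru_page *page)
{
    if (page->referenced) {
        page->referenced = false;
        return TOY_ACTIVE;
    }
    page->referenced = true;
    return TOY_INACTIVE;
}
```

A page that is never touched again therefore survives one pass on the active_list before being moved on the second pass.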
The function shrink_cache() is the part of the replacement algorithm which takes pages from the inactive_list and decides how they should be swapped out. The two starting parameters which determine how much work will be performed are nr_pages and priority. nr_pages starts out as SWAP_CLUSTER_MAX and priority starts as DEF_PRIORITY.
Two parameters, max_scan and max_mapped, determine how much work the function will do and are affected by the priority. Each time the function shrink_caches() is called without enough pages being freed, the priority value is decreased until the highest priority, 1, is reached.
max_scan is the maximum number of pages that will be scanned by this function and is simply calculated as
    max_scan = nr_inactive_pages / priority
where nr_inactive_pages is the number of pages in the inactive_list. This means that at the lowest priority, 6, at most one sixth of the pages in the inactive_list will be scanned and at the highest priority, 1, all of them will be.
The second parameter is max_mapped, which determines how many process pages are allowed to exist in the page cache before whole processes will be swapped out. This is calculated as the minimum of either one tenth of max_scan or
    nr_pages * 2^(10 - priority)
In other words, at the lowest priority, the maximum number of mapped pages allowed is either one tenth of max_scan or 16 times (2^4) the number of pages to swap out (nr_pages), whichever is lower. At the highest priority, it is either one tenth of max_scan or 512 times (2^9) the number of pages to swap out.
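To make the arithmetic concrete, the sketch below computes both limits from the quantities named in the text, assuming max_scan = nr_inactive_pages / priority and max_mapped = min(nr_pages << (10 - priority), max_scan / 10). The struct and function names are hypothetical, and DEF_PRIORITY is defined locally as 6 to match the lowest priority mentioned above.

```c
#include <assert.h>

#define DEF_PRIORITY 6   /* lowest priority, as described in the text */

/* Hypothetical container for the two limits. */
struct toy_scan_limits {
    int max_scan;    /* most pages that may be scanned in one call */
    int max_mapped;  /* mapped pages tolerated before swap_out() runs */
};

struct toy_scan_limits toy_scan_limits(int nr_inactive_pages,
                                       int nr_pages, int priority)
{
    struct toy_scan_limits lim;
    int by_priority;

    /* Priority runs from DEF_PRIORITY (lowest) down to 1 (highest). */
    assert(priority >= 1 && priority <= DEF_PRIORITY);

    lim.max_scan = nr_inactive_pages / priority;

    /* nr_pages * 2^(10 - priority): 16x at priority 6, 512x at priority 1 */
    by_priority = nr_pages << (10 - priority);

    lim.max_mapped = by_priority < lim.max_scan / 10
                   ? by_priority : lim.max_scan / 10;
    return lim;
}
```

With 60000 inactive pages and nr_pages of 32, priority 6 gives max_scan = 10000 and max_mapped = 512, while priority 1 gives max_scan = 60000 and max_mapped = 6000, where the one-tenth bound is the smaller value.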
From there, the function is basically a very large for-loop which scans at most max_scan pages to free up nr_pages pages from the end of the inactive_list or until the inactive_list is empty. After each page, it checks to see whether it should reschedule itself so that the swapper does not monopolise the CPU.
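The shape of that loop, with its three stopping conditions, can be sketched as below. This is a simplified model under stated assumptions: the inactive_list is an array whose last element is the end of the list, each entry only records whether the page could be freed, and toy_scan() is a hypothetical name. Rescheduling and locking are not modelled.

```c
#include <stdbool.h>

/*
 * Scan from the end of a toy inactive list until nr_pages pages have
 * been freed, max_scan pages have been examined, or the list is
 * exhausted. Returns the number of pages freed.
 */
int toy_scan(const bool *freeable, int nr_inactive,
             int max_scan, int nr_pages)
{
    int freed = 0;
    int i;

    for (i = nr_inactive - 1;
         i >= 0 && max_scan > 0 && freed < nr_pages;
         i--, max_scan--) {
        if (freeable[i])
            freed++;
    }
    return freed;
}
```

A return value below nr_pages means the scan budget or the list ran out first, which is the situation in which the caller retries at a higher priority.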
For each type of page found on the list, it makes a different decision on what to do. The page types and actions are as follows:
Page is mapped by a process. The max_mapped count is decremented. If it reaches 0, the page tables of processes will be linearly searched and their pages swapped out by the function swap_out().
Page is locked and the PG_launder bit is set. A reference to the page is taken with page_cache_get() so that the page will not disappear and wait_on_page() is called which sleeps until the IO is complete. Once it is completed, the reference count is decremented with page_cache_release(). When the count reaches zero, it is freed.
Page is dirty, is unmapped by all processes, has no buffers and belongs to a device or file mapping. The PG_dirty bit is cleared and the PG_launder bit is set. A reference to the page is taken with page_cache_get() so the page will not disappear prematurely and then the writepage() function provided by the mapping is called to clean the page. The previous case (a locked page with PG_launder set) will pick up this page during the next pass and wait for the IO to complete if necessary.
Page has buffers associated with data on disk. A reference is taken to the page and an attempt is made to free the buffers with try_to_release_page(). If it succeeds and is an anonymous page, the page can be freed. If it is backed by a file or device, the reference is simply dropped and the page will be freed later. However, it is unclear how a page could have both associated buffers and a file mapping.
Page is anonymous belonging to a process and has no associated buffers. The LRU is unlocked and the page is unlocked. The max_mapped count is decremented. If it reaches zero, then swap_out() is called to start swapping out entire processes as there are too many process mapped pages in the page cache. An anonymous page may have associated buffers if it is backed by a swap file. In this case, the page is treated as a buffer page and normal block IO syncs the page with the backing storage.
Page has no references to it. If the page is in the swap cache, it is deleted from it as it is now stored in the swap area. If it is part of a file, it is removed from the inode queue. The page is then deleted from the page cache and freed.
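The cases above can be summarised as a simple classifier. This is a rough sketch only: the checks follow the order of the list in this section rather than the exact order of tests in shrink_cache(), all names are hypothetical, and each page property is reduced to a boolean.

```c
#include <stdbool.h>

/* Hypothetical summary of the page state that the cases depend on. */
struct toy_cache_page {
    bool mapped;       /* mapped by at least one process */
    bool locked;       /* models PG_locked */
    bool launder;      /* models PG_launder */
    bool dirty;        /* models PG_dirty */
    bool has_buffers;  /* buffers are attached to the page */
    bool has_mapping;  /* belongs to a file, device or swap mapping */
};

enum toy_action {
    TOY_COUNT_MAPPED,     /* decrement max_mapped, maybe swap_out() */
    TOY_WAIT_FOR_IO,      /* wait_on_page() until the IO completes */
    TOY_WRITEPAGE,        /* start writeback through the mapping */
    TOY_RELEASE_BUFFERS,  /* try_to_release_page() */
    TOY_COUNT_ANON,       /* anonymous page, counts against max_mapped */
    TOY_FREE              /* remove from the page cache and free */
};

/* Pick the action for one page, checking the cases in list order. */
enum toy_action toy_classify(const struct toy_cache_page *p)
{
    if (p->mapped)
        return TOY_COUNT_MAPPED;
    if (p->locked && p->launder)
        return TOY_WAIT_FOR_IO;
    if (p->dirty && !p->has_buffers && p->has_mapping)
        return TOY_WRITEPAGE;
    if (p->has_buffers)
        return TOY_RELEASE_BUFFERS;
    if (!p->has_mapping)
        return TOY_COUNT_ANON;
    return TOY_FREE;
}
```

A dirty file page moves through this table over several passes: it is first written out, then waited on while locked with PG_launder set, and finally freed once it is clean and unreferenced.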