All page sized slots are tracked by the array
swap_info_structswap_map which is of type
unsigned short. Each entry is a reference count of the number
of users of the slot which happens in the case of a shared page and is 0
when free. If the entry is SWAP_MAP_MAX, the page is permanently
reserved for that slot. It is unlikely, if not impossible, for this condition
to occur but it exists to ensure the reference count does not overflow. If
the entry is SWAP_MAP_BAD, the slot is unusable.
The task of finding and allocating a swap entry is divided
into two major tasks. The first performed by the high
level function get_swap_page(). Starting with
swap_listnext, it searches swap areas for a suitable slot.
Once a slot has been found, it records what the next swap area to be used
will be and returns the allocated entry.
The task of searching the map is the responsibility of scan_swap_map(). In principle, it is very simple as it linearly scan the array for a free slot and return. Predictably, the implementation is a bit more thorough.
Linux attempts to organise pages into clusters
on disk of size SWAPFILE_CLUSTER. It allocates
SWAPFILE_CLUSTER number of pages sequentially in swap
keeping count of the number of sequentially allocated pages in
swap_info_structcluster_nr and records the current
offset in swap_info_struct
cluster_next. Once a
sequential block has been allocated, it searches for a block of free entries
of size SWAPFILE_CLUSTER. If a block large enough can be found,
it will be used as another cluster sized sequence.
If no free clusters large enough can be found in
the swap area, a simple first-free search starting from
swap_info_structlowest_bit is performed. The aim is
to have pages swapped out at the same time close together on the premise that
pages swapped out together are related. This premise, which seems strange at
first glance, is quite solid when it is considered that the page replacement
algorithm will use swap space most when linearly scanning the process address
space swapping out pages. Without scanning for large free blocks and using
them, it is likely that the scanning would degenerate to first-free searches
and never improve. With it, processes exiting are likely to free up large
blocks of slots.