Ideally, a VM from which memory has been reclaimed should perform as if it had been configured with less memory. ESX Server uses a ballooning technique to achieve such predictable performance by coaxing the guest OS into cooperating with it when possible.
A small balloon module is loaded into the guest OS as a pseudo-device driver or kernel service. It has no external interface within the guest and communicates with ESX Server via a private channel. When the server wants to reclaim memory, it instructs the driver to “inflate” by allocating pinned physical pages within the VM, using appropriate native interfaces. Similarly, the server may “deflate” the balloon by instructing the driver to deallocate previously allocated pages.
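The inflate/deflate protocol can be illustrated with a small driver-side model. This is a sketch under stated assumptions, not ESX Server's implementation: we assume the server expresses its request as a target balloon size in pages, and `GuestAllocator`, `BalloonDriver`, and `set_target` are hypothetical names standing in for the guest's native allocation interfaces and the private channel. The driver returns the physical page numbers (PPNs) it pinned or released so they can be reported to the server.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GuestAllocator:
    """Hypothetical stand-in for the guest OS's native page allocator."""
    free_ppns: list[int]

    def alloc_pinned_page(self) -> int | None:
        # Return a physical page number (PPN), or None if none is available.
        return self.free_ppns.pop() if self.free_ppns else None

    def free_page(self, ppn: int) -> None:
        self.free_ppns.append(ppn)


@dataclass
class BalloonDriver:
    guest: GuestAllocator
    held: list[int] = field(default_factory=list)  # PPNs pinned in the balloon

    def set_target(self, target: int) -> tuple[list[int], list[int]]:
        """Move the balloon toward `target` pages (assumed request format).

        Returns (inflated_ppns, deflated_ppns) for the driver to report
        to the server over the private channel."""
        inflated: list[int] = []
        deflated: list[int] = []
        # Inflate: allocate and pin guest pages until the target is reached.
        while len(self.held) < target:
            ppn = self.guest.alloc_pinned_page()
            if ppn is None:
                break  # the guest could not satisfy the request this round
            self.held.append(ppn)
            inflated.append(ppn)
        # Deflate: hand previously allocated pages back to the guest.
        while len(self.held) > target:
            ppn = self.held.pop()
            self.guest.free_page(ppn)
            deflated.append(ppn)
        return inflated, deflated


if __name__ == "__main__":
    guest = GuestAllocator(free_ppns=list(range(8)))
    driver = BalloonDriver(guest)
    print(driver.set_target(3))  # ([7, 6, 5], [])
    print(driver.set_target(1))  # ([], [5, 6])
```

Because the driver uses only the guest's ordinary allocation calls, the server never needs to know how the guest manages its memory; it only learns which PPNs are now unused by the guest.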
Inflating the balloon increases memory pressure in the guest OS, causing it to invoke its own native memory management algorithms. When memory is plentiful, the guest OS will satisfy the request from its free list. When memory is scarce, it must reclaim space to satisfy the driver's allocation request. The guest OS decides which particular pages to reclaim and, if necessary, pages them out to its own virtual disk. The balloon driver communicates the physical page number for each allocated page to ESX Server, which may then reclaim the corresponding machine page. Deflating the balloon frees up memory for general use within the guest OS.
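The two sides of reclamation can be sketched together: the guest chooses where the balloon's pages come from, and the host reclaims the machine pages behind them. This toy model rests on several assumptions: the guest evicts the least recently used page when its free list is empty (a real guest applies its own native policy), and `GuestOS`, `Hypervisor`, `pmap`, and `inflate` are hypothetical names. The host-side map from PPNs to machine page numbers (MPNs) is likewise a simplification.

```python
from __future__ import annotations

from collections import OrderedDict


class GuestOS:
    """Toy guest: a free list plus in-use pages kept in LRU order."""

    def __init__(self, num_pages: int, in_use: int) -> None:
        self.free_list = list(range(in_use, num_pages))
        # PPN -> contents; insertion order approximates LRU order (assumption).
        self.in_use: OrderedDict[int, str] = OrderedDict(
            (ppn, f"data{ppn}") for ppn in range(in_use)
        )
        self.virtual_disk: list[str] = []  # contents paged out by the guest

    def allocate_for_balloon(self) -> int:
        if self.free_list:
            # Memory is plentiful: satisfy the request from the free list.
            return self.free_list.pop()
        # Memory is scarce: reclaim the least recently used page and
        # page its contents out to the guest's own virtual disk.
        ppn, contents = self.in_use.popitem(last=False)
        self.virtual_disk.append(contents)
        return ppn


class Hypervisor:
    """Toy host-side map from guest physical pages to machine pages."""

    def __init__(self, pmap: dict[int, int]) -> None:
        self.pmap = pmap  # PPN -> MPN
        self.free_machine: list[int] = []

    def reclaim(self, ppn: int) -> None:
        # The guest will not touch a ballooned page, so the machine page
        # backing it can be returned to the host's free pool.
        self.free_machine.append(self.pmap.pop(ppn))


def inflate(guest: GuestOS, host: Hypervisor, pages: int) -> list[int]:
    """Allocate `pages` guest pages and report each PPN to the host."""
    ppns = [guest.allocate_for_balloon() for _ in range(pages)]
    for ppn in ppns:
        host.reclaim(ppn)
    return ppns


if __name__ == "__main__":
    guest = GuestOS(num_pages=6, in_use=4)  # PPNs 4 and 5 start free
    host = Hypervisor({ppn: 100 + ppn for ppn in range(6)})
    print(inflate(guest, host, 3))  # [5, 4, 0]
    print(guest.virtual_disk)       # ['data0']
    print(host.free_machine)        # [105, 104, 100]
```

The first two requests are served from the free list; the third forces the guest to page out one of its own pages. The host never inspects guest memory, so the choice of victim stays entirely with the guest OS.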
[Figure: Ballooning. ESX Server controls a balloon module running within the guest, directing it to allocate guest pages and pin them in “physical” memory. The machine pages backing this memory can then be reclaimed by ESX Server. Inflating the balloon increases memory pressure, forcing the guest OS to invoke its own memory management algorithms. The guest OS may page out to its virtual disk when memory is scarce. Deflating the balloon decreases pressure, freeing guest memory.]