mirror of
https://github.com/ultimatepp/ultimatepp.git
synced 2026-05-16 14:16:09 -06:00
135 lines
No EOL
7.4 KiB
C++
135 lines
No EOL
7.4 KiB
C++
topic "Heap implementation";
|
|
[i448;a25;kKO9;*@(64)2 $$1,0#37138531426314131252341829483380:class]
|
|
[l288;2 $$2,2#27521748481378242620020725143825:desc]
|
|
[a83;*R6 $$3,0#31310162474203024125188417583966:caption]
|
|
[l288;i1121;b17;O9;~~~.1408;2 $$4,0#10431211400427159095818037425705:param]
|
|
[i448;a25;kKO9;*@(64)2 $$5,0#37138531426314131252341829483370:item]
|
|
[b108;*+117 $$6,6#14700283458701402223321329925657:header]
|
|
[l416;2 $$7,7#55548704457842300043401641954952:nested`-desc]
|
|
[l288;i448;a25;kO9;*2 $$8,8#64691275497409617375831514634295:nested`-class]
|
|
[b33;2 $$0,0#00000000000000000000000000000000:Default]
|
|
[{_}%EN-US
|
|
[s3; Heap implementation&]
|
|
[s0; U`+`+ heap is divided into 4 categories based on the block size
|
|
`- small, medium, huge and system.&]
|
|
[s0; &]
|
|
[s6; Small blocks&]
|
|
[s0; Blocks <`= 992 bytes. According to our research, blocks <`=
|
|
992 usually represent the majority of blocks used in typical
|
|
C`+`+/U`+`+ applications (>98% of all blocks).&]
|
|
[s0; Small blocks are allocated in 4KB pages that are 4KB aligned.&]
|
|
[s0; There are 18 possible block sizes for small blocks (32, 64,
|
|
96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 448, 576, 672,
|
|
800, 992). Sizes of larger blocks are designed so that they are
|
|
32 bytes rounded and will use the most memory in (4096 `- 32
|
|
`= 4064) page (see bellow). E.g. 4064 / 7 `= 580, which is adjusted
|
|
down to 576, thus wasting just 4064 `- 576 `* 7 `= 32 bytes
|
|
per 4KB page.&]
|
|
[s0; Each 4KB pages is dedicated to a single block size. Therefore
|
|
there is no need to store any per`-block information; instead
|
|
information about the whole block is stored in the 32 bytes header
|
|
at the beginning of 4KB page. This header stores pointer to the
|
|
list of free blocks in the page, double`-link pointers for the
|
|
block so that it can be stored in allocator structures, total
|
|
number of blocks in the 4KB page and a number of free blocks
|
|
in 4KB page.&]
|
|
[s0; Given that blocks do no have individual headers, the critical
|
|
implementation details is how FreeMemory routine detects the
|
|
small blocks. This is solved by putting this information directly
|
|
into the pointer to heap: Small blocks always have bit 5 of address
|
|
one, while other block categories have it zero. In other words,
|
|
small blocks are 32 bytes misaligned while other categories are
|
|
32 bytes aligned.&]
|
|
[s0; Once small block detected in MemoryFree, the necessary booking
|
|
information is found at the start of 4KB blocks.&]
|
|
[s0; Allocator keeps the track of 4KB pages that are completely used
|
|
(no free blocks), partially used or empty. Empty pages can be
|
|
eventually converted to different block size.&]
|
|
[s0; Allocator also uses cache of small blocks as additional optimization.
|
|
In this cache, up to about 3.5KB of small blocks per small block
|
|
size are stored on free, without really invoking more complex
|
|
deallocation routine.&]
|
|
[s6; Medium blocks &]
|
|
[s0; Blocks >256 and < 65504 bytes. Approximate best`-fit allocator
|
|
is used for these blocks. Memory is organized in 64KB pages.
|
|
Each allocated block has header with its size and the size of
|
|
previous block, free flag and pointer to the Heap.&]
|
|
[s0; Allocator keeps an array of lists of free blocks of particular
|
|
sizes. Size distribution is mostly exponential, blocks lower
|
|
than 2048 are rounded up to 32 bytes, between 2048 and about
|
|
35000 rounding exponentially grows up to 2048 and then stays
|
|
at this value. Each such size has its index in the array of free
|
|
blocks.&]
|
|
[s0; When allocating, index is decided based on the size and array
|
|
is searched starting with that index to obtain the smallest free
|
|
block (best`-fit) greater than required size. Bigger blocks are
|
|
divided and the rest of block is put to free block list.&]
|
|
[s0; When freeing, allocator merges the freed block with previous
|
|
or next free block if any and reassigns in free block list.&]
|
|
[s0; Note that master header of 64KB blocks and all operations are
|
|
designed so that resulting pointers are NOT 32 bytes aligned
|
|
(see description of small blocks).&]
|
|
[s6; Huge blocks&]
|
|
[s0; There is shared (between threads) subheap for blocks less than
|
|
16MB with allocation unit 4KB. Blocks bigger than 65504 bytes
|
|
and less than 16MB are directly allocated from this. The allocation
|
|
unit is 4KB. Small and medium pages are also allocated from this
|
|
subheap.&]
|
|
[s0; This category solves two problems: Allocating and freeing system
|
|
blocks is surprisingly expensive operation, so this subheap optimizes
|
|
this situation. It also allows for converting memory between
|
|
small 4KB and medium 64KB pages.&]
|
|
[s6; System blocks&]
|
|
[s0; Blocks larger than 16MB are allocated directly from the system.&]
|
|
[s0; &]
|
|
[s6; Multithreading&]
|
|
[s0; Each thread has its own small and medium blocks heap (implemented
|
|
using TLS) and there is also `'aux`' heap, which is basically
|
|
used to keep track of completely free 4KB pages or 64KB chunks.&]
|
|
[s0; Most small and medium block allocations are lockless. Single
|
|
mutex for the whole allocator is locked in following, relatively
|
|
rare, situations:&]
|
|
[s0;i150;O0; When freeing the block that was allocated in different
|
|
thread (has different heap). Such blocks are first buffered until
|
|
their total size is more than 2000 bytes, then the mutex is locked
|
|
and all blocks are distributed to remote`_free lists of respective
|
|
heaps.&]
|
|
[s0;i150;O0; When allocating the small block and there is no block
|
|
available in partially used pages and there is no `'reserve`'
|
|
empty page (of any size class) available `- in that case, mutex
|
|
is locked and and situation resolved, either by obtaining the
|
|
free page from global storage, by adopting orphaned partially
|
|
used page from auxiliary page, or if all else fails, by retrieving
|
|
page from the system. As part of this process, remote`_list pointer
|
|
is checked (without locking) and if not null, mutex is locked
|
|
and remotely freed pages are processed.&]
|
|
[s0;i150;O0; When freeing the small block which results in completely
|
|
free page and when heap already has reserve empty page for given
|
|
size class. In that case, reserve page is put to global list
|
|
of empty pages and new free page is used as new reserve (this
|
|
is because new page is likely more `'hot`' in cache).&]
|
|
[s0;i150;O0; When freeing the large block that was allocated in different
|
|
thread. In that case, mutex is locked and the block is put to
|
|
respective thread`'s heap large`_remote`_free.&]
|
|
[s0;i150;O0; When allocating the large block and there is no block
|
|
available. In that case, mutex is locked and large`_remote`_free
|
|
blocks are properly freed, then the allocation is retried.&]
|
|
[s0;i150;O0; When allocating from huge or system heap, mutex is always
|
|
locked.&]
|
|
[s6; Specific features&]
|
|
[s0; Beyond standard free/malloc like trivial interface, U`+`+ allocator
|
|
offers some specific features:&]
|
|
[s0; MemoryAllocSz changes the size parameter to actually reflect
|
|
the actual free space allocated.&]
|
|
[s0; MemoryAlloc32 and MemoryFree32 are version optimized to allocate
|
|
exactly 32 bytes, which is an important size for U`+`+ String
|
|
type `- by knowing the exact size, allocator can skip several
|
|
branches.&]
|
|
[s0; While the minimal block size returned is normally 32 bytes,
|
|
U`+`+ allocator can effectively allocate even smaller blocks
|
|
when TinyAlloc / TinyFree interface is used `- the price to pay
|
|
is that TinyFree needs to pass the size of block that was requested
|
|
by TinyAlloc as the argument. Note that these blocks are still
|
|
small, which means there can be 508 8bytes long blocks in 4KB
|
|
page.&]
|
|
[s0; ]] |