Section 115: Dynamic memory allocation

The system does nearly all of its own memory allocation, so that it can readily be transported into environments that do not have automatic facilities for strings, garbage collection, etc., and so that it can be in control of what error messages the user receives. The dynamic storage requirements of are handled by providing a large array mem in which consecutive blocks of words are used as nodes by the routines.

Pointer variables are indices into this array, or into another array called eqtb that will be explained later. A pointer variable might also be a special flag that lies outside the bounds of mem, so we allow pointers to assume any halfword value. The minimum halfword value represents a null pointer. does not assume that mem[null] exists.

NOTE

All constants are written in uppercase in this implementation, however, the NULL keyword is a reserved word in C, so an exception is made for the null pointer.

constants.h
#define null MIN_HALFWORD // the null pointer

⟨ Types in the outer block 18 ⟩+≡

typedef halfword pointer; // a flag or a location in |mem| or |eqtb|

⟨ Global variables 13 ⟩+≡

pointer temp_ptr; // a pointer variable for occasional emergency use

Section 116

The mem array is divided into two regions that are allocated separately, but the dividing line between these two regions is not fixed; they grow together until finding their “natural” size in a particular job. Locations less than or equal to lo_mem_max are used for storing variable-length records consisting of two or more words each. This region is maintained using an algorithm similar to the one described in exercise 2.5–19 of The Art of Computer Programming. However, no size field appears in the allocated nodes; the program is responsible for knowing the relevant size when a node is freed. Locations greater than or equal to hi_mem_min are used for storing one-word records; a conventional AVAIL stack is used for allocation in this region.

Locations of mem between MEM_BOT and MEM_TOP may be dumped as part of preloaded format files, by the INITEX preprocessor. Production versions of may extend the memory at both ends in order to provide more space; locations between MEM_MIN and MEM_BOT are always used for variable-size nodes, and locations between MEM_TOP and MEM_MAX are always used for single-word nodes.

The key pointers that govern mem allocation have a prescribed order:

Empirical tests show that the present implementation of tends to spend about 9% of its running time allocating nodes, and about 6% deallocating them after their use.

NOTE

MEM_MIN is 0, so it is not used in the declaration of mem.

⟨ Global variables 13 ⟩+≡

memory_word mem[MEM_MAX + 1]; // the big dynamic storage area
pointer lo_mem_max;           // the largest location of variable-size memory in use
pointer hi_mem_min;           // the smallest location of one-word memory in use

Section 117

In order to study the memory requirements of particular applications, it is possible to prepare a version of that keeps track of current and maximum memory usage. When code between the delimiters stat .. tats. is not “commented out”, will run a bit slower but it will report these statistics when tracing_stats is sufficiently large.

⟨ Global variables 13 ⟩+≡

int var_used, dyn_used; // how much memory is in use

Section 118

Let’s consider the one-word memory region first, since it’s the simplest. The pointer variable mem_end holds the highest-numbered location of mem that has ever been used. The free locations of mem that occur between hi_mem_min and mem_end, inclusive, are of type two_halves, and we write info(p) and link(p) for the lh and rh fields of mem[p] when it is of this type. The single-word free locations form a linked list

avail, link(avail), link(link(avail)),

terminated by null.

datastructures.h
#define link(X) hh_rh(mem[(X)]) // the |link| field of a memory word
#define info(X) hh_lh(mem[(X)]) // the |info| field of a memory word

⟨ Global variables 13 ⟩+≡

pointer avail;   // head of the list of available one-word nodes
pointer mem_end; // the last one-word node used in |mem|

Section 119

If memory is exhausted, it might mean that the user has forgotten a right brace. We will define some procedures later that try to help pinpoint the trouble.

NOTE

Those are show_token_list and runaway, but have been defined directly in a file.

Section 120

The function get_avail returns a pointer to a new one-word node whose link field is null. However, will halt if there is no more room left.

If the available-space list is empty, i.e., if avail = null, we try first to increase mem_end. If that cannot be done, i.e., if mem_end = MEM_MAX, we try to decrease hi_mem_min. If that cannot be done, i.e., if hi_mem_min = lo_mem_max + 1, we have to quit.

memory.c
// << Start file |memory.c|, 1382 >>

// single-word node allocation
pointer get_avail() {
    pointer p; // the new node being got
    p = avail; // get top location in the |avail| stack
    if (p != null) {
        // and pop-it off
        avail = link(avail);
    }
    else if (mem_end < MEM_MAX) {
        // or go into virgin directory
        incr(mem_end);
        p = mem_end;
    }
    else {
        decr(hi_mem_min);
        p = hi_mem_min;
        if (hi_mem_min <= lo_mem_max) {
            // if memory is exhausted, display possible runaway text
            runaway();
            // quit; all one-word nodes are busy
            overflow("main memory size", MEM_MAX + 1 - MEM_MIN);
        }
    }
    link(p) = null; // provide an oft-desired initialization of the new node
    incr_dyn_used; // maintain statistics
    return p;
}

Section 121

Conversely, a one-word node is recycled by calling free_avail. This routine is part of ’s “inner loop”, so we want it to be fast.

datastructures.h
// single-word node liberation
#ifdef STAT
#define decr_dyn_used decr(dyn_used)
#define incr_dyn_used incr(dyn_used)
#else
#define decr_dyn_used
#define incr_dyn_used
#endif

#define free_avail(X) link((X)) = avail; avail = (X); decr_dyn_used

Section 122

There’s also a fast_get_avail routine, which saves the procedure-call overhead at the expense of extra programming. This routine is used in the places that would otherwise account for the most calls of get_avail.

datastructures.h
#define fast_get_avail(X)                                 \
    do {                                                  \
        /* avoid |get avail| if possible, to save time */ \
        (X) = avail;                                      \
        if ((X) == null) {                                \
            (X) = get_avail();                            \
        }                                                 \
        else {                                            \
            avail = link((X));                            \
            link((X)) = null;                             \
            incr_dyn_used;                                \
        }                                                 \
    } while(0)

Section 123

The procedure flush_list(p) frees an entire linked list of one-word nodes that starts at position p.

memory.c
// makes list of single-word nodes
void flush_list(pointer p) {
    pointer q, r; // list traversers
    if (p != null) {
        r = p;
        do {
            q = r;
            r = link(r);
            decr_dyn_used;
        } while (r != null);
        // now |q| is the last node on the list
        link(q) = avail;
        avail = p;
    }
}

Section 124

The available-space list that keeps track of the variable-size portion of mem is a nonempty, doubly-linked circular list of empty nodes, pointed to by the roving pointer rover.

Each empty node has size 2 or more; the first word contains the special value MAX_HALFWORD in its link field and the size in its info field; the second word contains the two pointers for double linking.

Each nonempty node also has size 2 or more. Its first word is of type two_halves, and its link field is never equal to MAX_HALFWORD. Otherwise there is complete flexibility with respect to the contents of its other fields and its other words.

(We require MEM_MAX MAX_HALFWORD because terrible things can happen when MAX_HALFWORD appears in the link field of a nonempty node.)

constants.h
#define EMPTY_FLAG MAX_HALFWORD // the |link| of an empty variable-size node
datastructures.h
#define is_empty(X) (link((X)) == EMPTY_FLAG) // tests for empty node
#define node_size   info          // the size field in empty variable-size nodes
#define llink(X)    info((X) + 1) // left link in doubly-linked list of empty nodes
#define rlink(X)    link((X) + 1) // right link in doubly-linked list of empty nodes

⟨ Global variables 13 ⟩+≡

pointer rover; // points to some node in the list of empties

Section 125

A call to get_node with argument s returns a pointer to a new node of size s, which must be 2 or more. The link field of the first word of this new node is set to null. An overflow stop occurs if no suitable space exists.

If get_node is called with , it simply merges adjacent free areas and returns the value MAX_HALFWORD.

memory.c
// variable-size node allocation
pointer get_node(int s) {
    pointer p; // the node currently under inspection
    pointer q; // the node physically after node |p|
    int r;     // the newly allocated node, or a candidate for this honor
    int t;     // temporary register

restart:
    p = rover; // start at some free node in the ring
    do {
        // << Try to allocate within node |p| and its physical successors, and |goto found| if allocation was possible, 127 >>
        p = rlink(p); // move to the next node in the ring
    } while (p != rover); // repeat until the whole list has been traversed
    
    if (s == 0x40000000) {
        return MAX_HALFWORD;
    }
    if (lo_mem_max + 2 < hi_mem_min
        && lo_mem_max + 2 <= MEM_BOT + MAX_HALFWORD)
    {
        // << Grow more variable-size memory and |goto restart|, 126 >>
    }
    // sorry, nothing satisfactory is left
    overflow("main memory size", MEM_MAX + 1 - MEM_MIN);

found:
    link(r) = null; // this node is now nonempty
#ifdef STAT
    var_used += s; // maintain usage statistics
#endif
    return r;
}

Section 126

The lower part of mem grows by 1000 words at a time, unless we are very close to going under. When it grows, we simply link a new node into the available-space list. This method of controlled growth helps to keep the mem usage consecutive when is implemented on “virtual memory” systems.

⟨ Grow more variable-size memory and goto restart 126 ⟩≡

if (hi_mem_min - lo_mem_max >= 1998) {
    t = lo_mem_max + 1000;
}
else {
    // |lo_mem_max + 2 <= t < hi_mem_min|
    t = lo_mem_max + 1 + (hi_mem_min - lo_mem_max) / 2;
}
p = llink(rover);
q = lo_mem_max;
rlink(p) = q;
llink(rover) = q;

if (t > MEM_BOT + MAX_HALFWORD) {
    t = MEM_BOT + MAX_HALFWORD;
}
rlink(q) = rover;
llink(q) = p;
link(q) = EMPTY_FLAG;
node_size(q) = t - lo_mem_max;
lo_mem_max = t;
link(lo_mem_max) = null;
info(lo_mem_max) = null;
rover = q;
goto restart;

Section 127

Empirical tests show that the routine in this section performs a node-merging operation about 0.75 times per allocation, on the average, after which it finds that r p + 1 about 95% of the time.

⟨ Try to allocate within node p and its physical successors, and goto found if allocation was possible 127 ⟩≡

q = p + node_size(p); // find the physical successor
while(is_empty(q)) {
    // merge node |p| with node |q|
    t = rlink(q);
    if (q == rover) {
        rover = t;
    }
    llink(t) = llink(q);
    rlink(llink(q)) = t;
    q += node_size(q);
}
r = q - s;
if (r > p + 1) {
    // << Allocate from the top of node |p| and |goto found|, 128 >>
}
if (r == p && rlink(p) != p) {
    // << Allocate entire node |p| and |goto found|, 129 >>
}
node_size(p) = q - p; // reset the size in case it grew

Section 128

⟨ Allocate from the top of node p and goto found 128 ⟩≡

node_size(p) = r - p; // store the remaining size
rover = p; // start searching here next time
goto found;

Section 129

Here we delete node p from the ring, and let rover rove around.

⟨ Allocate entire node p and goto found 129 ⟩≡

rover = rlink(p);
t = llink(p);
llink(rover) = t;
rlink(t) = rover;
goto found;

Section 130

Conversely, when some variable-size node p of size s is no longer needed, the operation free_node(p, s) will make its words available, by inserting p as a new empty node just before where rover now points.

memory.c
// variable-size node liberation
void free_node(pointer p, halfword s) {
    pointer q; // |llink(rover)|
    // set both links
    node_size(p) = s;
    link(p) = EMPTY_FLAG;
    q = llink(rover);
    llink(p) = q;
    rlink(p) = rover;
    // insert |p| into the ring
    llink(rover) = p;
    rlink(q) = p;
#ifdef STAT
    var_used -= s; // maintain statistics
#endif
}

Section 131

Just before INITEX writes out the memory, it sorts the doubly linked available space list. The list is probably very short at such times, so a simple insertion sort is used. The smallest available location will be pointed to by rover, the next-smallest by rlink(rover), etc.

memory.c
#ifdef INIT
// sorts the available variable-size nodes by location
void sort_avail() {
    pointer p, q, r;   // indices into |mem|
    pointer old_rover; // initial |rover| setting
    p = get_node(0x40000000); // merge adjacent free areas
    p = rlink(rover);
    rlink(rover) = MAX_HALFWORD;
    old_rover = rover;
    while (p != old_rover) {
        // << Sort |p| into the list starting at |rover| and advance |p| to |rlink(p)|, 132 >>
    }
    p = rover;
    while (rlink(p) != MAX_HALFWORD) {
        llink(rlink(p)) = p;
        p = rlink(p);
    }
    rlink(p) = rover;
    llink(rover) = p;
}
#endif

Section 132

The following while loop is guaranteed to terminate, since the list that starts at rover ends with MAX_HALFWORD during the sorting procedure.

⟨ Sort p into the list starting at rover and advance p to rlink(p) 132 ⟩≡

if (p < rover) {
    q = p;
    p = rlink(q);
    rlink(q) = rover;
    rover = q;
}
else {
    q = rover;
    while (rlink(q) < p) {
        q = rlink(q);
    }
    r = rlink(p);
    rlink(p) = rlink(q);
    rlink(q) = p;
    p = r;
}