Section 680: Data structures for math mode

When $T E X$ reads a formula that is enclosed between $’s, it constructs an mlist, which is essentially a tree structure representing that formula. An mlist is a linear sequence of items, but we can regard it as a tree structure because mlists can appear within mlists. For example, many of the entries can be subscripted or superscripted, and such “scripts” are mlists in their own right.

An entire formula is parsed into such a tree before any of the actual typesetting is done, because the current style of type is usually not known until the formula has been fully scanned. For example, when the formula ‘ $a + b \over c + d$ ’ is being read, there is no way to tell that ‘a + b’ will be in script size until ‘\over’ has appeared.

During the scanning process, each element of the mlist being built is classified as a relation, a binary operator, an open parenthesis, etc., or as a construct like ‘\sqrt’ that must be built up. This classification appears in the mlist data structure.

After a formula has been fully scanned, the mlist is converted to an hlist so that it can be incorporated into the surrounding text. This conversion is controlled by a recursive procedure that decides all of the appropriate styles by a “top-down” process starting at the outermost level and working in towards the subformulas. The formula is ultimately pasted together using combinations of horizontal and vertical boxes, with glue and penalty nodes inserted as necessary.

An mlist is represented internally as a linked list consisting chiefly of “noads” (pronounced “no-adds”), to distinguish them from the somewhat similar “nodes” in hlists and vlists. Certain kinds of ordinary nodes are allowed to appear in mlists together with the noads; $T E X$ tells the difference by means of the type field, since a noad’s type is always greater than that of a node. An mlist does not contain character nodes, hlist nodes, vlist nodes, math nodes, ligature nodes, or unset nodes; in particular, each mlist item appears in the variable-size part of mem, so the type field is always present.

Section 681

Each noad is four or more words long. The first word contains the type and subtype and link fields that are already so familiar to us; the second, third, and fourth words are called the noad’s nucleus, subscr, and supscr fields.

Consider, for example, the simple formula ‘ $x^2$ ’, which would be parsed into an mlist containing a single element called an ORD_NOAD. The nucleus of this noad is a representation of ‘x’, the subscr is empty, and the supscr is a representation of ‘2’.

The nucleus, subscr, and supscr fields are further broken into subfields. If p points to a noad, and if q is one of its principal fields (e.g., q = subscr(p)), there are several possibilities for the subfields, depending on the math_type of q.

math_type(q) = MATH_CHAR means that fam(q) refers to one of the sixteen font families, and character(q) is the number of a character within a font of that family, as in a character node.
math_type(q) = MATH_TEXT_CHAR is similar, but the character is unsubscripted and unsuperscripted and it is followed immediately by another character from the same font. (This math_type setting appears only briefly during the processing; it is used to suppress unwanted italic corrections.)
math_type(q) = EMPTY indicates a field with no value (the corresponding attribute of noad p is not present).
math_type(q) = SUB_BOX means that info(q) points to a box node (either an HLIST_NODE or a VLIST_NODE) that should be used as the value of the field. The shift_amount in the subsidiary box node is the amount by which that box will be shifted downward.
math_type(q) = SUB_MLIST means that info(q) points to an mlist; the mlist must be converted to an hlist in order to obtain the value of this field.

In the latter case, we might have info(q) = null. This is not the same as math_type(q) = EMPTY; for example, ‘ $P_{}$ ’ and ‘ $P$ ’ produce different results (the former will not have the “italic correction” added to the width of P, but the “script skip” will be added).

The definitions of subfields given here are evidently wasteful of space, since a halfword is being used for the math_type although only three bits would be needed. However, there are hardly ever many noads present at once, since they are soon converted to nodes that take up even more space, so we can afford to represent them in whatever way simplifies the programming.

constants.h

#define NOAD_SIZE      4 // number of words in a normal noad
#define MATH_CHAR      1 // |math_type| when the attribute is simple
#define SUB_BOX        2 // |math_type| when the attribute is a box
#define SUB_MLIST      3 // |math_type| when the attribute is a formula
#define MATH_TEXT_CHAR 4 // |math_type| when italic correction is dubious

texmath.h

// << Start file |texmath.h|, 1381 >>

#define nucleus(X) ((X) + 1) // the |nucleus| field of a noad
#define supscr(X)  ((X) + 2) // the |supscr| field of a noad
#define subscr(X)  ((X) + 3) // the |subscr| field of a noad
#define math_type  link      // a |halfword| in |mem|
#define fam        font      // a |quarterword| in |mem|

Section 682

Each portion of a formula is classified as Ord, Op, Bin, Rel, Open, Close, Punct, or Inner, for purposes of spacing and line breaking. An ORD_NOAD, OP_NOAD, BIN_NOAD, REL_NOAD, OPEN_NOAD, CLOSE_NOAD, PUNCT_NOAD, or INNER_NOAD is used to represent portions of the various types. For example, an ‘=’ sign in a formula leads to the creation of a REL_NOAD whose nucleus field is a representation of an equals sign (usually fam = 0, character = 61). A formula preceded by \mathrel also results in a REL_NOAD. When a REL_NOAD is followed by an OP_NOAD, say, and possibly separated by one or more ordinary nodes (not noads), $T E X$ will insert a penalty node (with the current rel_penalty) just after the formula that corresponds to the REL_NOAD, unless there already was a penalty immediately following; and a “thick space” will be inserted just before the formula that corresponds to the OP_NOAD.

A noad of type ORD_NOAD, OP_NOAD, $\dots$ , INNER_NOAD usually has a subtype = NORMAL. The only exception is that an OP_NOAD might have subtype = LIMITS or NO_LIMITS, if the normal positioning of limits has been overridden for this operator.

constants.h

#define ORD_NOAD   (UNSET_NODE + 3) // |type| of a noad classified Ord
#define OP_NOAD    (ORD_NOAD + 1)   // |type| of a noad classified Op
#define BIN_NOAD   (ORD_NOAD + 2)   // |type| of a noad classified Bin
#define REL_NOAD   (ORD_NOAD + 3)   // |type| of a noad classified Rel
#define OPEN_NOAD  (ORD_NOAD + 4)   // |type| of a noad classified Open
#define CLOSE_NOAD (ORD_NOAD + 5)   // |type| of a noad classified Close
#define PUNCT_NOAD (ORD_NOAD + 6)   // |type| of a noad classified Punct
#define INNER_NOAD (ORD_NOAD + 7)   // |type| of a noad classified Inner
#define LIMITS     1                // |subtype| of |OP_NOAD| whose scripts are to be above, below
#define NO_LIMITS  2                // |subtype| of |OP_NOAD| whose scripts are to be normal

Section 683

A RADICAL_NOAD is five words long; the fifth word is the left_delimiter field, which usually represents a square root sign.

A FRACTION_NOAD is six words long; it has a right_delimiter field as well as a left_delimiter.

Delimiter fields are of type four_quarters, and they have four subfields called small_fam, small_char, large_fam, large_char. These subfields represent variable-size delimiters by giving the “small” and “large” starting characters, as explained in Chapter 17 of The TeXbook.

A FRACTION_NOAD is actually quite different from all other noads. Not only does it have six words, it has thickness, denominator, and numerator fields instead of nucleus, subscr, and supscr. The thickness is a scaled value that tells how thick to make a fraction rule; however, the special value DEFAULT_CODE is used to stand for the default_rule_thickness of the current size. The numerator and denominator point to mlists that define a fraction; we always have

math_type(numerator) = math_type(denominator) = SUB_MLIST.

The left_delimiter and right_delimiter fields specify delimiters that will be placed at the left and right of the fraction. In this way, a FRACTION_NOAD is able to represent all of $T E X$ ’s operators \over, \atop, \above, \overwithdelims, \atopwithdelims, and \abovewithdelims.

constants.h

#define RADICAL_NOAD       (INNER_NOAD + 1)   // |type| of a noad for square roots
#define RADICAL_NOAD_SIZE  5                  // number of |mem| words in a radical noad
#define FRACTION_NOAD      (RADICAL_NOAD + 1) // |type| of a noad for generalized fractions
#define FRACTION_NOAD_SIZE 6                  // number of |mem| words in a fraction noad
#define DEFAULT_CODE       0x40000000         // denotes |default_rule_thickness|

texmath.h

#define left_delimiter(X)  ((X) + 4)         // first delimiter field of a noad
#define right_delimiter(X) ((X) + 5)         // second delimiter field of a fraction noad
#define small_fam(X)       qqqq_b0(mem[(X)]) // |fam| for "small" delimiter
#define small_char(X)      qqqq_b1(mem[(X)]) // |character| for "small" delimiter
#define large_fam(X)       qqqq_b2(mem[(X)]) // |fam| for "large" delimiter
#define large_char(X)      qqqq_b3(mem[(X)]) // |character| for "large" delimiter
#define thickness          width             // |thickness| field in a fraction noad
#define numerator          supscr            // |numerator| field in a fraction noad
#define denominator        subscr            // |denominator| field in a fraction noad

Section 684

The global variable empty_field is set up for initialization of empty fields in new noads. Similarly, null_delimiter is for the initialization of delimiter fields.

⟨ Global variables 13 ⟩+≡

memory_word empty_field;
memory_word null_delimiter;

Section 685

⟨ Set initial values of key variables 21 ⟩+≡

hh_rh(empty_field) = EMPTY;
hh_lh(empty_field) = null;
qqqq_b0(null_delimiter) = 0;
qqqq_b1(null_delimiter) = MIN_QUARTERWORD;
qqqq_b2(null_delimiter) = 0;
qqqq_b3(null_delimiter) = MIN_QUARTERWORD;

Section 686

The new_noad function creates an ORD_NOAD that is completely null.

math_structures.c

// << Start file |math_structures.c|, 1382 >>

pointer new_noad() {
    pointer p;
    p = get_node(NOAD_SIZE);
    type(p) = ORD_NOAD;
    subtype(p) = NORMAL;
    mem[nucleus(p)] = empty_field;
    mem[subscr(p)] = empty_field;
    mem[supscr(p)] = empty_field;
    return p;
}

Section 687

A few more kinds of noads will complete the set: An UNDER_NOAD has its nucleus underlined; an OVER_NOAD has it overlined. An ACCENT_NOAD places an accent over its nucleus; the accent character appears as fam(accent_chr(p)) and character(accent_chr(p)). A VCENTER_NOAD centers its nucleus vertically with respect to the axis of the formula; in such noads we always have math_type(nucleus(p)) = sub_box.

And finally, we have LEFT_NOAD and RIGHT_NOAD types, to implement $T E X$ ’s \left and \right. The nucleus of such noads is replaced by a delimiter field; thus, for example, ‘\left(’ produces a LEFT_NOAD such that delimiter(p) holds the family and character codes for all left parentheses. A LEFT_NOAD never appears in an mlist except as the first element, and a RIGHT_NOAD never appears in an mlist except as the last element; furthermore, we either have both a LEFT_NOAD and a RIGHT_NOAD, or neither one is present. The subscr and supscr fields are always empty in a LEFT_NOAD and a RIGHT_NOAD.

constants.h

#define UNDER_NOAD       (FRACTION_NOAD + 1) // |type| of a noad for underlining
#define OVER_NOAD        (UNDER_NOAD + 1)    // |type| of a noad for overlining
#define ACCENT_NOAD      (OVER_NOAD + 1)     // |type| of a noad for accented subformulas
#define ACCENT_NOAD_SIZE 5                   // number of |mem| words in an accent noad
#define VCENTER_NOAD     (ACCENT_NOAD + 1)   // |type| of a noad for \vcenter
#define LEFT_NOAD        (VCENTER_NOAD + 1)  // |type| of a noad for \left
#define RIGHT_NOAD       (LEFT_NOAD + 1)     // |type| of a noad for \right

texmath.h

#define accent_chr(X)      ((X) + 4) // the |accent_chr| field of an accent noad
#define delimiter          nucleus   // |delimiter| field in left and right noads
#define scripts_allowed(X) (type((X)) >= ORD_NOAD && type((X)) < LEFT_NOAD)

Section 688

Math formulas can also contain instructions like \textstyle that override $T E X$ ’s normal style rules. A STYLE_NODE is inserted into the data structure to record such instructions; it is three words long, so it is considered a node instead of a noad. The subtype is either DISPLAY_STYLE or TEXT_STYLE or SCRIPT_STYLE or SCRIPT_SCRIPT_STYLE. The second and third words of a STYLE_NODE are not used, but they are present because a CHOICE_NODE is converted to a STYLE_NODE.

$T E X$ uses even numbers 0, 2, 4, 6 to encode the basic styles DISPLAY_STYLE, $\dots$ , SCRIPT_SCRIPT_STYLE, and adds 1 to get the “cramped” versions of these styles. This gives a numerical order that is backwards from the convention of Appendix G in The TeXbook; i.e., a smaller style has a larger numerical value.

constants.h

#define STYLE_NODE          (UNSET_NODE + 1) // |type| of a style node
#define STYLE_NODE_SIZE     3                // number of words in a style node
#define DISPLAY_STYLE       0                // |subtype| for \displaystyle
#define TEXT_STYLE          2                // |subtype| for \textstyle
#define SCRIPT_STYLE        4                // |subtype| for \scriptstyle
#define SCRIPT_SCRIPT_STYLE 6                // |subtype| for \scriptscriptstyle
#define CRAMPED             1                // add this to an uncramped style if you want to cramp it

math_structures.c

// create a style node
pointer new_style(small_number s) {
    pointer p; // the new node
    p = get_node(STYLE_NODE_SIZE);
    type(p) = STYLE_NODE;
    subtype(p) = s;
    width(p) = 0;
    depth(p) = 0; // the |width| and |depth| are not used
    return p;
}

Section 689

Finally, the \mathchoice primitive creates a CHOICE_NODE, which has special subfields display_mlist, text_mlist, script_mlist, and script_script_mlist pointing to the mlists for each style.

constants.h

#define CHOICE_NODE (UNSET_NODE + 2) // |type| of a choice node

texmath.h

#define display_mlist(X)       info((X) + 1) // mlist to be used in display style
#define text_mlist(X)          link((X) + 1) // mlist to be used in text style
#define script_mlist(X)        info((X) + 2) // mlist to be used in script style
#define script_script_mlist(X) link((X) + 2) // mlist to be used in scriptscript style

math_structures.c

// create a choice node
pointer new_choice() {
    pointer p; // the new node
    p = get_node(STYLE_NODE_SIZE);
    type(p) = CHOICE_NODE;
    subtype(p) = 0; // the |subtype| is not used
    display_mlist(p) = null;
    text_mlist(p) = null;
    script_mlist(p) = null;
    script_script_mlist(p) = null;
    return p;
}

Section 690

Let’s consider now the previously unwritten part of show_node_list that displays the things that can only be present in mlists; this program illustrates how to access the data structures just defined.

In the context of the following program, p points to a node or noad that should be displayed, and the current string contains the “recursion history” that leads to this point. The recursion history consists of a dot for each outer level in which p is subsidiary to some node, or in which p is subsidiary to the nucleus field of some noad; the dot is replaced by ‘_’ or ‘^’ or ‘/’ or ‘\’ if p is descended from the subscr or supscr or denominator or numerator fields of noads. For example, the current string would be ‘.^._/’ if p points to the ORD_NOAD for x in the (ridiculous) formula ‘ $\sqrt{a^{\mathinner{b_{c\over x+y}}}}$ ’.

⟨ Cases of show_node_list that arise in mlists only 690 ⟩≡

case STYLE_NODE:
    print_style(subtype(p));
    break;

case CHOICE_NODE:
    // << Display choice node |p|, 695 >>
    break;

case ORD_NOAD:
case OP_NOAD:
case BIN_NOAD:
case REL_NOAD:
case OPEN_NOAD:
case CLOSE_NOAD:
case PUNCT_NOAD:
case INNER_NOAD:
case RADICAL_NOAD:
case OVER_NOAD:
case UNDER_NOAD:
case VCENTER_NOAD:
case ACCENT_NOAD:
case LEFT_NOAD:
case RIGHT_NOAD:
    // << Display normal noad |p|, 696 >>
    break;

case FRACTION_NOAD:
    // << Display fraction noad |p|, 697 >>
    break;

Section 691

Here are some simple routines used in the display of noads.

display_math.c

// << Start file |display_math.c|, 1382 >>

// prints family and character
void print_fam_and_char(pointer p) {
    print_esc("fam");
    print_int(fam(p));
    print_char(' ');
    print_strnumber(character(p));
}

// prints a delimiter as 24-bit hex value
void print_delimiter(pointer p) {
    int a; // accumulator
    a = small_fam(p)*256 + small_char(p);
    a = a*0x1000 + large_fam(p)*256 + large_char(p);
    if (a < 0) {
        print_int(a); // this should never happen
    }
    else {
        print_hex(a);
    }
}

Section 692

The next subroutine will descend to another level of recursion when a subsidiary mlist needs to be displayed. The parameter c indicates what character is to become part of the recursion history. An empty mlist is distinguished from a field with math_type(p) = EMPTY, because these are not equivalent (as explained above).

display_math.c

// display a noad field
void print_subsidiary_data(pointer p, ASCII_code c) {
    if (cur_length >= depth_threshold) {
        if (math_type(p) != EMPTY) {
            print(" []");
        }
    }
    else {
        append_char(c); // include |c| in the recursion history
        temp_ptr = p; // prepare for |show_info| if recursion is needed
        switch (math_type(p)) {
        case MATH_CHAR:
            print_ln();
            print_current_string();
            print_fam_and_char(p); 
            break;
        
        case SUB_BOX: show_info();
            break; // recursive call
        
        case SUB_MLIST:
            if (info(p) == null) {
                print_ln();
                print_current_string();
                print("{}");
            }
            else {
                show_info();
            }
            break; // recursive call
        
        default:
            do_nothing; //|EMPTY|
        }
        flush_char; // remove |c| from the recursion history
    }
}

Section 693

The inelegant introduction of show_info in the code above seems better than the alternative of using Pascal’s strange forward declaration for a procedure with parameters. The Pascal convention about dropping parameters from a post-forward procedure is, frankly, so intolerable to the author of $T E X$ that he would rather stoop to communication via a global temporary variable. (A similar stoopidity occurred with respect to hlist_out and vlist_out above, and it will occur with respect to mlist_to_hlist below.)

display_math.c

// the reader will kindly forgive this
void show_info() {
    show_node_list(info(temp_ptr));
}

Section 694

display_math.c

void print_style(int c) {
    switch (c / 2) {
    case 0:
        print_esc("displaystyle");
        break; // |DISPLAY_STYLE = 0|
    
    case 1:
        print_esc("textstyle");
        break; // |TEXT_STYLE = 2|
    
    case 2:
        print_esc("scriptstyle");
        break; // |SCRIPT_STYLE = 4|
    
    case 3:
        print_esc("scriptscriptstyle");
        break; // |SCRIPT_SCRIPT_STYLE = 6|
    
    default:
        print("Unknown style!");
    }
}

Section 695

⟨ Display choice node p 695 ⟩≡

print_esc("mathchoice");
append_char('D');
show_node_list(display_mlist(p));
flush_char;
append_char('T');
show_node_list(text_mlist(p));
flush_char;
append_char('S');
show_node_list(script_mlist(p));
flush_char;
append_char('s');
show_node_list(script_script_mlist(p));
flush_char;

Section 696

⟨ Display normal noad p 696 ⟩≡

switch (type(p)) {
case ORD_NOAD:
    print_esc("mathord");
    break;

case OP_NOAD:
    print_esc("mathop");
    break;

case BIN_NOAD:
    print_esc("mathbin");
    break;

case REL_NOAD:
    print_esc("mathrel");
    break;

case OPEN_NOAD:
    print_esc("mathopen");
    break;

case CLOSE_NOAD:
    print_esc("mathclose");
    break;

case PUNCT_NOAD:
    print_esc("mathpunct");
    break;

case INNER_NOAD:
    print_esc("mathinner");
    break;

case OVER_NOAD:
    print_esc("overline");
    break;

case UNDER_NOAD:
    print_esc("underline");
    break;

case VCENTER_NOAD:
    print_esc("vcenter");
    break;

case RADICAL_NOAD:
    print_esc("radical");
    print_delimiter(left_delimiter(p));
    break;

case ACCENT_NOAD:
    print_esc("accent");
    print_fam_and_char(accent_chr(p));
    break;

case LEFT_NOAD:
    print_esc("left");
    print_delimiter(delimiter(p));
    break;

case RIGHT_NOAD:
    print_esc("right");
    print_delimiter(delimiter(p));
}
if (subtype(p) != NORMAL) {
    if (subtype(p) == LIMITS) {
        print_esc("limits");
    }
    else {
        print_esc("nolimits");
    }
}
if (type(p) < LEFT_NOAD) {
    print_subsidiary_data(nucleus(p), '.');
}
print_subsidiary_data(supscr(p), '^');
print_subsidiary_data(subscr(p), '_');

Section 697

⟨ Display fraction noad p 697 ⟩≡

print_esc("fraction, thickness ");
if (thickness(p) == DEFAULT_CODE) {
    print("= default");
}
else {
    print_scaled(thickness(p));
}
if (small_fam(left_delimiter(p)) != 0
    || small_char(left_delimiter(p)) != MIN_QUARTERWORD
    || large_fam(left_delimiter(p)) != 0
    || large_char(left_delimiter(p)) != MIN_QUARTERWORD)
{
    print(", left-delimiter ");
    print_delimiter(left_delimiter(p));
}
if (small_fam(right_delimiter(p)) != 0
    || small_char(right_delimiter(p)) != MIN_QUARTERWORD
    || large_fam(right_delimiter(p)) != 0
    || large_char(right_delimiter(p)) != MIN_QUARTERWORD)
{
    print(", right-delimiter ");
    print_delimiter(right_delimiter(p));
}
print_subsidiary_data(numerator(p), '\\');
print_subsidiary_data(denominator(p), '/');

Section 698

That which can be displayed can also be destroyed.

⟨ Cases of flush_node_list that arise in mlists only 698 ⟩≡

case STYLE_NODE:
    free_node(p, STYLE_NODE_SIZE);
    goto done;

case CHOICE_NODE:
    flush_node_list(display_mlist(p));
    flush_node_list(text_mlist(p));
    flush_node_list(script_mlist(p));
    flush_node_list(script_script_mlist(p));
    free_node(p, STYLE_NODE_SIZE);
    goto done;

case ORD_NOAD:
case OP_NOAD:
case BIN_NOAD:
case REL_NOAD:
case OPEN_NOAD:
case CLOSE_NOAD:
case PUNCT_NOAD:
case INNER_NOAD:
case RADICAL_NOAD:
case OVER_NOAD:
case UNDER_NOAD:
case VCENTER_NOAD:
case ACCENT_NOAD:
    if (math_type(nucleus(p)) >= SUB_BOX) {
        flush_node_list(info(nucleus(p)));
    }
    if (math_type(supscr(p)) >= SUB_BOX) {
        flush_node_list(info(supscr(p)));
    }
    if (math_type(subscr(p)) >= SUB_BOX) {
        flush_node_list(info(subscr(p)));
    }
    if (type(p) == RADICAL_NOAD) {
        free_node(p, RADICAL_NOAD_SIZE);
    }
    else if (type(p) == ACCENT_NOAD) {
        free_node(p, ACCENT_NOAD_SIZE);
    }
    else {
        free_node(p, NOAD_SIZE);
    }
    goto done;

case LEFT_NOAD:
case RIGHT_NOAD:
    free_node(p, NOAD_SIZE);
    goto done;

case FRACTION_NOAD:
    flush_node_list(info(numerator(p)));
    flush_node_list(info(denominator(p)));
    free_node(p, FRACTION_NOAD_SIZE);
    goto done;

TeX in C