Section 402: Basic scanning subroutines

Let’s turn now to some procedures that $T E X$ calls upon frequently to digest certain kinds of patterns in the input. Most of these are quite simple; some are quite elaborate. Almost all of the routines call get_x_token, which can cause them to be invoked recursively.

Section 403

The scan_left_brace routine is called when a left brace is supposed to be the next non-blank token. (The term “left brace” means, more precisely, a character whose catcode is LEFT_BRACE.) $T E X$ allows \relax to appear before the LEFT_BRACE.

subroutines.c

// << Start file |subroutines.c|, 1382 >>

// reads a mandatory |LEFT_BRACE|
void scan_left_brace() {
    // << Get the next non-blank non-relax non-call token, 404 >>
    if (cur_cmd != LEFT_BRACE) {
        print_err("Missing { inserted");
        help4("A left brace was mandatory here, so I've put one in.")
            ("You might want to delete and/or insert some corrections")
            ("so that I will find a matching right brace soon.")
            ("(If you're confused by all this, try typing `I}' now.)");
        back_error();
        cur_tok = LEFT_BRACE_TOKEN + '{';
        cur_cmd = LEFT_BRACE;
        cur_chr = '{';
        incr(align_state);
    }
}

Section 404

⟨ Get the next non-blank non-relax non-call token 404 ⟩≡

do {
    get_x_token();
} while (cur_cmd == SPACER || cur_cmd == RELAX);

Section 405

The scan_optional_equals routine looks for an optional ‘=’ sign preceded by optional spaces; ‘\relax’ is not ignored here.

subroutines.c

void scan_optional_equals() {
    // << Get the next non-blank non-call token, 406 >>
    if (cur_tok != OTHER_TOKEN + '=') {
        back_input();
    }
}

Section 406

⟨ Get the next non-blank non-call token 406 ⟩≡

do {
    get_x_token();
} while (cur_cmd == SPACER);

Section 407

In case you are getting bored, here is a slightly less trivial routine: Given a string of lowercase letters, like ‘pt’ or ‘plus’ or ‘width’, the scan_keyword routine checks to see whether the next tokens of input match this string. The match must be exact, except that uppercase letters will match their lowercase counterparts; uppercase equivalents are determined by subtracting 'a' − 'A', rather than using the uc_code table, since $T E X$ uses this routine only for its own limited set of keywords.

If a match is found, the characters are effectively removed from the input and true is returned. Otherwise false is returned, and the input is left essentially unchanged (except for the fact that some macros may have been expanded, etc.).

subroutines.c

// look for a given string
bool scan_keyword(char *s) {
    pointer p; // tail of the backup list
    pointer q; // new node being added to the token list via |store_new_token|
    int k; // index into the string |s|
    int len = strlen(s);
    p = BACKUP_HEAD;
    link(p) = null;
    k = 0;
    while (k < len) {
        get_x_token(); // recursion is possible here
        if (cur_cs == 0
            && (cur_chr == s[k] || cur_chr == s[k] - 'a' + 'A'))
        {
            store_new_token(cur_tok);
            incr(k);
        }
        else if (cur_cmd != SPACER || p != BACKUP_HEAD) {
            back_input();
            if (p != BACKUP_HEAD) {
                back_list(link(BACKUP_HEAD));
            }
            return false;
        }
    }
    flush_list(link(BACKUP_HEAD));
    return true;
}

Section 408

Here is a procedure that sounds an alarm when mu and non-mu units are being switched.

error.c

void mu_error() {
    print_err("Incompatible glue units");
    help1("I'm going to assume that 1mu=1pt when they're mixed.");
    error();
}

Section 409

The next routine ‘scan_something_internal’ is used to fetch internal numeric quantities like ‘\hsize’, and also to handle the ‘\the’ when expanding constructions like ‘\the\toks0’ and ‘\the\baselineskip’. Soon we will be considering the scan_int procedure, which calls scan_something_internal; on the other hand, scan_something_internal also calls scan_int, for constructions like ‘\catcode`\$’ or ‘\fontdimen 3 \ff’. So we have to declare scan_int as a forward procedure. A few other procedures are also declared at this point.

subroutines.c

// << Declare procedures that scan restricted classes of integers, 433 >>

// << Declare procedures that scan font-related stuff, 577 >>

Section 410

TeX doesn’t know exactly what to expect when scan_something_internal begins. For example, an integer or dimension or glue value could occur immediately after ‘\hskip’; and one can even say \the with respect to token lists in constructions like ‘\xdef\o{\the\output}’. On the other hand, only integers are allowed after a construction like ‘\count’. To handle the various possibilities, scan_something_internal has a level parameter, which tells the “highest” kind of quantity that scan_something_internal is allowed to produce. Six levels are distinguished, namely INT_VAL, DIMEN_VAL, GLUE_VAL, MU_VAL, IDENT_VAL, and TOK_VAL.

The output of scan_something_internal (and of the other routines scan_int, scan_dimen, and scan_glue below) is put into the global variable cur_val, and its level is put into cur_val_level. The highest values of cur_val_level are special: MU_VAL is used only when cur_val points to something in a “muskip” register, or to one of the three parameters \thinmuskip, \medmuskip, \thickmuskip; IDENT_VAL is used only when cur_val points to a font identifier; TOK_VAL is used only when cur_val points to null or to the reference count of a token list. The last two cases are allowed only when scan_something_internal is called with level = TOK_VAL.

If the output is glue, cur_val will point to a glue specification, and the reference count of that glue will have been updated to reflect this reference; if the output is a nonempty token list, cur_val will point to its reference count, but in this case the count will not have been updated. Otherwise cur_val will contain the integer or scaled value in question.

constants.h

#define INT_VAL   0 // integer values
#define DIMEN_VAL 1 // dimension values
#define GLUE_VAL  2 // glue specifications
#define MU_VAL    3 // math glue specifications
#define IDENT_VAL 4 // font identifier
#define TOK_VAL   5 // token lists

⟨ Global variables 13 ⟩+≡

int cur_val;       // value returned by numeric scanners
int cur_val_level; // the "level" of this value

Section 411

The hash table is initialized with ‘\count’, ‘\dimen’, ‘\skip’, and ‘\muskip’ all having REGISTER as their command code; they are distinguished by the chr_code, which is either INT_VAL, DIMEN_VAL, GLUE_VAL, or MU_VAL.

⟨ Put each of TeX’s primitives into the hash table 226 ⟩+≡

primitive("count", REGISTER, INT_VAL);
primitive("dimen", REGISTER, DIMEN_VAL);
primitive("skip", REGISTER, GLUE_VAL);
primitive("muskip", REGISTER, MU_VAL);

Section 412

⟨ Cases of print_cmd_chr for symbolic printing of primitives 227 ⟩+≡

case REGISTER:
    if (chr_code == INT_VAL) {
        print_esc("count");
    }
    else if (chr_code == DIMEN_VAL) {
        print_esc("dimen");
    }
    else if (chr_code == GLUE_VAL) {
        print_esc("skip");
    }
    else {
        print_esc("muskip");
    }
    break;

Section 413

OK, we’re ready for scan_something_internal itself. A second parameter, negative, is set true if the value that is found should be negated. It is assumed that cur_cmd and cur_chr represent the first token of the internal quantity to be scanned; an error will be signalled if cur_cmd $<$ MIN_INTERNAL or cur_cmd $>$ MAX_INTERNAL.

parser.h

#define scanned_result(X, Y) \
    cur_val = (X);           \
    cur_val_level = (Y)

subroutines.c

// fetch an internal parameter
void scan_something_internal(small_number level, bool negative) {
    halfword m; // |chr_code| part of the operand token
    int p;      // index into |nest|
    m = cur_chr;
    switch (cur_cmd) {
    case DEF_CODE:
        // << Fetch a character code from some table, 414 >>
        break;
    
    case TOKS_REGISTER:
    case ASSIGN_TOKS:
    case DEF_FAMILY:
    case SET_FONT:
    case DEF_FONT:
        // << Fetch a token list or font identifier, provided that |level = TOK_VAL|, 415 >>
        break;
    
    case ASSIGN_INT:
        scanned_result(eqtb[m].integer, INT_VAL);
        break;
    
    case ASSIGN_DIMEN:
        scanned_result(eqtb[m].sc, DIMEN_VAL);
        break;
    
    case ASSIGN_GLUE:
        scanned_result(equiv(m), GLUE_VAL);
        break;
    
    case ASSIGN_MU_GLUE:
        scanned_result(equiv(m), MU_VAL);
        break;
    
    case SET_AUX:
        // << Fetch the |space_factor| or the |prev_depth|, 418 >>
        break;
    
    case SET_PREV_GRAF:
        // << Fetch the |prev_graf|, 422 >>
        break;
    
    case SET_PAGE_INT:
        // << Fetch the |dead_cycles| or the |insert_penalties|, 419 >>
        break;
    
    case SET_PAGE_DIMEN:
        // << Fetch something on the |page_so_far|, 421 >>
        break;
    
    case SET_SHAPE:
        // << Fetch the |par_shape| size, 423 >>
        break;
    
    case SET_BOX_DIMEN:
        // << Fetch a box dimension, 420 >>
        break;
    
    case CHAR_GIVEN:
    case MATH_GIVEN:
        scanned_result(cur_chr, INT_VAL);
        break;
    
    case ASSIGN_FONT_DIMEN:
        // << Fetch a font dimension, 425 >>
        break;
    
    case ASSIGN_FONT_INT:
        // << Fetch a font integer, 426 >>
        break;
    
    case REGISTER:
        // << Fetch a register, 427 >>
        break;
    
    case LAST_ITEM:
        // << Fetch an item in the current node, if appropriate, 424 >>
        break;
    
    default:
        // << Complain that \the can't do this; give zero result, 428 >>
    }
    while (cur_val_level > level) {
        // << Convert |cur_val| to a lower level, 429 >>
    }
    // << Fix the reference count, if any, and negate |cur_val| if |negative|, 430 >>
}

Section 414

⟨ Fetch a character code from some table 414 ⟩≡

scan_char_num();
if (m == MATH_CODE_BASE) {
    scanned_result(ho(math_code(cur_val)), INT_VAL);
}
else if (m < MATH_CODE_BASE) {
    scanned_result(equiv(m + cur_val), INT_VAL);
}
else {
    scanned_result(eqtb[m + cur_val].integer, INT_VAL);
}

Section 415

⟨ Fetch a token list or font identifier, provided that level = TOK_VAL 415 ⟩≡

if (level != TOK_VAL) {
    print_err("Missing number, treated as zero");
    help3("A number should have been here; I inserted `0'.")
        ("(If you can't figure out why I needed to see a number,")
        ("look up `weird error' in the index to The TeXbook.)");
    back_error();
    scanned_result(0, DIMEN_VAL);
}
else if (cur_cmd <= ASSIGN_TOKS) {
    if (cur_cmd < ASSIGN_TOKS) {
        // |cur_cmd == TOKS_REGISTER|
        scan_eight_bit_int();
        m = TOKS_BASE + cur_val;
    }
    scanned_result(equiv(m), TOK_VAL);
}
else {
    back_input();
    scan_font_ident();
    scanned_result(FONT_ID_BASE + cur_val, IDENT_VAL);
}

Section 416

Users refer to ‘\the\spacefactor’ only in horizontal mode, and to ‘\the\prevdepth’ only in vertical mode; so we put the associated mode in the modifier part of the SET_AUX command. The SET_PAGE_INT command has modifier 0 or 1, for ‘\deadcycles’ and ‘\insertpenalties’, respectively. The SET_BOX_DIMEN command is modified by either WIDTH_OFFSET, HEIGHT_OFFSET, or DEPTH_OFFSET. And the LAST_ITEM command is modified by either INT_VAL, DIMEN_VAL, GLUE_VAL, INPUT_LINE_NO_CODE, or BADNESS_CODE.

constants.h

#define INPUT_LINE_NO_CODE (GLUE_VAL + 1) // code for \inputlineno
#define BADNESS_CODE       (GLUE_VAL + 2) // code for \badness

⟨ Put each of TeX’s primitives into the hash table 226 ⟩+≡

primitive("spacefactor", SET_AUX, HMODE);
primitive("prevdepth", SET_AUX, VMODE);
primitive("deadcycles", SET_PAGE_INT, 0);
primitive("insertpenalties", SET_PAGE_INT, 1);
primitive("wd", SET_BOX_DIMEN, WIDTH_OFFSET);
primitive("ht", SET_BOX_DIMEN, HEIGHT_OFFSET);
primitive("dp", SET_BOX_DIMEN, DEPTH_OFFSET);
primitive("lastpenalty", LAST_ITEM, INT_VAL);
primitive("lastkern", LAST_ITEM, DIMEN_VAL);
primitive("lastskip", LAST_ITEM, GLUE_VAL);
primitive("inputlineno", LAST_ITEM, INPUT_LINE_NO_CODE);
primitive("badness", LAST_ITEM, BADNESS_CODE);

Section 417

⟨ Cases of print_cmd_chr for symbolic printing of primitives 227 ⟩+≡

case SET_AUX:
    if (chr_code == VMODE) {
        print_esc("prevdepth");
    }
    else {
        print_esc("spacefactor");
    }
    break;

case SET_PAGE_INT:
    if (chr_code == 0) {
        print_esc("deadcycles");
    }
    else {
        print_esc("insertpenalties");
    }
    break;

case SET_BOX_DIMEN:
    if (chr_code == WIDTH_OFFSET) {
        print_esc("wd");
    }
    else if (chr_code == HEIGHT_OFFSET) {
        print_esc("ht");
    }
    else {
        print_esc("dp");
    }
    break;

case LAST_ITEM:
    switch (chr_code) {
    case INT_VAL:
        print_esc("lastpenalty");
        break;
    
    case DIMEN_VAL:
        print_esc("lastkern");
        break;
    
    case GLUE_VAL:
        print_esc("lastskip");
        break;
    
    case INPUT_LINE_NO_CODE:
        print_esc("inputlineno");
        break;
    
    default:
        print_esc("badness");
    }
    break;

Section 418

⟨ Fetch the space_factor or the prev_depth 418 ⟩≡

if (abs(mode) != m) {
    print_err("Improper ");
    print_cmd_chr(SET_AUX, m);
    help4("You can refer to \\spacefactor only in horizontal mode;")
        ("you can refer to \\prevdepth only in vertical mode; and")
        ("neither of these is meaningful inside \\write. So")
        ("I'm forgetting what you said and using zero instead.");
    error();
    if (level != TOK_VAL) {
        scanned_result(0, DIMEN_VAL);
    }
    else {
        scanned_result(0, INT_VAL);
    }
}
else if (m == VMODE) {
    scanned_result(prev_depth, DIMEN_VAL);
}
else {
    scanned_result(space_factor, INT_VAL);
}

Section 419

⟨ Fetch the dead_cycles or the insert_penalties 419 ⟩≡

if (m == 0) {
    cur_val = dead_cycles;
}
else {
    cur_val = insert_penalties;
}
cur_val_level = INT_VAL;

Section 420

⟨ Fetch a box dimension 420 ⟩≡

scan_eight_bit_int();
if (box(cur_val) == null) {
    cur_val = 0;
}
else {
    cur_val = mem[box(cur_val) + m].sc;
}
cur_val_level = DIMEN_VAL;

Section 421

Inside an \output routine, a user may wish to look at the page totals that were present at the moment when output was triggered.

constants.h

#define MAX_DIMEN 0x3fffffff // 2^{30} - 1

⟨ Fetch something on the page_so_far 421 ⟩≡

if (page_contents == EMPTY && !output_active) {
    if (m == 0) {
        cur_val = MAX_DIMEN;
    }
    else {
        cur_val = 0;
    }
}
else {
    cur_val = page_so_far[m];
}
cur_val_level = DIMEN_VAL;

Section 422

⟨ Fetch the prev_graf 422 ⟩≡

if (mode == 0) {
    scanned_result(0, INT_VAL); // |prev_graf == 0| within \write
}
else {
    nest[nest_ptr] = cur_list;
    p = nest_ptr;
    while (abs(nest[p].mode_field) != VMODE) {
        decr(p);
    }
    scanned_result(nest[p].pg_field, INT_VAL);
}

Section 423

⟨ Fetch the par_shape size 423 ⟩≡

if (par_shape_ptr == null) {
    cur_val = 0;
}
else {
    cur_val = info(par_shape_ptr);
}
cur_val_level = INT_VAL;

Section 424

Here is where \lastpenalty, \lastkern, and \lastskip are implemented. The reference count for \lastskip will be updated later.

We also handle \inputlineno and \badness here, because they are legal in similar contexts.

⟨ Fetch an item in the current node, if appropriate 424 ⟩≡

if (cur_chr > GLUE_VAL) {
    if (cur_chr == INPUT_LINE_NO_CODE) {
        cur_val = line;
    }
    else {
        cur_val = last_badness; // |cur_chr = BADNESS_CODE|
    }
    cur_val_level = INT_VAL;
}
else {
    if (cur_chr == GLUE_VAL) {
        cur_val = ZERO_GLUE;
    }
    else {
        cur_val = 0;
    }
    cur_val_level = cur_chr;
    if (!is_char_node(tail) && mode != 0) {
        switch (cur_chr) {
        case INT_VAL:
            if (type(tail) == PENALTY_NODE) {
                cur_val = penalty(tail);
            }
            break;
        
        case DIMEN_VAL:
            if (type(tail) == KERN_NODE) {
                cur_val = width(tail);
            }
            break;
        
        case GLUE_VAL:
            if (type(tail) == GLUE_NODE) {
                cur_val = glue_ptr(tail);
                if (subtype(tail) == MU_GLUE) {
                    cur_val_level = MU_VAL;
                }
            }
        } // there are no other cases
    }
    else if (mode == VMODE && tail == head) {
        switch (cur_chr) {
        case INT_VAL:
            cur_val = last_penalty;
            break;
        
        case DIMEN_VAL:
            cur_val = last_kern;
            break;

        case GLUE_VAL:
            if (last_glue != MAX_HALFWORD) {
                cur_val = last_glue;
            }
        } // there are no other cases
    }
}

Section 425

⟨ Fetch a font dimension 425 ⟩≡

find_font_dimen(false);
font_info[fmem_ptr].sc = 0;
scanned_result(font_info[cur_val].sc, DIMEN_VAL);

Section 426

⟨ Fetch a font integer 426 ⟩≡

scan_font_ident();
if (m == 0) {
    scanned_result(hyphen_char[cur_val], INT_VAL);
}
else {
    scanned_result(skew_char[cur_val], INT_VAL);
}

Section 427

⟨ Fetch a register 427 ⟩≡

scan_eight_bit_int();
switch (m) {
case INT_VAL:
    cur_val = count(cur_val);
    break;

case DIMEN_VAL:
    cur_val = dimen(cur_val);
    break;

case GLUE_VAL:
    cur_val = skip(cur_val);
    break;

case MU_VAL:
    cur_val = mu_skip(cur_val);
} // there are no other cases
cur_val_level = m;

Section 428

⟨ Complain that \the can’t do this; give zero result 428 ⟩≡

print_err("You can't use `");
print_cmd_chr(cur_cmd, cur_chr);
print("' after ");
print_esc("the");
help1("I'm forgetting what you said and using zero instead.");
error();
if (level != TOK_VAL) {
    scanned_result(0, DIMEN_VAL);
}
else {
    scanned_result(0, INT_VAL);
}

Section 429

When a GLUE_VAL changes to a DIMEN_VAL, we use the width component of the glue; there is no need to decrease the reference count, since it has not yet been increased. When a DIMEN_VAL changes to an INT_VAL, we use scaled points so that the value doesn’t actually change. And when a MU_VAL changes to a GLUE_VAL, the value doesn’t change either.

⟨ Convert cur_val to a lower level 429 ⟩≡

if (cur_val_level == GLUE_VAL) {
    cur_val = width(cur_val);
}
else if (cur_val_level == MU_VAL) {
    mu_error();
}
decr(cur_val_level);

Section 430

If cur_val points to a glue specification at this point, the reference count for the glue does not yet include the reference by cur_val. If negative is true, cur_val_level is known to be $\leq$ MU_VAL.

⟨ Fix the reference count, if any, and negate cur_val if negative 430 ⟩≡

if (negative) {
    if (cur_val_level >= GLUE_VAL) {
        cur_val = new_spec(cur_val);
        // << Negate all three glue components of |cur_val|, 431 >>
    }
    else {
        negate(cur_val);
    }
}
else if (cur_val_level >= GLUE_VAL && cur_val_level <= MU_VAL) {
    add_glue_ref(cur_val);
}

Section 431

⟨ Negate all three glue components of cur_val 431 ⟩≡

negate(width(cur_val));
negate(stretch(cur_val));
negate(shrink(cur_val));

Section 432

Our next goal is to write the scan_int procedure, which scans anything that $T E X$ treats as an integer. But first we might as well look at some simple applications of scan_int that have already been made inside of scan_something_internal.

Section 433

⟨ Declare procedures that scan restricted classes of integers 433 ⟩≡

void scan_eight_bit_int() {
    scan_int();
    if (cur_val < 0 || cur_val > 255) {
        print_err("Bad register code");        
        help2("A register number must be between 0 and 255.")
            ("I changed this one to zero.");
        int_error(cur_val);
        cur_val = 0;
    }
}

Section 434

⟨ Declare procedures that scan restricted classes of integers 433 ⟩+≡

void scan_char_num() {
    scan_int();
    if (cur_val < 0 || cur_val > 255) {
        print_err("Bad character code");
        help2("A character number must be between 0 and 255.")
            ("I changed this one to zero.");
        int_error(cur_val);
        cur_val = 0;
    }
}

Section 435

While we’re at it, we might as well deal with similar routines that will be needed later.

⟨ Declare procedures that scan restricted classes of integers 433 ⟩+≡

void scan_four_bit_int() {
    scan_int();
    if (cur_val < 0 || cur_val > 15) {
        print_err("Bad number");
        help2("Since I expected to read a number between 0 and 15,")
            ("I changed this one to zero.");
        int_error(cur_val);
        cur_val = 0;
    }
}

Section 436

⟨ Declare procedures that scan restricted classes of integers 433 ⟩+≡

void scan_fifteen_bit_int() {
    scan_int();
    if (cur_val < 0 ||  cur_val > 0x7fff) {
        print_err("Bad mathchar");
        help2("A mathchar number must be between 0 and 32767.")
            ("I changed this one to zero.");
        int_error(cur_val);
        cur_val = 0;
    }
}

Section 437

⟨ Declare procedures that scan restricted classes of integers 433 ⟩+≡

void scan_twenty_seven_bit_int() {
    scan_int();
    if (cur_val < 0 || cur_val > 0x7ffffff) {
        print_err("Bad delimiter code");
        help2("A numeric delimiter code must be between 0 and 2^{27} - 1.")
            ("I changed this one to zero.");
        int_error(cur_val);
        cur_val = 0;
    }
}

Section 438

An integer number can be preceded by any number of spaces and ‘+’ or ‘-’ signs. Then comes either a decimal constant (i.e., radix 10), an octal constant (i.e., radix 8, preceded by '), a hexadecimal constant (radix 16, preceded by "), an alphabetic constant (preceded by `), or an internal variable. After scanning is complete, cur_val will contain the answer, which must be at most $2^{31} - 1 = 2147483647$ in absolute value. The value of radix is set to 10, 8, or 16 in the cases of decimal, octal, or hexadecimal constants, otherwise radix is set to zero. An optional space follows a constant.

constants.h

#define OCTAL_TOKEN             (OTHER_TOKEN + '\'') // apostrophe, indicates an octal constant
#define HEX_TOKEN               (OTHER_TOKEN + '"')  // double quote, indicates a hex constant
#define ALPHA_TOKEN             (OTHER_TOKEN + '`')  // reverse apostrophe, precedes alpha constants
#define POINT_TOKEN             (OTHER_TOKEN + '.')  // decimal point
#define CONTINENTAL_POINT_TOKEN (OTHER_TOKEN + ',')  // decimal point, Eurostyle

⟨ Global variables 13 ⟩+≡

small_number radix; // |scan_int| sets this to 8, 10, 16, or zero

Section 439

We initialize the following global variables just in case expand comes into action before any of the basic scanning routines has assigned them a value.

⟨ Set initial values of key variables 21 ⟩+≡

cur_val = 0;
cur_val_level = INT_VAL;
radix = 0;
cur_order = NORMAL;

Section 440

The scan_int routine is used also to scan the integer part of a fraction; for example, the ‘3’ in ‘3.14159’ will be found by scan_int. The scan_dimen routine assumes that cur_tok = POINT_TOKEN after the integer part of such a fraction has been scanned by scan_int, and that the decimal point has been backed up to be scanned again.

subroutines.c

// sets |cur_val| to an integer
void scan_int() {
    bool negative;  // should the answer be negated?
    int m;          // |2^{31} / radix|, the threshold of danger
    small_number d; // the digit just scanned
    bool vacuous;   // have no digits appeared?
    bool ok_so_far; // has an error message been issued?

    radix = 0;
    ok_so_far = true;
    // << Get the next non-blank non-sign token; set |negative| appropriately, 441 >>
    if (cur_tok == ALPHA_TOKEN) {
        // << Scan an alphabetic character code into |cur_val|, 442 >>
    }
    else if (cur_cmd >= MIN_INTERNAL && cur_cmd <= MAX_INTERNAL) {
        scan_something_internal(INT_VAL, false);
    }
    else {
        // << Scan a numeric constant, 444 >>
    }
    if (negative) {
        negate(cur_val);
    }
}

Section 441

⟨ Get the next non-blank non-sign token; set negative appropriately 441 ⟩≡

negative = false;
do {
    // << Get the next non-blank non-call token, 406 >>
    if (cur_tok == OTHER_TOKEN + '-') {
        negative = !negative;
        cur_tok = OTHER_TOKEN + '+';
    }
} while (cur_tok == OTHER_TOKEN + '+');

Section 442

A space is ignored after an alphabetic character constant, so that such constants behave like numeric ones.

⟨ Scan an alphabetic character code into cur_val 442 ⟩≡

get_token(); // suppress macro expansion
if (cur_tok < CS_TOKEN_FLAG) {
    cur_val = cur_chr;
    if (cur_cmd <= RIGHT_BRACE) {
        if (cur_cmd == RIGHT_BRACE) {
            incr(align_state);
        }
        else {
            decr(align_state);
        }
    }
}
else if (cur_tok < CS_TOKEN_FLAG + SINGLE_BASE) {
    cur_val = cur_tok - CS_TOKEN_FLAG - ACTIVE_BASE;
}
else {
    cur_val = cur_tok - CS_TOKEN_FLAG - SINGLE_BASE;
}
if (cur_val > 255) {
    print_err("Improper alphabetic constant");
    help2("A one-character control sequence belongs after a ` mark.")
        ("So I'm essentially inserting \\0 here.");
    cur_val = '0';
    back_error();
}
else {
    // << Scan an optional space, 443 >>
}

Section 443

⟨ Scan an optional space 443 ⟩≡

get_x_token();
if (cur_cmd != SPACER) {
    back_input();
}

Section 444

⟨ Scan a numeric constant 444 ⟩≡

radix = 10;
m = 214748364;
if (cur_tok == OCTAL_TOKEN) {
    radix = 8;
    m = 0x10000000;
    get_x_token();
}
else if (cur_tok == HEX_TOKEN) {
    radix = 16;
    m = 0x8000000;
    get_x_token();
}
vacuous = true;
cur_val = 0;
// << Accumulate the constant until |cur_tok| is not a suitable digit, 445 >>
if (vacuous) {
    // << Express astonishment that no number was here, 446 >>
}
else if (cur_cmd != SPACER) {
    back_input();
}

Section 445

NOTE

INFINITY is already defined in library header math.h, so it is called INFINITY_ here.

constants.h

#define INFINITY_     0x7fffffff           // the largest positive value that TeX knows
#define ZERO_TOKEN    (OTHER_TOKEN + '0')  // zero, the smallest digit
#define A_TOKEN       (LETTER_TOKEN + 'A') // the smallest special hex digit
#define OTHER_A_TOKEN (OTHER_TOKEN + 'A')  // special hex digit of type |OTHER_CHAR|

⟨ Accumulate the constant until cur_tok is not a suitable digit 445 ⟩≡

while(true) {
    if (cur_tok < ZERO_TOKEN + radix
        && cur_tok >= ZERO_TOKEN
        && cur_tok <= ZERO_TOKEN + 9)
    {
        d = cur_tok - ZERO_TOKEN;
    }
    else if (radix == 16) {
        if (cur_tok <= A_TOKEN + 5 && cur_tok >= A_TOKEN) {
            d = cur_tok - A_TOKEN + 10;
        }
        else if (cur_tok <= OTHER_A_TOKEN + 5 && cur_tok >= OTHER_A_TOKEN) {
            d = cur_tok - OTHER_A_TOKEN + 10;
        }
        else {
            break; // Goto done
        }
    }
    else {
        break; // Goto done
    }
    vacuous = false;
    if (cur_val >= m
        && (cur_val > m || d > 7 || radix != 10))
    {
        if (ok_so_far) {
            print_err("Number too big");
            help2("I can only go up to 2147483647='17777777777=\"7FFFFFFF,")
                ("so I'm using that number instead of yours.");
            error();
            cur_val = INFINITY_;
            ok_so_far = false;
        }
    }
    else {
        cur_val = cur_val * radix + d;
    }
    get_x_token();
}
// done:

Section 446

⟨ Express astonishment that no number was here 446 ⟩≡

print_err("Missing number, treated as zero");
help3("A number should have been here; I inserted `0'.")
    ("(If you can't figure out why I needed to see a number,")
    ("look up `weird error' in the index to The TeXbook.)");
back_error();

Section 447

The scan_dimen routine is similar to scan_int, but it sets cur_val to a scaled value, i.e., an integral number of sp. One of its main tasks is therefore to interpret the abbreviations for various kinds of units and to convert measurements to scaled points.

There are three parameters: mu is true if the finite units must be ‘mu’, while mu is false if ‘mu’ units are disallowed; inf is true if the infinite units ‘fil’, ‘fill’, ‘filll’ are permitted; and shortcut is true if cur_val already contains an integer and only the units need to be considered.

The order of infinity that was found in the case of infinite glue is returned in the global variable cur_order.

⟨ Global variables 13 ⟩+≡

int cur_order; // order of infinity found by |scan_dimen|

Section 448

Constructions like ‘-'77 pt’ are legal dimensions, so scan_dimen may begin with scan_int. This explains why it is convenient to use scan_int also for the integer part of a decimal fraction.

Several branches of scan_dimen work with cur_val as an integer and with an auxiliary fraction $f$ , so that the actual quantity of interest is cur_val + $f / 2^{16}$ . At the end of the routine, this “unpacked” representation is put into the single word cur_val, which suddenly switches significance from int to scaled.

parser.h

#define scan_normal_dimen scan_dimen(false, false, false)

subroutines.c

// sets |cur_val| to a dimension
void scan_dimen(bool mu, bool inf, bool shortcut) {
    bool negative; // should the answer be negated?
    int f;         // numerator of a fraction whose denominator is $2^{16}$
    // << Local variables for dimension calculations, 450 >>
    
    f = 0;
    arith_error = false;
    cur_order = NORMAL;
    negative = false;
    if (!shortcut) {
        // << Get the next non-blank non-sign token; set |negative| appropriately, 441 >>
        if (cur_cmd >= MIN_INTERNAL && cur_cmd <= MAX_INTERNAL) {
            // << Fetch an internal dimension and |goto attach_sign|, or fetch an internal integer, 449 >>
        }
        else { 
            back_input();
            if (cur_tok == CONTINENTAL_POINT_TOKEN) {
                cur_tok = POINT_TOKEN;
            }
            if (cur_tok != POINT_TOKEN) {
                scan_int();
            }
            else {
                radix = 10;
                cur_val = 0;
            }
            if (cur_tok == CONTINENTAL_POINT_TOKEN) {
                cur_tok = POINT_TOKEN;
            }
            if (radix == 10 && cur_tok == POINT_TOKEN) {
                // << Scan decimal fraction, 452 >>
            }
        }
    }
    if (cur_val < 0) {
        // in this case |f = 0|
        negative = !negative;
        negate(cur_val);
    }
    
    // << Scan units and set |cur_val| to x * (|cur_val| + f/2^{16}), where there are |x| sp per unit; |goto attach_sign| if the units are internal, 453 >>
    
    // << Scan an optional space, 443 >>

attach_sign:
    if (arith_error || abs(cur_val) >= 0x40000000) {
        // << Report that this dimension is out of range, 460 >>
    }
    if (negative) {
        negate(cur_val);
    }
}

Section 449

⟨ Fetch an internal dimension and goto attach_sign, or fetch an internal integer 449 ⟩≡

if (mu) {
    scan_something_internal(MU_VAL, false);
    // << Coerce glue to a dimension, 451 >>
    if (cur_val_level == MU_VAL) {
        goto attach_sign;
    }
    if (cur_val_level != INT_VAL) {
        mu_error();
    }
}
else {
    scan_something_internal(DIMEN_VAL, false);
    if (cur_val_level == DIMEN_VAL) {
        goto attach_sign;
    }
}

Section 450

⟨ Local variables for dimension calculations 450 ⟩≡

int num, denom;   // conversion ratio for the scanned units
int k, kk;        // number of digits in a decimal fraction
pointer p, q;     // top of decimal digit stack
scaled v;         // an internal dimension
int save_cur_val; // temporary storage of |cur_val|

Section 451

The following code is executed when scan_something_internal was called asking for MU_VAL, when we really wanted a “mudimen” instead of “muglue”.

⟨ Coerce glue to a dimension 451 ⟩≡

if (cur_val_level >= GLUE_VAL) {
    v = width(cur_val);
    delete_glue_ref(cur_val);
    cur_val = v;
}

Section 452

When the following code is executed, we have cur_tok = POINT_TOKEN, but this token has been backed up using back_input; we must first discard it.

It turns out that a decimal point all by itself is equivalent to ‘0.0’. Let’s hope people don’t use that fact.

⟨ Scan decimal fraction 452 ⟩≡

k = 0;
p = null;
get_token(); // |POINT_TOKEN| is being re-scanned
while(true) {
    get_x_token();
    if (cur_tok > ZERO_TOKEN + 9 || cur_tok < ZERO_TOKEN) {
        break; // Goto done1
    }
    if (k < 17) {
        // digits for |k >= 17| cannot affect the result
        q = get_avail();
        link(q) = p;
        info(q) = cur_tok - ZERO_TOKEN;
        p = q;
        incr(k);
    }
}
// done1:
for(kk = k; kk >= 1; kk--) {
    dig[kk - 1] = info(p);
    q = p;
    p = link(p);
    free_avail(q);
}
f = round_decimals(k);
if (cur_cmd != SPACER) {
    back_input();
}

Section 453

Now comes the harder part: At this point in the program, cur_val is a nonnegative integer and $f / 2^{16}$ is a nonnegative fraction less than 1; we want to multiply the sum of these two quantities by the appropriate factor, based on the specified units, in order to produce a scaled result, and we want to do the calculation with fixed point arithmetic that does not overflow.

⟨ Scan units and set cur_val to x * (cur_val + f/2^{16}), where there are x sp per unit; goto attach_sign if the units are internal 453 ⟩≡

if (inf) {
    // << Scan for |fil| units; |goto attach_fraction| if found, 454 >>
}

// << Scan for units that are internal dimensions; |goto attach_sign| with |cur_val| set if found, 455 >>

if (mu) {
    // << Scan for mu units and |goto attach_fraction|, 456 >>
}
if (scan_keyword("true")) {
    // << Adjust for the magnification ratio, 457 >>
}
if (scan_keyword("pt")) {
    goto attach_fraction; // the easy case
}

// << Scan for all other units and adjust |cur_val| and |f| accordingly; |goto done| in the case of scaled points, 458 >>

attach_fraction:
if (cur_val >= 0x4000) {
    arith_error = true;
}
else {
    cur_val = cur_val * UNITY + f;
}
done:

Section 454

A specification like ‘filllll’ or ‘fill L L L’ will lead to two error messages (one for each additional keyword "l").

⟨ Scan for fil units; goto attach_fraction if found 454 ⟩≡

if (scan_keyword("fil")) {
    cur_order = FIL;
    while (scan_keyword("l")) {
        if (cur_order == FILLL) {
            print_err("Illegal unit of measure (");
            print("replaced by filll)");
            help1("I dddon't go any higher than filll.");
            error();
        }
        else {
            incr(cur_order);
        }
    }
    goto attach_fraction;
}

Section 455

⟨ Scan for units that are internal dimensions; goto attach_sign with cur_val set if found 455 ⟩≡

save_cur_val = cur_val;
// << Get the next non-blank non-call token, 406 >>
if (cur_cmd < MIN_INTERNAL || cur_cmd > MAX_INTERNAL) {
    back_input();
}
else {
    if (mu) {
        scan_something_internal(MU_VAL, false);
        // << Coerce glue to a dimension, 451 >>
        if (cur_val_level != MU_VAL) {
            mu_error();
        }
    }
    else {
        scan_something_internal(DIMEN_VAL, false);
    }
    v = cur_val;
    goto found;
}
if (mu) {
    goto not_found;
}
if (scan_keyword("em")) {
    // << The em width for |cur_font|, 558 >>
}
else if (scan_keyword("ex")) {
    // << The x-height for |cur_font|, 559 >>
}
else {
    goto not_found;
}
// << Scan an optional space, 443 >>

found:
cur_val = nx_plus_y(save_cur_val, v, xn_over_d(v, f, 0x10000));
goto attach_sign;
not_found:

Section 456

⟨ Scan for mu units and goto attach_fraction 456 ⟩≡

if (scan_keyword("mu")) {
    goto attach_fraction;
}
else {
    print_err("Illegal unit of measure (");
    print("mu inserted)");
    help4("The unit of measurement in math glue must be mu.")
        ("To recover gracefully from this error, it's best to")
        ("delete the erroneous units; e.g., type `2' to delete")
        ("two letters. (See Chapter 27 of The TeXbook.)");
    error();
    goto attach_fraction;
}

Section 457

⟨ Adjust for the magnification ratio 457 ⟩≡

prepare_mag();
if (mag != 1000) {
    cur_val = xn_over_d(cur_val, 1000, mag);
    f = (1000*f + 0x10000 * remainder_) / mag;
    cur_val += f / 0x10000;
    f %= 0x10000;
}

Section 458

The necessary conversion factors can all be specified exactly as fractions whose numerator and denominator sum to 32768 or less. According to the definitions here, 2660 dd $\approx$ 1000.33297 mm; this agrees well with the value 1000.333 mm cited by Bosshard in Technische Grundlagen zur Satzherstellung (Bern, 1980).

parser.h

#define set_conversion(X, Y) num = (X); denom = (Y)

⟨ Scan for all other units and adjust cur_val and f accordingly; goto done in the case of scaled points 458 ⟩≡

if (scan_keyword("in")) {
    set_conversion(7227, 100);
}
else if (scan_keyword("pc")) {
    set_conversion(12, 1);
}
else if (scan_keyword("cm")) {
    set_conversion(7227, 254);
}
else if (scan_keyword("mm")) {
    set_conversion(7227, 2540);
}
else if (scan_keyword("bp")) {
    set_conversion(7227, 7200);
}
else if (scan_keyword("dd")) {
    set_conversion(1238, 1157);
}
else if (scan_keyword("cc")) {
    set_conversion(14856, 1157);
}
else if (scan_keyword("sp")) {
    goto done;
}
else {
    // << Complain about unknown unit and |goto done2|, 459 >>
}
cur_val = xn_over_d(cur_val, num, denom);
f = (num * f + 0x10000 * remainder_) / denom;
cur_val += f / 0x10000;
f %= 0x10000;
done2:

Section 459

⟨ Complain about unknown unit and goto done2 459 ⟩≡

print_err("Illegal unit of measure (");
print("pt inserted)");
help6("Dimensions can be in units of em, ex, in, pt, pc,")
    ("cm, mm, dd, cc, bp, or sp; but yours is a new one!")
    ("I'll assume that you meant to say pt, for printer's points.")
    ("To recover gracefully from this error, it's best to")
    ("delete the erroneous units; e.g., type `2' to delete")
    ("two letters. (See Chapter 27 of The TeXbook.)");
error();
goto done2;

Section 460

⟨ Report that this dimension is out of range 460 ⟩≡

print_err("Dimension too large");
help2("I can't work with sizes bigger than about 19 feet.")
    ("Continue and I'll use the largest value I can.");
error();
cur_val = MAX_DIMEN;
arith_error = false;

Section 461

The final member of $T E X$ ’s value-scanning trio is scan_glue, which makes cur_val point to a glue specification. The reference count of that glue spec will take account of the fact that cur_val is pointing to it.

The level parameter should be either GLUE_VAL or MU_VAL.

Since scan_dimen was so much more complex than scan_int, we might expect scan_glue to be even worse. But fortunately, it is very simple, since most of the work has already been done.

subroutines.c

// sets |cur_val| to a glue spec pointer
void scan_glue(small_number level) {
    bool negative; // should the answer be negated?
    pointer q;     // new glue specification
    bool mu;       // does |level = MU_VAL|?

    mu = (level == MU_VAL);
    // << Get the next non-blank non-sign token; set |negative| appropriately, 441 >>
    if (cur_cmd >= MIN_INTERNAL && cur_cmd <= MAX_INTERNAL) {
        scan_something_internal(level, negative);
        if (cur_val_level >= GLUE_VAL) {
            if (cur_val_level != level) {
                mu_error();
            }
            return;
        }
        if (cur_val_level == INT_VAL) {
            scan_dimen(mu, false, true);
        }
        else if (level == MU_VAL) {
            mu_error();
        }
    }
    else {
        back_input();
        scan_dimen(mu, false, false);
        if (negative) {
            negate(cur_val);
        }
    }
    // << Create a new glue specification whose width is |cur_val|; scan for its stretch and shrink components, 462 >>
}

Section 462

⟨ Create a new glue specification whose width is cur_val; scan for its stretch and shrink components 462 ⟩≡

q = new_spec(ZERO_GLUE);
width(q) = cur_val;
if (scan_keyword("plus")) {
    scan_dimen(mu, true, false);
    stretch(q) = cur_val;
    stretch_order(q) = cur_order;
}
if (scan_keyword("minus")) {
    scan_dimen(mu, true, false);
    shrink(q) = cur_val;
    shrink_order(q) = cur_order;
}
cur_val = q;

Section 463

Here’s a similar procedure that returns a pointer to a rule node. This routine is called just after $T E X$ has seen \hrule or \vrule; therefore cur_cmd will be either hrule or vrule. The idea is to store the default rule dimensions in the node, then to override them if ‘height’ or ‘width’ or ‘depth’ specifications are found (in any order).

constants.h

#define DEFAULT_RULE 26214 // 0.4 pt

subroutines.c

pointer scan_rule_spec() {
    pointer q; // the rule node being created
    q = new_rule(); // |width|, |depth|, and |height| all equal |NULL_FLAG| now
    if (cur_cmd == VRULE) {
        width(q) = DEFAULT_RULE;
    }
    else {
        height(q) = DEFAULT_RULE;
        depth(q) = 0;
    }
reswitch:
    if (scan_keyword("width")) {
        scan_normal_dimen;
        width(q) = cur_val;
        goto reswitch;
    }
    if (scan_keyword("height")) {
        scan_normal_dimen;
        height(q) = cur_val;
        goto reswitch;
    }
    if (scan_keyword("depth")) {
        scan_normal_dimen;
        depth(q) = cur_val;
        goto reswitch;
    }
    return q;
}

TeX in C