Section 1340: Extensions

The program above includes a bunch of “hooks” that allow further capabilities to be added without upsetting TeX’s basic structure. Most of these hooks are concerned with “whatsit” nodes, which are intended to be used for special purposes; whenever a new extension to TeX involves a new kind of whatsit node, a corresponding change needs to be made to the routines below that deal with such nodes, but it will usually be unnecessary to make many changes to the other parts of this program.

In order to demonstrate how extensions can be made, we shall treat ‘\write’, ‘\openout’, ‘\closeout’, ‘\immediate’, ‘\special’, and ‘\setlanguage’ as if they were extensions. These commands are actually primitives of TeX, and they should appear in all implementations of the system; but let’s try to imagine that they aren’t. Then the program below illustrates how a person could add them.

Sometimes, of course, an extension will require changes to TeX itself; no system of hooks could be complete enough for all conceivable extensions. The features associated with ‘\write’ are almost all confined to the following paragraphs, but there are small parts of the print_ln and print_char procedures that were introduced specifically to \write characters. Furthermore one of the token lists recognized by the scanner is a WRITE_TEXT; and there are a few other miscellaneous places where we have already provided for some aspect of \write. The goal of a TeX extender should be to minimize alterations to the standard parts of the program, and to avoid them completely if possible. He or she should also be quite sure that there’s no easy way to accomplish the desired goals with the standard features that TeX already has. “Think thrice before extending”, because that may save a lot of work, and it will also keep incompatible extensions of TeX from proliferating.

Section 1341

First let’s consider the format of whatsit nodes that are used to represent the data associated with \write and its relatives. Recall that a whatsit has type = WHATSIT_NODE, and the subtype is supposed to distinguish different kinds of whatsits. Each node occupies two or more words; the exact number is immaterial, as long as it is readily determined from the subtype or other data.

We shall introduce five subtype values here, corresponding to the control sequences \openout, \write, \closeout, \special, and \setlanguage. The second word of I/O whatsits has a write_stream field that identifies the write-stream number (0 to 15, or 16 for out-of-range and positive, or 17 for out-of-range and negative). In the case of \write and \special, there is also a field that points to the reference count of a token list that should be sent. In the case of \openout, we need three words and three auxiliary subfields to hold the string numbers for name, area, and extension.

constants.h
#define WRITE_NODE_SIZE 2 // number of words in a write/whatsit node
#define OPEN_NODE_SIZE  3 // number of words in an open/whatsit node
#define OPEN_NODE       0 // |subtype| in whatsits that represent files to \openout
#define WRITE_NODE      1 // |subtype| in whatsits that represent things to \write
#define CLOSE_NODE      2 // |subtype| in whatsits that represent streams to \closeout
#define SPECIAL_NODE    3 // |subtype| in whatsits that represent \special things
#define LANGUAGE_NODE   4 // |subtype| in whatsits that change the current language
extensions.h
// << Start file |extensions.h|, 1381 >>

#define what_lang(X)    link((X) + 1)    // language number, in the range |0 .. 255|
#define what_lhm(X)     type((X) + 1)    // minimum left fragment, in the range |1 .. 63|
#define what_rhm(X)     subtype((X) + 1) // minimum right fragment, in the range |1 .. 63|
#define write_tokens(X) link((X) + 1)    // reference count of token list to write
#define write_stream(X) info((X) + 1)    // stream number (0 to 17)
#define open_name(X)    link((X) + 1)    // string number of file name to open
#define open_area(X)    info((X) + 2)    // string number of file area for |open_name|
#define open_ext(X)     link((X) + 2)    // string number of file extension for |open_name|
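
To see how these field macros overlay one another, here is a minimal sketch on a toy memory. The type toy_word, the array toy_mem, and every toy_ name are hypothetical; the real memory word of this program is richer. The sketch assumes only that each word has an info half and a link half.

```c
#include <stdint.h>

/* Hypothetical two-halves memory word; not the program's real memory_word. */
typedef struct {
    uint16_t lh; /* plays the role of |info| */
    uint16_t rh; /* plays the role of |link| */
} toy_word;

enum { TOY_MEM_SIZE = 64 };
static toy_word toy_mem[TOY_MEM_SIZE];

#define toy_info(p) (toy_mem[(p)].lh)
#define toy_link(p) (toy_mem[(p)].rh)

/* Same offsets as the macros above, applied to the toy memory. */
#define toy_write_stream(p) toy_info((p) + 1)
#define toy_write_tokens(p) toy_link((p) + 1)
#define toy_open_name(p)    toy_link((p) + 1)
#define toy_open_area(p)    toy_info((p) + 2)
#define toy_open_ext(p)     toy_link((p) + 2)

/* Fill the second and third words of a three-word \openout node at p.
   Word 0 (type, subtype, link) is left untouched here. */
static void toy_fill_open_node(int p, int stream, int name, int area, int ext) {
    toy_write_stream(p) = (uint16_t)stream;
    toy_open_name(p)    = (uint16_t)name;
    toy_open_area(p)    = (uint16_t)area;
    toy_open_ext(p)     = (uint16_t)ext;
}
```

Note that open_name and write_tokens name the same half of the second word; this is harmless because an \openout node never carries a token list.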

Section 1342

The sixteen possible \write streams are represented by the write_file array. The jth file is open if and only if write_open[j] = true. The last two streams are special; write_open[16] represents a stream number greater than 15, while write_open[17] represents a negative stream number, and both of these variables are always false.

⟨ Global variables 13 ⟩+≡

alpha_file write_file[16];
bool write_open[18];

Section 1343

⟨ Set initial values of key variables 21 ⟩+≡

for(k = 0; k <= 17; k++) {
    write_open[k] = false;
}

Section 1344

Extensions might introduce new command codes; but it’s best to use EXTENSION with a modifier, whenever possible, so that main_control stays the same.

constants.h
#define IMMEDIATE_CODE    4 // command modifier for \immediate
#define SET_LANGUAGE_CODE 5 // command modifier for \setlanguage

⟨ Put each of TeX’s primitives into the hash table 226 ⟩+≡

primitive("openout", EXTENSION, OPEN_NODE);
primitive("write", EXTENSION, WRITE_NODE);
write_loc = cur_val;
primitive("closeout", EXTENSION, CLOSE_NODE);
primitive("special", EXTENSION, SPECIAL_NODE);
primitive("immediate", EXTENSION, IMMEDIATE_CODE);
primitive("setlanguage", EXTENSION, SET_LANGUAGE_CODE);

Section 1345

The variable write_loc just introduced is used to provide an appropriate error message in case of “runaway” write texts.

⟨ Global variables 13 ⟩+≡

pointer write_loc; // |eqtb| address of \write

Section 1346

⟨ Cases of print_cmd_chr for symbolic printing of primitives 227 ⟩+≡

case EXTENSION:
    switch (chr_code) {
    case OPEN_NODE:
        print_esc("openout");
        break;
    
    case WRITE_NODE:
        print_esc("write");
        break;
    
    case CLOSE_NODE:
        print_esc("closeout");
        break;
    
    case SPECIAL_NODE:
        print_esc("special");
        break;
    
    case IMMEDIATE_CODE:
        print_esc("immediate");
        break;
    
    case SET_LANGUAGE_CODE:
        print_esc("setlanguage");
        break;
    
    default:
        print("[unknown extension!]");
    }
    break;

Section 1347

When an EXTENSION command occurs in main_control, in any mode, the do_extension routine is called.

⟨ Cases of main_control that are for extensions to TeX 1347 ⟩≡

any_mode(EXTENSION):
    do_extension();
    break;

Section 1348

extensions.c
// << Start file |extensions.c|, 1382 >>

// << Declare procedures needed in |do_extension|, 1349 >>

void do_extension() {
    int k; // all-purpose integer
    pointer p; // all-purpose pointer
    switch (cur_chr) {
    case OPEN_NODE:
        // << Implement \openout, 1351 >>
        break;
    
    case WRITE_NODE:
        // << Implement \write, 1352 >>
        break;
    
    case CLOSE_NODE:
        // << Implement \closeout, 1353 >>
        break;
    
    case SPECIAL_NODE:
        // << Implement \special, 1354 >>
        break;
    
    case IMMEDIATE_CODE:
        // << Implement \immediate, 1375 >>
        break;
    
    case SET_LANGUAGE_CODE:
        // << Implement \setlanguage, 1377 >>
        break;
    
    default:
        confusion("ext1");
    }
}

Section 1349

Here is a subroutine that creates a whatsit node having a given subtype and a given number of words. It initializes only the first word of the whatsit, and appends it to the current list.

⟨ Declare procedures needed in do_extension 1349 ⟩≡

void new_whatsit(small_number s, small_number w) {
    pointer p; // the new node
    p = get_node(w);
    type(p) = WHATSIT_NODE;
    subtype(p) = s;
    link(tail) = p;
    tail = p;
}

Section 1350

The next subroutine uses cur_chr to decide what sort of whatsit is involved, and also inserts a write_stream number.

⟨ Declare procedures needed in do_extension 1349 ⟩+≡

void new_write_whatsit(small_number w) {
    new_whatsit(cur_chr, w);
    if (w != WRITE_NODE_SIZE) {
        scan_four_bit_int();
    }
    else {
        scan_int();
        if (cur_val < 0) {
            cur_val = 17;
        }
        else if (cur_val > 15) {
            cur_val = 16;
        }
    }
    write_stream(tail) = cur_val;
}

Section 1351

⟨ Implement \openout 1351 ⟩≡

new_write_whatsit(OPEN_NODE_SIZE);
scan_optional_equals();
scan_file_name();
open_name(tail) = cur_name;
open_area(tail) = cur_area;
open_ext(tail) = cur_ext;

Section 1352

When ‘\write 12{...}’ appears, we scan the token list ‘{...}’ without expanding its macros; the macros will be expanded later when this token list is rescanned.

⟨ Implement \write 1352 ⟩≡

k = cur_cs;
new_write_whatsit(WRITE_NODE_SIZE);
cur_cs = k;
p = scan_toks(false, false);
write_tokens(tail) = def_ref;

Section 1353

⟨ Implement \closeout 1353 ⟩≡

new_write_whatsit(WRITE_NODE_SIZE);
write_tokens(tail) = null;

Section 1354

When ‘\special{...}’ appears, we expand the macros in the token list as in \xdef and \mark.

⟨ Implement \special 1354 ⟩≡

new_whatsit(SPECIAL_NODE, WRITE_NODE_SIZE);
write_stream(tail) = null;
p = scan_toks(false, true);
write_tokens(tail) = def_ref;

Section 1355

Each new type of node that appears in our data structure must be capable of being displayed, copied, destroyed, and so on. The routines that we need for write-oriented whatsits are somewhat like those for mark nodes; other extensions might, of course, involve more subtlety here.

other_printing.c
void print_write_whatsit(char *s, pointer p) {
    print_esc(s);
    if (write_stream(p) < 16) {
        print_int(write_stream(p));
    }
    else if (write_stream(p) == 16) {
        print_char('*');
    }
    else {
        print_char('-');
    }
}
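
The stream-number convention and its display can be tried in isolation. The following is only a sketch: clamp_write_stream and format_write_whatsit are hypothetical names, the output goes to a buffer instead of through print_esc, and a backslash is assumed as the escape character.

```c
#include <stddef.h>
#include <stdio.h>

/* Map a scanned integer to a stream slot, as is done for \write and
   \closeout: 0..15 are real streams, 16 means too large, 17 means negative. */
static int clamp_write_stream(int v) {
    if (v < 0) {
        return 17;
    }
    if (v > 15) {
        return 16;
    }
    return v;
}

/* Format a stream the way the display routine does: the control sequence
   name followed by the number, or '*' for slot 16, or '-' for slot 17. */
static void format_write_whatsit(char *buf, size_t n, const char *name, int stream) {
    if (stream < 16) {
        snprintf(buf, n, "\\%s%d", name, stream);
    }
    else if (stream == 16) {
        snprintf(buf, n, "\\%s*", name);
    }
    else {
        snprintf(buf, n, "\\%s-", name);
    }
}
```

For instance, a stream number of 99 is stored as 16 and shown with ‘*’, while a stream number of −1 is stored as 17 and shown with ‘-’.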

Section 1356

⟨ Display the whatsit node p 1356 ⟩≡

switch (subtype(p)) {
case OPEN_NODE:
    print_write_whatsit("openout", p);
    print_char('=');
    print_file_name(open_name(p), open_area(p), open_ext(p));
    break;

case WRITE_NODE:
    print_write_whatsit("write", p);
    print_mark(write_tokens(p));
    break;

case CLOSE_NODE:
    print_write_whatsit("closeout", p);
    break;

case SPECIAL_NODE:
    print_esc("special");
    print_mark(write_tokens(p));
    break;

case LANGUAGE_NODE:
    print_esc("setlanguage");
    print_int(what_lang(p));
    print(" (hyphenmin ");
    print_int(what_lhm(p));
    print_char(',');
    print_int(what_rhm(p));
    print_char(')');
    break;

default:
    print("whatsit?");
}

Section 1357

⟨ Make a partial copy of the whatsit node p and make r point to it; set words to the number of initial words not yet copied 1357 ⟩≡

switch (subtype(p)) {
case OPEN_NODE:
    r = get_node(OPEN_NODE_SIZE);
    words = OPEN_NODE_SIZE;
    break;

case WRITE_NODE:
case SPECIAL_NODE:
    r = get_node(WRITE_NODE_SIZE);
    add_token_ref(write_tokens(p));
    words = WRITE_NODE_SIZE;
    break;

case CLOSE_NODE:
case LANGUAGE_NODE:
    r = get_node(SMALL_NODE_SIZE);
    words = SMALL_NODE_SIZE;
    break;

default:
    confusion("ext2");
}

Section 1358

⟨ Wipe out the whatsit node p and goto done 1358 ⟩≡

switch (subtype(p)) {
case OPEN_NODE:
    free_node(p, OPEN_NODE_SIZE);
    break;

case WRITE_NODE:
case SPECIAL_NODE:
    delete_token_ref(write_tokens(p));
    free_node(p, WRITE_NODE_SIZE);
    goto done;

case CLOSE_NODE:
case LANGUAGE_NODE:
    free_node(p, SMALL_NODE_SIZE);
    break;

default:
    confusion("ext3");
}
goto done;
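
The copying and wiping code for \write and \special nodes shares the token list instead of duplicating it. A minimal sketch of that discipline follows, under simplifying assumptions: the toy_ types and functions are hypothetical, and the count here starts at 1 for a single owner, which is not necessarily how the real reference count is encoded.

```c
#include <stddef.h>

/* Hypothetical stand-in for a token list headed by a reference count. */
typedef struct {
    int refs;  /* number of nodes currently pointing at this list */
    int freed; /* set once the last reference disappears */
} toy_token_list;

typedef struct {
    toy_token_list *tokens;
} toy_write_node;

static void toy_add_token_ref(toy_token_list *t) {
    t->refs++;
}

static void toy_delete_token_ref(toy_token_list *t) {
    if (--t->refs == 0) {
        t->freed = 1; /* the real code would return the list to free storage */
    }
}

/* Copying a node copies the pointer and counts one more reference. */
static toy_write_node toy_copy_write_node(const toy_write_node *p) {
    toy_write_node r = *p;
    toy_add_token_ref(r.tokens);
    return r;
}

/* Wiping a node gives up its reference; the list survives while others remain. */
static void toy_free_write_node(toy_write_node *p) {
    toy_delete_token_ref(p->tokens);
    p->tokens = NULL;
}
```

The list is released only when the last node that mentions it is wiped out, so a box containing a \write can be copied any number of times at the cost of one counter update per copy.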

Section 1359

⟨ Incorporate a whatsit node into a vbox 1359 ⟩≡

do_nothing;

Section 1360

⟨ Incorporate a whatsit node into an hbox 1360 ⟩≡

do_nothing;

Section 1361

⟨ Let d be the width of the whatsit p 1361 ⟩≡

d = 0;

Section 1362

extensions.h
#define adv_past(X)                          \
    do {                                     \
        if (subtype((X)) == LANGUAGE_NODE) { \
            cur_lang = what_lang((X));       \
            l_hyf = what_lhm((X));           \
            r_hyf = what_rhm((X));           \
        }                                    \
    } while(0)

⟨ Advance past a whatsit node in the line_break loop 1362 ⟩≡

adv_past(cur_p);

Section 1363

⟨ Advance past a whatsit node in the pre-hyphenation loop 1363 ⟩≡

adv_past(s);

Section 1364

⟨ Prepare to move whatsit p to the current page, then goto contribute 1364 ⟩≡

goto contribute;

Section 1365

⟨ Process whatsit p in vert_break loop, goto not_found 1365 ⟩≡

goto not_found;

Section 1366

⟨ Output the whatsit node p in a vlist 1366 ⟩≡

out_what(p);

Section 1367

⟨ Output the whatsit node p in an hlist 1367 ⟩≡

out_what(p);

Section 1368

After all this preliminary shuffling, we come finally to the routines that actually send out the requested data. Let’s do \special first (it’s easier).

⟨ Declare procedures needed in hlist_out, vlist_out 1368 ⟩≡

void special_out(pointer p) {
    int old_setting; // holds print |selector|
    int k;           // index into |str_pool|
    synch_h;
    synch_v;
    old_setting = selector;
    selector = NEW_STRING;
    show_token_list(link(write_tokens(p)), null, POOL_SIZE - pool_ptr);
    selector = old_setting;
    str_room(1);
    if (cur_length < 256) {
        dvi_out(XXX1);
        dvi_out(cur_length);
    }
    else {
        dvi_out(XXX4);
        dvi_four(cur_length);
    }
    for(k = str_start[str_ptr]; k <= pool_ptr - 1; k++) {
        dvi_out(str_pool[k]);
    }
    pool_ptr = str_start[str_ptr]; // erase the string
}
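
The choice between the short and long forms can be checked on a plain byte buffer. This sketch assumes the usual DVI opcode values 239 for xxx1 and 242 for xxx4 and a four-byte length with the most significant byte first; encode_special and its buffer interface are hypothetical and stand in for dvi_out and dvi_four.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DVI_XXX1 = 239, DVI_XXX4 = 242 }; /* assumed opcode values */

/* Encode one special into |out|. Returns the number of bytes written,
   or 0 if |out| is too small. */
static size_t encode_special(const uint8_t *data, uint32_t len, uint8_t *out, size_t cap) {
    size_t header = (len < 256) ? 2 : 5;
    if (cap < header + len) {
        return 0;
    }
    if (len < 256) {
        out[0] = DVI_XXX1;
        out[1] = (uint8_t)len; /* one-byte length */
    }
    else {
        out[0] = DVI_XXX4;
        out[1] = (uint8_t)(len >> 24); /* four-byte length, high byte first */
        out[2] = (uint8_t)(len >> 16);
        out[3] = (uint8_t)(len >> 8);
        out[4] = (uint8_t)len;
    }
    memcpy(out + header, data, len);
    return header + len;
}
```

A three-byte special therefore costs five bytes in all, and a 300-byte special costs 305.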

Section 1369

To write a token list, we must run it through TeX’s scanner, expanding macros and \the and \number, etc. This might cause runaways, if a delimited macro parameter isn’t matched, and runaways would be extremely confusing since we are calling on TeX’s scanner in the middle of a \shipout command. Therefore we will put a dummy control sequence as a “stopper”, right after the token list. This control sequence is artificially defined to be \outer.

NOTE

String "endwrite" must be added to the pool.

⟨ Read the other strings 51 ⟩+≡

put_string("endwrite"); // ENDWRITE_STRING: 271

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define ENDWRITE_STRING 271

⟨ Initialize table entries (done by INITEX only) 164 ⟩+≡

text(END_WRITE) = ENDWRITE_STRING; // "endwrite"
eq_level(END_WRITE) = LEVEL_ONE;
eq_type(END_WRITE) = OUTER_CALL;
equiv(END_WRITE) = null;

Section 1370

⟨ Declare procedures needed in hlist_out, vlist_out 1368 ⟩+≡

void write_out(pointer p) {
    int old_setting; // holds print |selector|
    int old_mode;    // saved |mode|
    small_number j;  // write stream number
    pointer q, r;    // temporary variables for list manipulation
    
    // << Expand macros in the token list and make |link(def_ref)| point to the result, 1371 >>
    old_setting = selector;
    j = write_stream(p);
    if (write_open[j]) {
        selector = j;
    }
    else {
        // write to the terminal if file isn't open
        if (j == 17 && selector == TERM_AND_LOG) {
            selector = LOG_ONLY;
        }
        print_nl("");
    }
    token_show(def_ref);
    print_ln();
    flush_list(def_ref);
    selector = old_setting;
}
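
The destination chosen by write_out depends only on the stream slot, the write_open table, and the current selector. Here is that decision as a pure function. The name choose_write_selector is hypothetical, and the numeric values 18 and 19 for LOG_ONLY and TERM_AND_LOG are assumptions made for the sketch; only their distinctness from the stream numbers matters.

```c
#include <stdbool.h>

enum { TOY_LOG_ONLY = 18, TOY_TERM_AND_LOG = 19 }; /* assumed values */

/* Return the selector that a \write to slot j should use. */
static int choose_write_selector(int j, const bool write_open[18], int selector) {
    if (write_open[j]) {
        return j; /* an open file: the selector is the stream number itself */
    }
    if (j == 17 && selector == TOY_TERM_AND_LOG) {
        return TOY_LOG_ONLY; /* negative stream: keep it off the terminal */
    }
    return selector; /* otherwise write wherever printing currently goes */
}
```

Since slots 16 and 17 are never open, a \write to an out-of-range positive stream always falls through to the current selector, and a negative one is diverted to the log when both terminal and log are active.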

Section 1371

The final line of this routine is slightly subtle; at least, the author didn’t think about it until getting burnt! There is a used-up token list on the stack, namely the one that contained END_WRITE_TOKEN. (We insert this artificial ‘\endwrite’ to prevent runaways, as explained above.) If it were not removed, and if there were numerous writes on a single page, the stack would overflow.

constants.h
#define END_WRITE_TOKEN (CS_TOKEN_FLAG + END_WRITE)

⟨ Expand macros in the token list and make link(def_ref) point to the result 1371 ⟩≡

q = get_avail();
info(q) = RIGHT_BRACE_TOKEN + '}';
r = get_avail();
link(q) = r;
info(r) = END_WRITE_TOKEN;
ins_list(q);
begin_token_list(write_tokens(p), WRITE_TEXT);
q = get_avail();
info(q) = LEFT_BRACE_TOKEN + '{';
ins_list(q);
// now we're ready to scan '{<token list>} \endwrite'
old_mode = mode;
mode = 0; // disable \prevdepth, \spacefactor, \lastskip, \prevgraf
cur_cs = write_loc;
q = scan_toks(false, true); // expand macros, etc.
get_token();
if (cur_tok != END_WRITE_TOKEN) {
    // << Recover from an unbalanced write command, 1372 >>
}
mode = old_mode;
end_token_list(); // conserve stack space

Section 1372

⟨ Recover from an unbalanced write command 1372 ⟩≡

print_err("Unbalanced write command");
help2("On this page there's a \\write with fewer real {'s than }'s.")
    ("I can't handle that very well; good luck.");
error();
do {
    get_token();
} while (cur_tok != END_WRITE_TOKEN);
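
The role of the stopper can be illustrated on a character string. In this sketch each character is a token, TOY_END_WRITE is a hypothetical stand-in for END_WRITE_TOKEN, and toy_scan_write_text is a hypothetical, much simplified substitute for the call of scan_toks followed by get_token. It assumes the input contains the stopper; the case of too many left braces, which the real program handles through the \outer status of the stopper, is merely reported as unbalanced here.

```c
#include <stddef.h>

#define TOY_END_WRITE '\001' /* hypothetical stand-in for the stopper token */

/* Scan "{...}" starting at s[*pos] and stop at the brace that matches the
   first one. The next token must then be the stopper; if it is not, discard
   tokens until the stopper is found. Returns 1 if the text was balanced and
   0 if recovery was needed. On return, *pos is just past the stopper. */
static int toy_scan_write_text(const char *s, size_t *pos) {
    size_t i = *pos;
    int depth = 0;
    int balanced;
    while (s[i] != TOY_END_WRITE) {
        if (s[i] == '{') {
            depth++;
        }
        else if (s[i] == '}') {
            depth--;
        }
        i++;
        if (depth == 0) {
            break; /* the opening brace has been matched */
        }
    }
    balanced = (depth == 0 && s[i] == TOY_END_WRITE);
    while (s[i] != TOY_END_WRITE) {
        i++; /* recovery: throw tokens away up to the stopper */
    }
    *pos = i + 1;
    return balanced;
}
```

Whatever the text looked like, scanning resumes just after the stopper, so one faulty \write cannot swallow the tokens that follow it.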

Section 1373

The out_what procedure takes care of outputting whatsit nodes for vlist_out and hlist_out.

⟨ Declare procedures needed in hlist_out, vlist_out 1368 ⟩+≡

void out_what(pointer p) {
    small_number j; // write stream number
    switch (subtype(p)) {
    case OPEN_NODE:
    case WRITE_NODE:
    case CLOSE_NODE:
        // << Do some work that has been queued up for \write, 1374 >>
        break;
    
    case SPECIAL_NODE:
        special_out(p);
        break;
    
    case LANGUAGE_NODE:
        do_nothing;
        break;
    
    default:
        confusion("ext4");
    }
}

Section 1374

We don’t implement \write inside of leaders. (The reason is that the number of times a leader box appears might be different in different implementations, due to machine-dependent rounding in the glue calculations.)

⟨ Do some work that has been queued up for \write 1374 ⟩≡

if (!doing_leaders) {
    j = write_stream(p);
    if (subtype(p) == WRITE_NODE) {
        write_out(p);
    }
    else {
        if (write_open[j]) {
            a_close(write_file[j]);
        }
        if (subtype(p) == CLOSE_NODE) {
            write_open[j] = false;
        }
        else if (j < 16) {
            cur_name = open_name(p);
            cur_area = open_area(p);
            cur_ext = open_ext(p);
            if (cur_ext == EMPTY_STRING) {
                cur_ext = TEX_EXT; // ".tex"
            }
            pack_cur_name;
            while (!a_open_out(&write_file[j])) {
                prompt_file_name("output file name", TEX_EXT);
            }
            write_open[j] = true;
        }
    }
}
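
The bookkeeping for \openout and \closeout can be exercised without touching any files. In this sketch the type toy_streams and the function toy_open_or_close are hypothetical; counters take the place of a_close and a_open_out, and opening is assumed to succeed at once, whereas the real code keeps prompting for a file name until it does.

```c
#include <stdbool.h>

typedef struct {
    bool open[18]; /* slots 16 and 17 are never set */
    int closes;    /* how many times a file was closed */
    int opens;     /* how many times a file was opened */
} toy_streams;

enum toy_action { TOY_OPEN, TOY_CLOSE };

/* Mirror the non-\write branch above for stream slot j. */
static void toy_open_or_close(toy_streams *s, enum toy_action act, int j) {
    if (s->open[j]) {
        s->closes++; /* an open file is closed first in either case */
    }
    if (act == TOY_CLOSE) {
        s->open[j] = false;
    }
    else if (j < 16) {
        s->opens++;
        s->open[j] = true;
    }
}
```

Reopening a stream that is already open thus closes the old file before the new one is opened, and a \closeout on slot 16 or 17 changes nothing.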

Section 1375

The presence of ‘\immediate’ causes the do_extension procedure to descend to one level of recursion. Nothing happens unless \immediate is followed by ‘\openout’, ‘\write’, or ‘\closeout’.

⟨ Implement \immediate 1375 ⟩≡

get_x_token();
if (cur_cmd == EXTENSION && cur_chr <= CLOSE_NODE) {
    p = tail;
    do_extension(); // append a whatsit node
    out_what(tail); // do the action immediately
    flush_node_list(tail);
    tail = p;
    link(p) = null;
}
else {
    back_input();
}
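
The trick of appending a node, acting on it, and then taking it off again can be shown on an ordinary linked list. Everything named toy_ below is hypothetical; the "action" is reduced to adding the node's payload to a counter, and malloc and free stand in for get_node and flush_node_list.

```c
#include <stdlib.h>

typedef struct toy_node {
    struct toy_node *link;
    int payload;
} toy_node;

typedef struct {
    toy_node head;  /* list header, so that |tail| is never NULL */
    toy_node *tail; /* last node of the current list */
    int acted;      /* accumulates the effect of immediate actions */
} toy_list;

/* The ordinary path: append a node and leave it for later output. */
static void toy_append(toy_list *l, int payload) {
    toy_node *n = malloc(sizeof *n);
    if (n == NULL) {
        abort();
    }
    n->link = NULL;
    n->payload = payload;
    l->tail->link = n;
    l->tail = n;
}

/* The immediate path: reuse the ordinary routine, act, then undo the append. */
static void toy_immediate(toy_list *l, int payload) {
    toy_node *p = l->tail;        /* remember where the list ended */
    toy_append(l, payload);       /* append a node as usual */
    l->acted += l->tail->payload; /* do the action immediately */
    free(l->tail);                /* discard the temporary node */
    l->tail = p;
    p->link = NULL;
}
```

The benefit of this arrangement is that the immediate form needs no scanning code of its own; it borrows the deferred form and leaves the list exactly as it found it.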

Section 1376

The \language extension is somewhat different. We need a subroutine that comes into play when a character of a non-clang language is being appended to the current paragraph.

⟨ Declare action procedures for use by main_control 1043 ⟩+≡

void fix_language() {
    ASCII_code l; // the new current language
    if (language <= 0) {
        l = 0;
    }
    else if (language > 255) {
        l = 0;
    }
    else {
        l = language;
    }
    if (l != clang) {
        new_whatsit(LANGUAGE_NODE, SMALL_NODE_SIZE);
        what_lang(tail) = l;
        clang = l;
        what_lhm(tail) = norm_min(left_hyphen_min);
        what_rhm(tail) = norm_min(right_hyphen_min);
    }
}
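
The normalization performed by fix_language is easy to isolate. In the sketch below all three function names are hypothetical, and toy_norm_min is only an assumption about what norm_min does, inferred from the stated range 1 .. 63 of the what_lhm and what_rhm fields.

```c
/* Out-of-range language numbers are treated as language 0. */
static int toy_norm_language(int language) {
    return (language <= 0 || language > 255) ? 0 : language;
}

/* Assumed behavior of norm_min: force a hyphenation minimum into 1..63. */
static int toy_norm_min(int h) {
    if (h <= 0) {
        return 1;
    }
    if (h >= 64) {
        return 63;
    }
    return h;
}

/* A language whatsit is needed only when the normalized language
   differs from the one already in force. */
static int toy_needs_language_node(int language, int clang) {
    return toy_norm_language(language) != clang;
}
```

In particular, a change of \language from 0 to 300 creates no node at all, since both values mean language 0.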

Section 1377

⟨ Implement \setlanguage 1377 ⟩≡

if (abs(mode) != HMODE) {
    report_illegal_case();
}
else {
    new_whatsit(LANGUAGE_NODE, SMALL_NODE_SIZE);
    scan_int();
    if (cur_val <= 0) {
        clang = 0;
    }
    else if (cur_val > 255) {
        clang = 0;
    }
    else {
        clang = cur_val;
    }
    what_lang(tail) = clang;
    what_lhm(tail) = norm_min(left_hyphen_min);
    what_rhm(tail) = norm_min(right_hyphen_min);
}

Section 1378

⟨ Finish the extensions 1378 ⟩≡

for(k = 0; k <= 15; k++) {
    if (write_open[k]) {
        a_close(write_file[k]);
    }
}