Section 511: File names

It’s time now to fret about file names. Besides the fact that different operating systems treat files in different ways, we must cope with the fact that completely different naming conventions are used by different groups of people. The following programs show what is required for one particular operating system; similar routines for other systems are not difficult to devise.

TeX assumes that a file name has three parts: the name proper; its “extension”; and a “file area” where it is found in an external file system. The extension of an input file or a write file is assumed to be ‘.tex’ unless otherwise specified; it is ‘.log’ on the transcript file that records each run of TeX; it is ‘.tfm’ on the font metric files that describe characters in the fonts TeX uses; it is ‘.dvi’ on the output files that specify typesetting information; and it is ‘.fmt’ on the format files written by INITEX to initialize TeX. The file area can be arbitrary on input files, but files are usually output to the user’s current area. If an input file cannot be found on the specified area, TeX will look for it on a special system area; this special area is intended for commonly used input files like webmac.tex.

Simple uses of TeX refer only to file names that have no explicit extension or area. For example, a person usually says ‘\input paper’ or ‘\font\tenrm = helvetica’ instead of ‘\input paper.new’ or ‘\font\tenrm = <csd.knuth>test’. Simple file names are best, because they make the source files portable; whenever a file name consists entirely of letters and digits, it should be treated in the same way by all implementations of TeX. However, users need the ability to refer to other files in their environment, especially when responding to error messages concerning unopenable files; therefore we want to let them use the syntax that appears in their favorite operating system.

The following procedures don’t allow spaces to be part of file names; but some users seem to like names that are spaced-out. System-dependent changes to allow such things should probably be made with reluctance, and only when an entire file name that includes spaces is “quoted” somehow.

Section 512

In order to isolate the system-dependent aspects of file names, the system-independent parts of TeX are expressed in terms of three system-dependent procedures called begin_name, more_name, and end_name. In essence, if the user-specified characters of the file name are c_1 … c_n, the system-independent driver program does the operations

begin_name; more_name(c_1); …; more_name(c_n); end_name.

These three procedures communicate with each other via global variables. Afterwards the file name will appear in the string pool as three strings called cur_name, cur_area, and cur_ext; the latter two are null (i.e., ""), unless they were explicitly specified by the user.

Actually the situation is slightly more complicated, because TeX needs to know when the file name ends. The more_name routine is a function (with side effects) that returns true on the calls more_name(c_1), …, more_name(c_{n−1}). The final call more_name(c_n) returns false; or, it returns true and the token following c_n is something like ‘\hbox’ (i.e., not a character). In other words, more_name is supposed to return true unless it is sure that the file name has been completely scanned; and end_name is supposed to be able to finish the assembly of cur_name, cur_area, and cur_ext regardless of whether more_name returned true or false.

⟨ Global variables 13 ⟩+≡

str_number cur_name; // name of file just scanned
str_number cur_area; // file area just scanned, or ""
str_number cur_ext;  // file extension just scanned, or ""
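
The calling protocol can be tried out in isolation. What follows is only a sketch under stated assumptions: it replaces the string pool by a fixed scratch buffer, uses ‘/’ as the area delimiter and ‘.’ as the extension delimiter, and every name beginning with toy_ (as well as scratch, used, area_end, ext_start, area, name, and ext) is hypothetical and not part of the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the string pool: one scratch buffer plus
   two delimiter offsets. None of this is TeX's own data structure. */
static char scratch[64];
static size_t used, area_end, ext_start;
static char area[64], name[64], ext[64];

static void toy_begin_name(void) { used = area_end = ext_start = 0; }

/* Returns false once the name is known to be complete (a space). */
static bool toy_more_name(char c) {
    if (c == ' ') return false;
    assert(used < sizeof scratch - 1);
    scratch[used++] = c;
    if (c == '/') { area_end = used; ext_start = 0; }
    else if (c == '.' && ext_start == 0) ext_start = used;
    return true;
}

/* Splits the scanned characters into area, name, and extension. */
static void toy_end_name(void) {
    size_t name_end = ext_start ? ext_start - 1 : used;
    assert(area_end <= name_end && name_end <= used);
    memcpy(area, scratch, area_end);
    area[area_end] = '\0';
    memcpy(name, scratch + area_end, name_end - area_end);
    name[name_end - area_end] = '\0';
    memcpy(ext, scratch + name_end, used - name_end);
    ext[used - name_end] = '\0';
}

/* The driver: feed characters until toy_more_name says stop. */
static void toy_scan(const char *s) {
    toy_begin_name();
    while (*s != '\0' && toy_more_name(*s)) s++;
    toy_end_name();
}
```

After toy_scan("TeXinputs/paper.tex rest"), the three buffers hold "TeXinputs/", "paper", and ".tex"; the trailing word is never seen, because the space makes toy_more_name return false.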

Section 513

The file names we shall deal with for illustrative purposes have the following structure: If the name contains ‘>’ or ‘:’, the file area consists of all characters up to and including the final such character; otherwise the file area is null. If the remaining file name contains ‘.’, the file extension consists of all such characters from the first remaining ‘.’ to the end, otherwise the file extension is null.

We can scan such file names easily by using two global variables that keep track of the occurrences of area and extension delimiters:

NOTE

This C implementation of TeX uses only ‘/’ as the area delimiter.

⟨ Global variables 13 ⟩+≡

pool_pointer area_delimiter; // the most recent '/' ('>' or ':' in the original), if any
pool_pointer ext_delimiter;  // the relevant '.', if any

Section 514

Input files that can’t be found in the user’s area may appear in a standard system area called TEX_AREA. Font metric files whose areas are not given explicitly are assumed to appear in a standard system area called TEX_FONT_AREA. These system area names will, of course, vary from place to place.

NOTE

Those strings are added to the pool (with ‘/’ as separator instead of ‘:’).

⟨ Read the other strings 51 ⟩+≡

put_string("TeXinputs/"); // TEX_AREA: 258
put_string("TeXfonts/");  // TEX_FONT_AREA: 259

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define TEX_AREA 258
#define TEX_FONT_AREA 259

Section 515

Here now is the first of the system-dependent routines for file name scanning.

filenames.c
// << Start file |filenames.c|, 1382 >>

void begin_name() {
    area_delimiter = 0;
    ext_delimiter = 0;
}

Section 516

And here’s the second. The string pool might change as the file name is being scanned, since a new \csname might be entered; therefore we keep area_delimiter and ext_delimiter relative to the beginning of the current string, instead of assigning an absolute address like pool_ptr to them.

filenames.c
bool more_name(ASCII_code c) {
    if (c == ' ') {
        return false;
    }
    else {
        str_room(1);
        append_char(c); // contribute |c| to the current string
        if (c == '/') {
            area_delimiter = cur_length;
            ext_delimiter = 0;
        }
        else if (c == '.' && ext_delimiter == 0) {
            ext_delimiter = cur_length;
        }
        return true;
    }
}

Section 517

The third.

filenames.c
void end_name() {
    if (str_ptr + 3 > MAX_STRINGS) {
        overflow("number of strings", MAX_STRINGS - init_str_ptr);
    }
    if (area_delimiter == 0) {
        cur_area = EMPTY_STRING;
    }
    else {
        cur_area = str_ptr;
        str_start[str_ptr + 1] = str_start[str_ptr] + area_delimiter;
        incr(str_ptr);
    }
    if (ext_delimiter == 0) {
        cur_ext = EMPTY_STRING;
        cur_name = make_string();
    }
    else {
        cur_name = str_ptr;
        str_start[str_ptr + 1] = str_start[str_ptr] + ext_delimiter - area_delimiter - 1;
        incr(str_ptr);
        cur_ext = make_string();
    }
}
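
The offset arithmetic in end_name is easy to get wrong by one, so here is a small worked version of it. This is a sketch only: part_lengths and split_lengths are hypothetical names, and the function works on plain lengths rather than on str_start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; the field names are illustrative only. */
typedef struct { size_t area_len, name_len, ext_len; } part_lengths;

/* total: number of characters scanned.
   area_delimiter, ext_delimiter: 1-based positions as recorded by
   more_name, or 0 when the corresponding delimiter never appeared. */
static part_lengths split_lengths(size_t total, size_t area_delimiter,
                                  size_t ext_delimiter) {
    part_lengths p;
    assert(area_delimiter <= total && ext_delimiter <= total);
    assert(ext_delimiter == 0 || ext_delimiter > area_delimiter);
    p.area_len = area_delimiter;              /* includes the final '/' */
    p.name_len = (ext_delimiter == 0)
                     ? total - area_delimiter
                     : ext_delimiter - area_delimiter - 1; /* stop before '.' */
    p.ext_len = total - p.area_len - p.name_len;           /* includes the '.' */
    return p;
}
```

For ‘TeXinputs/paper.tex’ the scan records total = 19, area_delimiter = 10, and ext_delimiter = 16, which gives lengths 10, 5, and 4. The ‘− 1’ is there because ext_delimiter counts the ‘.’ itself, and the period belongs to the extension.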

Section 518

Conversely, here is a routine that takes three strings and prints a file name that might have produced them. (The routine is system dependent, because some operating systems put the file area last instead of first.)

basic_printing.c
void print_file_name(int n, int a, int e) {
    slow_print(a);
    slow_print(n);
    slow_print(e);
}

Section 519

Another system-dependent routine is needed to convert three internal strings into the name_of_file value that is used to open files. The present code allows both lowercase and uppercase letters in the file name.

NOTE

A macro append_to_name_nul is defined to add a null byte at the end of the name. The original code that padded the name with trailing spaces has been removed.

parser.h
#define append_to_name(X)              \
    do {                               \
        c = (X);                       \
        if (k < FILE_NAME_SIZE) {      \
            name_of_file[k] = XCHR[c]; \
        }                              \
        incr(k);                       \
    } while (0)

#define append_to_name_nul name_of_file[name_length] = '\0'

filenames.c
void pack_file_name(str_number n, str_number a, str_number e) {
    int k; // number of positions filled in |name_of_file|
    ASCII_code c; // character being packed
    int j; // index into |str_pool|
    k = 0;
    for(j = str_start[a]; j <= str_start[a + 1] - 1; j++) {
        append_to_name(str_pool[j]);
    }
    for(j = str_start[n]; j <= str_start[n + 1] - 1; j++) {
        append_to_name(str_pool[j]);
    }
    for(j = str_start[e]; j <= str_start[e + 1] - 1; j++) {
        append_to_name(str_pool[j]);
    }
    if (k <= FILE_NAME_SIZE) {
        name_length = k;
    }
    else {
        name_length = FILE_NAME_SIZE;
    }
    append_to_name_nul;
}
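
The silent truncation performed by append_to_name can be seen in a reduced form. The sketch below assumes a deliberately tiny limit; TOY_NAME_SIZE, toy_name, and toy_pack are hypothetical, and the buffer is given one extra byte so that the null terminator always fits.

```c
#include <assert.h>
#include <string.h>

#define TOY_NAME_SIZE 16 /* hypothetical stand-in for FILE_NAME_SIZE */

static char toy_name[TOY_NAME_SIZE + 1]; /* one extra byte for the null */

/* Concatenates area, name, and extension in that order. Characters past
   TOY_NAME_SIZE are counted but not stored. Returns the stored length. */
static size_t toy_pack(const char *name, const char *area, const char *ext) {
    const char *parts[3];
    size_t k = 0, len;
    int i;
    assert(name != NULL && area != NULL && ext != NULL);
    parts[0] = area; parts[1] = name; parts[2] = ext;
    for (i = 0; i < 3; i++) {
        const char *p;
        for (p = parts[i]; *p != '\0'; p++) {
            if (k < TOY_NAME_SIZE) toy_name[k] = *p;
            k++;
        }
    }
    len = (k <= TOY_NAME_SIZE) ? k : TOY_NAME_SIZE;
    toy_name[len] = '\0';
    return len;
}
```

With this limit, toy_pack("paper", "TeXinputs/", ".tex") stores only the first 16 of the 19 characters, namely "TeXinputs/paper.", and returns 16. No error is reported; the damaged name simply fails to open later.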

Section 520

A messier routine is also needed, since format file names must be scanned before TeX’s string mechanism has been initialized. We shall use the global variable tex_format_default to supply the text for default system areas and extensions related to format files.

NOTE

String ".fmt" must be added to the pool.

constants.h
#define FORMAT_DEFAULT_LENGTH 20      // length of the |tex_format_default| string
#define FORMAT_AREA_LENGTH    11      // length of its area part
#define FORMAT_EXT_LENGTH     4       // length of its '.fmt' part
#define FORMAT_EXTENSION      FMT_EXT // the extension, as a WEB constant

⟨ Global variables 13 ⟩+≡

char tex_format_default[FORMAT_DEFAULT_LENGTH];

⟨ Read the other strings 51 ⟩+≡

put_string(".fmt"); // FMT_EXT: 260

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define FMT_EXT 260

Section 521

⟨ Set initial values of key variables 21 ⟩+≡

memcpy(tex_format_default, "TeXformats/plain.fmt", 20);

Section 522

⟨ Check the “constant” values for consistency 14 ⟩+≡

if (FORMAT_DEFAULT_LENGTH > FILE_NAME_SIZE) {
    bad = 31;
}

Section 523

Here is the messy routine that was just mentioned. It sets name_of_file from the first n characters of tex_format_default, followed by buffer[a .. b], followed by the last FORMAT_EXT_LENGTH characters of tex_format_default.

We dare not give error messages here, since TeX calls this routine before the error routine is ready to roll. Instead, we simply drop excess characters, since the error will be detected in another way when a strange file name isn’t found.

NOTE

A null byte is added at the end, and tex_format_default is indexed from 0.

dumping.c
// << Start file |dumping.c|, 1382 >>

void pack_buffered_name(small_number n, int a, int b) {
    int k; // number of positions filled in |name_of_file|
    ASCII_code c; // character being packed
    int j; // index into |buffer| or |tex_format_default|
    if (n + b - a + 1 + FORMAT_EXT_LENGTH > FILE_NAME_SIZE) {
        b = a + FILE_NAME_SIZE - n - 1 - FORMAT_EXT_LENGTH;
    }
    k = 0;
    for(j = 0; j < n; j++) {
        append_to_name(XORD[(int)tex_format_default[j]]);
    }
    for(j = a; j <= b; j++) {
        append_to_name(buffer[j]);
    }
    for(j = FORMAT_DEFAULT_LENGTH - FORMAT_EXT_LENGTH; j < FORMAT_DEFAULT_LENGTH; j++) {
        append_to_name(XORD[(int)tex_format_default[j]]);
    }
    if (k <= FILE_NAME_SIZE) {
        name_length = k;
    }
    else {
        name_length = FILE_NAME_SIZE;
    }
    append_to_name_nul;
}
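
It may help to see which names the three possible prefix lengths produce. The following is a sketch that leaves out the clamping of b and works on ordinary C strings; toy_default, toy_format_name, and the enum names are hypothetical, while the string and the lengths 20, 11, and 4 are those of sections 520 and 521.

```c
#include <assert.h>
#include <string.h>

static const char toy_default[] = "TeXformats/plain.fmt"; /* section 521 */
enum { DEFAULT_LEN = 20, AREA_LEN = 11, EXT_LEN = 4 };    /* section 520 */

/* Builds: first n characters of toy_default, then middle, then the last
   EXT_LEN characters of toy_default. The caller must supply room for
   n + strlen(middle) + EXT_LEN + 1 bytes in out. */
static void toy_format_name(char *out, size_t n, const char *middle) {
    size_t m = strlen(middle);
    assert(n <= DEFAULT_LEN - EXT_LEN);
    memcpy(out, toy_default, n);
    memcpy(out + n, middle, m);
    memcpy(out + n + m, toy_default + DEFAULT_LEN - EXT_LEN, EXT_LEN);
    out[n + m + EXT_LEN] = '\0';
}
```

A prefix of 0 yields a name in the user’s area, such as "latex.fmt"; a prefix of AREA_LEN yields "TeXformats/latex.fmt"; and a prefix of DEFAULT_LEN − EXT_LEN with an empty middle reproduces the whole default, "TeXformats/plain.fmt".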

Section 524

Here is the only place we use pack_buffered_name. This part of the program becomes active when a “virgin” TeX is trying to get going, just after the preliminary initialization, or when the user is substituting another format file by typing ‘&’ after the initial ‘**’ prompt. The buffer contains the first line of input in buffer[loc .. (last − 1)], where loc < last and buffer[loc] ≠ ‘␣’.

dumping.c
bool open_fmt_file() {
    int j; // the first space after the format file name
    j = loc;
    if (buffer[loc] == '&') {
        incr(loc);
        j = loc;
        buffer[last] = ' ';
        while (buffer[j] != ' ') {
            incr(j);
        }
        pack_buffered_name(0, loc, j - 1); // try first without the system file area
        if (w_open_in(&fmt_file)) {
            goto found;
        }
        pack_buffered_name(FORMAT_AREA_LENGTH, loc, j - 1); // now try the system format file area
        if (w_open_in(&fmt_file)) {
            goto found;
        }
        wterm_ln("Sorry, I can't find that format; will try PLAIN.");
        update_terminal;
    }
    // now pull out all the stops: try for the system plain file
    pack_buffered_name(FORMAT_DEFAULT_LENGTH - FORMAT_EXT_LENGTH, 1, 0);
    if (!w_open_in(&fmt_file)) {
        wterm_ln("I can't find the PLAIN format file!");
        return false;
    }
found:
    loc = j;
    return true;
}

Section 525

Operating systems often make it possible to determine the exact name (and possible version number) of a file that has been opened. The following routine, which simply makes a string from the value of name_of_file, should ideally be changed to deduce the full name of file f, which is the file most recently opened, if it is possible to do this in a Pascal program.

This routine might be called after string memory has overflowed, hence we dare not use ‘str_room’.

NOTE

Indexing into name_of_file starts from 0 instead of 1.

Functions a_make_name_string, b_make_name_string, and w_make_name_string have been removed.

filenames.c
str_number make_name_string() {
    int k; // index into |name_of_file|
    if (pool_ptr + name_length > POOL_SIZE
        || str_ptr == MAX_STRINGS
        || cur_length > 0)
    {
        return '?';
    }
    else {
        for(k = 0; k < name_length; k++) {
            append_char(XORD[(int)name_of_file[k]]);
        }
        return make_string();
    }
}

Section 526

Now let’s consider the “driver” routines by which TeX deals with file names in a system-independent manner. First comes a procedure that looks for a file name in the input by calling get_x_token for the information.

filenames.c
void scan_file_name() {
    name_in_progress = true;
    begin_name();
    // << Get the next non-blank non-call token, 406 >>
    while(true) {
        if (cur_cmd > OTHER_CHAR || cur_chr > 255) {
              // not a character
            back_input();
            break; // Goto done
        }
        if (!more_name(cur_chr)) {
            break; // Goto done
        }
        get_x_token();
    }
    // done:
    end_name();
    name_in_progress = false;
}

Section 527

The global variable name_in_progress is used to prevent recursive use of scan_file_name, since the begin_name and other procedures communicate via global variables. Recursion would arise only by devious tricks like ‘\input\input f’; such attempts at sabotage must be thwarted. Furthermore, name_in_progress prevents \input from being initiated when a font size specification is being scanned.

Another global variable, job_name, contains the file name that was first \input by the user. This name is extended by ‘.log’ and ‘.dvi’ and ‘.fmt’ in the names of TeX’s output files.

⟨ Global variables 13 ⟩+≡

bool name_in_progress; // is a file name being scanned?
str_number job_name;   // principal file name
bool log_opened;       // has the transcript file been opened?

Section 528

Initially job_name = 0; it becomes nonzero as soon as the true name is known. We have job_name = 0 if and only if the ‘.log’ file has not been opened, except of course for a short time just after job_name has become nonzero.

⟨ Initialize the output routines 55 ⟩+≡

job_name = 0;
name_in_progress = false;
log_opened = false;

Section 529

Here is a routine that manufactures the output file names, assuming that job_name ≠ 0. It ignores and changes the current settings of cur_area and cur_ext.

parser.h
#define pack_cur_name pack_file_name(cur_name, cur_area, cur_ext)

filenames.c
void pack_job_name(str_number s) {
    // |s = ".log"|, |".dvi"|, or |FORMAT_EXTENSION|
    cur_area = EMPTY_STRING;
    cur_ext = s;
    cur_name = job_name;
    pack_cur_name;
}

Section 530

If some trouble arises when TeX tries to open a file, the following routine calls upon the user to supply another file name. Parameter s is used in the error message to identify the type of file; parameter e is the default extension if none is given. Upon exit from the routine, variables cur_name, cur_area, cur_ext, and name_of_file are ready for another attempt at file opening.

NOTE

String ".tex" is added to the pool.

⟨ Read the other strings 51 ⟩+≡

put_string(".tex"); // TEX_EXT: 261

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define TEX_EXT 261

terminal.c
void prompt_file_name(char *s, str_number e) {
    int k; // index into |buffer|
    if (strcmp(s, "input file name") == 0) {
        print_err("I can't find file `");
    }
    else {
        print_err("I can't write on file `");
    }
    print_file_name(cur_name, cur_area, cur_ext);
    print("'.");
    if (e == TEX_EXT) {
        show_context();
    }
    print_nl("Please type another ");
    print(s);
    if (interaction < SCROLL_MODE) {
        fatal_error("*** (job aborted, file error in nonstop mode)");
    }
    clear_terminal;
    prompt_input(": ");
    // << Scan file name in the buffer, 531 >>
    if (cur_ext == EMPTY_STRING) {
        cur_ext = e;
    }
    pack_cur_name;
}

Section 531

⟨ Scan file name in the buffer 531 ⟩≡

begin_name();
k = first;
while (k < last && buffer[k] == ' ') {
    incr(k);
}
while(true) {
    if (k == last || !more_name(buffer[k])) {
        break; // Goto done
    }
    incr(k);
}
// done:
end_name();
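
The two loops above amount to finding one blank-delimited word in the buffer. Here is a sketch of just that range computation, assuming a plain char buffer; name_range and its parameters are hypothetical and do not appear in the program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: finds the half-open range [*begin, *end) of the
   file name inside buf[first .. last-1]. Leading blanks are skipped, and
   the scan stops at the next blank, which is where more_name would
   return false, or at last. */
static void name_range(const char *buf, size_t first, size_t last,
                       size_t *begin, size_t *end) {
    size_t k = first;
    assert(buf != NULL && first <= last);
    while (k < last && buf[k] == ' ') k++;
    *begin = k;
    while (k < last && buf[k] != ' ') k++;
    *end = k;
}
```

On the line "  paper.tex  x" with first = 0 and last = 14, the range is [2, 11), which covers exactly "paper.tex". A line of nothing but blanks gives an empty range, and the default extension is then applied to an empty name.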

Section 532

Here’s an example of how these conventions are used. Whenever it is time to ship out a box of stuff, we shall use the macro ensure_dvi_open.

NOTE

String ".dvi" is added to the pool.

⟨ Read the other strings 51 ⟩+≡

put_string(".dvi"); // DVI_EXT: 262

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define DVI_EXT 262

dvi.h
// << Start file |dvi.h|, 1381 >>

#define ensure_dvi_open                                            \
    do {                                                           \
        if (output_file_name == 0) {                               \
            if (job_name == 0) {                                   \
                open_log_file();                                   \
            }                                                      \
            pack_job_name(DVI_EXT);                                \
            while (!b_open_out(&dvi_file)) {                       \
                prompt_file_name("file name for output", DVI_EXT); \
            }                                                      \
            output_file_name = make_name_string();                 \
        }                                                          \
    } while (0)

⟨ Global variables 13 ⟩+≡

byte_file dvi_file;          // the device-independent output goes here
str_number output_file_name; // full name of the output file
str_number log_name;         // full name of the log file

Section 533

⟨ Initialize the output routines 55 ⟩+≡

output_file_name = 0;

Section 534

The open_log_file routine is used to open the transcript file and to help it catch up to what has previously been printed on the terminal.

NOTE

Strings "texput" and ".log" are added to the pool.

⟨ Read the other strings 51 ⟩+≡

put_string("texput"); // TEXPUT_STRING: 263
put_string(".log");   // LOG_EXT: 264

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define TEXPUT_STRING 263
#define LOG_EXT       264

files.c
void open_log_file() {
    int old_setting; // previous |selector| setting
    int k;           // index into |months| and |buffer|
    int l;           // end of first input line
    char *months[12] = {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
    }; // abbreviations of month names
    old_setting = selector;
    if (job_name == 0) {
        job_name = TEXPUT_STRING; // "texput"
    }
    pack_job_name(LOG_EXT);
    while (!a_open_out(&log_file)) {
        // << Try to get a different log file name, 535 >>
    }
    log_name = make_name_string();
    selector = LOG_ONLY;
    log_opened = true;
    // << Print the banner line, including the date and time, 536 >>
    input_stack[input_ptr] = cur_input; // make sure bottom level is in memory
    print_nl("**");
    l = input_stack[0].limit_field; // last position of first line
    if (buffer[l] == end_line_char) {
        decr(l);
    }
    for(k = 1; k <= l; k++) {
        print_strnumber(buffer[k]);
    }
    print_ln(); // now the transcript file contains the first line of input
    selector = old_setting + 2; // |LOG_ONLY| or |TERM_AND_LOG|
}
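
The final assignment selector = old_setting + 2 relies on the numbering of the selector constants. The sketch below assumes they keep the values of the original Pascal source (NO_PRINT = 16, TERM_ONLY = 17, LOG_ONLY = 18, TERM_AND_LOG = 19), which this section does not show; selector_with_log is a hypothetical name.

```c
#include <assert.h>

/* Assumed values, taken from the original Pascal program and not
   confirmed by anything shown in this section. */
enum { NO_PRINT = 16, TERM_ONLY = 17, LOG_ONLY = 18, TERM_AND_LOG = 19 };

/* Adds the log file to whatever destination was active before. */
static int selector_with_log(int old_setting) {
    assert(old_setting == NO_PRINT || old_setting == TERM_ONLY);
    return old_setting + 2;
}
```

Under that assumption, a run that printed nowhere now prints to the log only, and a run that printed to the terminal now prints to both.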

Section 535

Sometimes open_log_file is called at awkward moments when TeX is unable to print error messages or even to show_context. The prompt_file_name routine can result in a fatal_error, but the error routine will not be invoked because log_opened will be false.

The normal idea of BATCH_MODE is that nothing at all should be written on the terminal. However, in the unusual case that no log file could be opened, we make an exception and allow an explanatory message to be seen.

Incidentally, the program always refers to the log file as a ‘transcript file’, because some systems cannot use the extension ‘.log’ for this file.

⟨ Try to get a different log file name 535 ⟩≡

selector = TERM_ONLY;
prompt_file_name("transcript file name", LOG_EXT);

Section 536

NOTE

Month names are stored in an array of char * (see above) instead of in the string pool.

⟨ Print the banner line, including the date and time 536 ⟩≡

wlog(BANNER);
slow_print(format_ident);
print("  ");
print_int(sys_day);
print_char(' ');
wlog("%s", months[sys_month - 1]);
print_char(' ');
print_int(sys_year);
print_char(' ');
print_two(sys_time / 60);
print_char(':');
print_two(sys_time % 60);

Section 537

Let’s turn now to the procedure that is used to initiate file reading when an ‘\input’ command is being processed. Beware: For historic reasons, this code foolishly conserves a tiny bit of string pool space; but that can confuse the interactive ‘E’ option.

NOTE

Recall that the ‘E’ option is not supported in this implementation.

filenames.c
// TeX will \input something
void start_input() {
    scan_file_name(); // set |cur_name| to desired file name
    if (cur_ext == EMPTY_STRING) {
        cur_ext = TEX_EXT;
    }
    pack_cur_name;
    while(true) {
        begin_file_reading(); // set up |cur_file| and new level of input
        if (a_open_in(&cur_file)) {
            break; // Goto done
        }
        if (cur_area == EMPTY_STRING) {
            pack_file_name(cur_name, TEX_AREA, cur_ext);
            if (a_open_in(&cur_file)) {
                break; // Goto done
            }
        }
        end_file_reading(); // remove the level that didn't work
        prompt_file_name("input file name", TEX_EXT);
    }
    // done:
    name = make_name_string();
    if (job_name == 0) {
        job_name = cur_name;
        open_log_file();
    } // |open_log_file| doesn't |show_context|, so |limit| and |loc| needn't be set to meaningful values yet
    if (term_offset + length(name) > MAX_PRINT_LINE - 2) {
        print_ln();
    }
    else if (term_offset > 0 || file_offset > 0) {
        print_char(' ');
    }
    print_char('(');
    incr(open_parens);
    slow_print(name);
    update_terminal;
    state = NEW_LINE;
    if (name == str_ptr - 1) {
        // conserve string pool space (but see note above)
        flush_string;
        name = cur_name;
    }
    // << Read the first line of the new file, 538 >>
}
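
The order in which start_input looks for a file can be summarized by listing the candidate names. The following sketch assumes ordinary C strings and the area "TeXinputs/" from section 514; toy_candidates is a hypothetical helper, and it only builds the names without opening anything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fills out[] with the names in the order start_input tries them and
   returns how many there are (1 or 2). */
static int toy_candidates(char out[2][128], const char *area,
                          const char *name, const char *ext) {
    const char *e = (ext[0] == '\0') ? ".tex" : ext; /* default extension */
    assert(strlen(area) + strlen(name) + strlen(e) + 10 < 128);
    snprintf(out[0], 128, "%s%s%s", area, name, e);
    if (area[0] != '\0') return 1;        /* explicit area: no fallback */
    snprintf(out[1], 128, "TeXinputs/%s%s", name, e); /* TEX_AREA */
    return 2;
}
```

A bare ‘paper’ is tried as "paper.tex" and then as "TeXinputs/paper.tex", whereas ‘docs/paper.new’ is tried once only, because the user named an area explicitly. If every candidate fails, the real routine prompts for another name and starts over.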

Section 538

Here we have to remember to tell the input_ln routine not to start with a get. If the file is empty, it is considered to contain a single blank line.

NOTE

The boolean argument of input_ln has been removed (see section 31).

⟨ Read the first line of the new file 538 ⟩≡

line = 1;
input_ln(cur_file); // ignore returned value
firm_up_the_line();
if (end_line_char_inactive) {
    decr(limit);
}
else {
    buffer[limit] = end_line_char;
}
first = limit + 1;
loc = start;