Section 25: Input and output

The bane of portability is the fact that different operating systems treat input and output quite differently, perhaps because computer scientists have not given sufficient attention to this problem. People have felt somehow that input and output are not part of “real” programming. Well, it is true that some kinds of programming are more fun than others. With existing input/output conventions being so diverse and so messy, the only sources of joy in such parts of the code are the rare occasions when one can find a way to make the program a little less bad than it might have been. We have two choices, either to attack I/O now and get it over with, or to postpone I/O until near the end. Neither prospect is very attractive, so let’s get it over with.

The basic operations we need to do are (1) inputting and outputting of text, to or from a file or the user’s terminal; (2) inputting and outputting of eight-bit bytes, to or from a file; (3) instructing the operating system to initiate (“open”) or to terminate (“close”) input or output from a specified file; (4) testing whether the end of an input file has been reached.

TeX needs to deal with two kinds of files. We shall use the term alpha_file for a file that contains textual data, and the term byte_file for a file that contains eight-bit binary information. These two types turn out to be the same on many computers, but sometimes there is a significant distinction, so we shall be careful to distinguish between them. Standard protocols for transferring such files from computer to computer, via high-speed networks, are now becoming available to more and more communities of users.

The program actually makes use also of a third kind of file, called a word_file, when dumping and reloading base information for its own initialization. We shall define a word file later; but it will be possible for us to specify simple operations on word files before they are defined.

⟨ Types in the outer block 18 ⟩+≡

typedef unsigned char eight_bits; // unsigned one-byte quantity
typedef FILE* alpha_file;         // files that contain textual data
typedef FILE* byte_file;          // files that contain binary data

Section 26

Most of what we need to do with respect to input and output can be handled by the I/O facilities that are standard in Pascal, i.e., the routines called get, put, eof, and so on. But standard Pascal does not allow file variables to be associated with file names that are determined at run time, so it cannot be used to implement TeX; some sort of extension to Pascal’s ordinary reset and rewrite is crucial for our purposes. We shall assume that name_of_file is a variable of an appropriate type such that the Pascal run-time system being used to implement TeX can open a file whose external name is specified by name_of_file.

NOTE

Compared to the original code, name_of_file is indexed from 0.

⟨ Global variables 13 ⟩+≡

char name_of_file[FILE_NAME_SIZE + 1]; // extra byte for null byte
int name_length; // this many characters are actually relevant in |name_of_file| (the rest are blank)

Section 27

The Pascal-H compiler with which the present version of TeX was prepared has extended the rules of Pascal in a very convenient way. To open file f, we can write

reset(f, name, '/O') for input;
rewrite(f, name, '/O') for output.

The ‘name’ parameter, which is of type ‘packed array[⟨any⟩] of char’, stands for the name of the external file that is being opened for input or output. Blank spaces that might appear in ‘name’ are ignored.

The ‘/O’ parameter tells the operating system not to issue its own error messages if something goes wrong. If a file of the specified name cannot be found, or if such a file cannot be opened for some other reason (e.g., someone may already be trying to write the same file), we will have erstat(f) ≠ 0 after an unsuccessful reset or rewrite. This allows TeX to undertake appropriate corrective action.

TeX’s file-opening procedures return false if no file identified by name_of_file could be opened.

NOTE

fopen is used with the mode arguments r, w, rb, and wb, depending on the type of file being opened.

files.c

// << Start file |files.c|, 1382 >>

// open a text file for input
int a_open_in(alpha_file *f) {
    *f = fopen(name_of_file, "r");
    return *f != NULL;
}

// open a text file for output
int a_open_out(alpha_file *f) {
    *f = fopen(name_of_file, "w");
    return *f != NULL;
}

// open a binary file for input
int b_open_in(byte_file *f) {
    *f = fopen(name_of_file, "rb");
    return *f != NULL;
}

// open a binary file for output
int b_open_out(byte_file *f) {
    *f = fopen(name_of_file, "wb");
    return *f != NULL;
}

// open a word file for input
int w_open_in(word_file *f) {
    *f = fopen(name_of_file, "rb");
    return *f != NULL;
}

// open a word file for output
int w_open_out(word_file *f) {
    *f = fopen(name_of_file, "wb");
    return *f != NULL;
}
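
As a rough illustration of how a caller might use these procedures, the sketch below packs a file name into a name buffer and then tries to open the file. It is a self-contained approximation, not the code of files.c: the helpers pack_name and can_read, the local copies of name_of_file and a_open_in, and the value chosen for FILE_NAME_SIZE are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FILE_NAME_SIZE 40 /* assumed value, for illustration only */

typedef FILE *alpha_file;

static char name_of_file[FILE_NAME_SIZE + 1]; /* extra byte for the null byte */
static int name_length;

/* Hypothetical helper: copy |s| into |name_of_file|, truncating if needed. */
static void pack_name(const char *s) {
    assert(s != NULL);
    name_length = (int)strlen(s);
    if (name_length > FILE_NAME_SIZE) {
        name_length = FILE_NAME_SIZE;
    }
    memcpy(name_of_file, s, (size_t)name_length);
    name_of_file[name_length] = '\0';
}

/* Same shape as |a_open_in| above: nonzero on success, 0 on failure. */
static int a_open_in(alpha_file *f) {
    *f = fopen(name_of_file, "r");
    return *f != NULL;
}

/* Hypothetical caller: returns 1 if |s| names a readable file, else 0. */
static int can_read(const char *s) {
    alpha_file f;

    pack_name(s);
    if (!a_open_in(&f)) {
        return 0; /* the caller decides what corrective action to take */
    }
    fclose(f);
    return 1;
}
```

The point of returning a status instead of printing a message is that the caller, not the opening routine, chooses how to react to a missing file.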

Section 28

Files can be closed with the Pascal-H routine ‘close(f)’, which should be used when all input or output with respect to f has been completed. This makes f available to be opened again, if desired; and if f was used for output, the close operation makes the corresponding external file appear on the user’s area, ready to be read.

These procedures should not generate error messages if a file is being closed before it has been successfully opened.

NOTE

fclose is used for all types of files.

files.c

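// close a text file; does nothing if the file was never opened
void a_close(alpha_file f) {
    if (f != NULL) {
        fclose(f);
    }
}

// close a binary file; does nothing if the file was never opened
void b_close(byte_file f) {
    if (f != NULL) {
        fclose(f);
    }
}

// close a word file; does nothing if the file was never opened
void w_close(word_file f) {
    if (f != NULL) {
        fclose(f);
    }
}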

Section 29

Binary input and output are done with Pascal’s ordinary get and put procedures, so we don’t have to make any other special arrangements for binary I/O. Text output is also easy to do with standard Pascal routines. The treatment of text input is more difficult, however, because of the necessary translation to ASCII_code values. TeX’s conventions should be efficient, and they should blend nicely with the user’s operating environment.
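
In a C setting, Pascal’s get and put on a byte file correspond roughly to fgetc and fputc. The following sketch shows one way this could look; the helper names put_byte and get_byte are hypothetical and are not taken from the port.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned char eight_bits;
typedef FILE *byte_file;

/* Hypothetical helper: write one byte; returns 1 on success, 0 on error. */
static int put_byte(byte_file f, eight_bits b) {
    assert(f != NULL);
    return fputc(b, f) != EOF;
}

/* Hypothetical helper: read one byte into |*b|; returns 0 at end of file. */
static int get_byte(byte_file f, eight_bits *b) {
    int c;

    assert(f != NULL && b != NULL);
    c = fgetc(f); /* |int|, so that the byte 255 and |EOF| stay distinct */
    if (c == EOF) {
        return 0;
    }
    *b = (eight_bits)c;
    return 1;
}
```

Keeping the result of fgetc in an int before narrowing it is what lets the byte value 255 be told apart from the end of the file.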

Section 30

Input from text files is read one line at a time, using a routine called input_ln. This function is defined in terms of global variables called buffer, first, and last that will be described in detail later; for now, it suffices for us to know that buffer is an array of ASCII_code values, and that first and last are indices into this array representing the beginning and ending of a line of text.

⟨ Global variables 13 ⟩+≡

ASCII_code buffer[BUF_SIZE + 1]; // lines of characters being read
int first;                       // the first unused position in |buffer|
int last;                        // end of the line just input to |buffer|
int max_buf_stack;               // largest index used in |buffer|

Section 31

The input_ln function brings the next line of input from the specified file into available positions of the buffer array and returns the value true, unless the file has already been entirely read, in which case it returns false and sets last ← first. In general, the ASCII_code numbers that represent the next line of the file are input into buffer[first], buffer[first + 1], …, buffer[last − 1]; and the global variable last is set equal to first plus the length of the line. Trailing blanks are removed from the line; thus, either last = first (in which case the line was entirely blank) or buffer[last − 1] ≠ ‘␣’.

An overflow error is given, however, if the normal actions of input_ln would make last ≥ BUF_SIZE; this is done so that other parts of TeX can safely look at the contents of buffer[last + 1] without overstepping the bounds of the buffer array. Upon entry to input_ln, the condition first < BUF_SIZE will always hold, so that there is always room for an “empty” line.

The variable max_buf_stack, which is used to keep track of how large the BUF_SIZE parameter must be to accommodate the present job, is also kept up to date by input_ln.

If the bypass_eoln parameter is true, input_ln will do a get before looking at the first character of the line; this skips over an eoln that was in f↑. The procedure does not do a get when it reaches the end of the line; therefore it can be used to acquire input from the user’s terminal as well as from ordinary text files.

Standard Pascal says that a file should have eoln immediately before eof, but TeX needs only a weaker restriction: If eof occurs in the middle of a line, the system function eoln should return a true result (even though f↑ will be undefined).

Since the inner loop of input_ln is part of TeX’s “inner loop” (each character of input comes in at this place), it is wise to reduce system overhead by making use of special routines that read in an entire array of characters at once, if such routines are available. The following code uses standard Pascal to illustrate what needs to be done, but finer tuning is often possible at well-developed Pascal sites.

NOTE

The bypass_eoln boolean has been removed.

files.c

// inputs the next line or returns |false|
int input_ln(alpha_file f) {
    int last_nonblank; // |last| with trailing blanks removed
    int c;

    last = first;
    last_nonblank = first;
    while ((c = fgetc(f)) != '\n' && c != EOF) {
        if (last >= max_buf_stack) {
            max_buf_stack = last + 1;
            if (max_buf_stack == BUF_SIZE) {
                // << Report overflow of the input buffer, and abort, 35 >>
            }
        }
        buffer[last] = XORD[c];
        incr(last);
        if (buffer[last - 1] != ' ') {
            last_nonblank = last;
        }
    }
    if (c == EOF && last == first) {
        return false;
    }
    last = last_nonblank;
    return true;
}
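
To see the contract of input_ln in isolation, here is a simplified, self-contained sketch. It is not the routine above: the name read_line is hypothetical, there is no XORD translation, BUF_SIZE is given an arbitrary small value, and an over-long line is silently truncated instead of triggering the overflow report.

```c
#include <assert.h>
#include <stdio.h>

#define BUF_SIZE 64 /* assumed small value, for illustration only */

static unsigned char buffer[BUF_SIZE + 1];
static int first = 0; /* first unused position in |buffer| */
static int last = 0;  /* end of the line just read */

/* Simplified stand-in for |input_ln|: returns 0 only when the file is
   exhausted, and leaves the line in buffer[first..last-1]. */
static int read_line(FILE *f) {
    int c;
    int seen = 0;              /* did we consume any character? */
    int last_nonblank = first; /* |last| with trailing blanks removed */

    assert(f != NULL && first < BUF_SIZE);
    last = first;
    while ((c = fgetc(f)) != '\n' && c != EOF) {
        seen = 1;
        if (last + 1 < BUF_SIZE) { /* keep |last| below |BUF_SIZE| */
            buffer[last++] = (unsigned char)c;
            if (c != ' ') {
                last_nonblank = last;
            }
        }
    }
    if (c == EOF && !seen) {
        return 0; /* nothing left to read; |last| equals |first| */
    }
    last = last_nonblank; /* drop trailing blanks */
    return 1;
}
```

A final line that lacks a terminating newline is still delivered as a line, which matches the weaker end-of-file restriction described above.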

Section 32

The user’s terminal acts essentially like other files of text, except that it is used both for input and for output. When the terminal is considered an input file, the file variable is called term_in, and when it is considered an output file the file variable is term_out.

NOTE

The terminal is not handled through alpha_file variables.

Section 33

Here is how to open the terminal files in Pascal-H. The ‘/I’ switch suppresses the first get.

NOTE

See section 32.

Section 34

Sometimes it is necessary to synchronize the input/output mixture that happens on the user’s terminal, and three system-dependent procedures are used for this purpose. The first of these, update_terminal, is called when we want to make sure that everything we have output to the terminal so far has actually left the computer’s internal buffers and been sent. The second, clear_terminal, is called when we wish to cancel any input that the user may have typed ahead (since we are about to issue an unexpected error message). The third, wake_up_terminal, is supposed to revive the terminal if the user has disabled it by some instruction to the operating system. The following macros show how these operations can be specified in Pascal-H:

NOTE

wake_up_terminal is not used, and clear_terminal was not considered important, so it does nothing.

io.h

// << Start file |io.h|, 1381 >>

#define update_terminal fflush(stdout)
#define clear_terminal do_nothing

Section 35

We need a special routine to read the first line of TeX input from the user’s terminal. This line is different because it is read before we have opened the transcript file; there is sort of a “chicken and egg” problem here. If the user types ‘\input paper’ on the first line, or if some macro invoked by that line does such an \input, the transcript file will be named ‘paper.log’; but if no \input commands are performed during the first line of terminal input, the transcript file will acquire its default name ‘texput.log’. (The transcript file will not contain error messages generated by the first line before the first \input command.)

The first line is even more special if we are lucky enough to have an operating system that treats TeX differently from a run-of-the-mill Pascal object program. It’s nice to let the user start running a TeX job by typing a command line like ‘tex paper’; in such a case, TeX will operate as if the first line of input were ‘paper’, i.e., the first line will consist of the remainder of the command line, after the part that invoked TeX.

The first line is special also because it may be read before TeX has input a format file. In such cases, normal error messages cannot yet be given. The following code uses concepts that will be explained later. (If the Pascal compiler does not support non-local goto, the statement ‘goto final_end’ should be replaced by something that quietly terminates the program.)

⟨ Report overflow of the input buffer, and abort 35 ⟩≡

if (format_ident == 0) {
    printf("Buffer size exceeded!\n");
    exit(0); // Goto final_end
}
else {
    cur_input.loc_field = first;
    cur_input.limit_field = last - 1;
    overflow("buffer size", BUF_SIZE);
}

Section 36

Different systems have different ways to get started. But regardless of what conventions are adopted, the routine that initializes the terminal should satisfy the following specifications:

It should open file term_in for input from the terminal. (The file term_out will already be open for output to the terminal.)
If the user has given a command line, this line should be considered the first line of terminal input. Otherwise the user should be prompted with ‘**’, and the first line of input should be whatever is typed in response.
The first line of input, which might or might not be a command line, should appear in locations first to last − 1 of the buffer array.
The global variable loc should be set so that the character to be read next by TeX is in buffer[loc]. This character should not be blank, and we should have loc < last.

(It may be necessary to prompt the user several times before a non-blank line comes in. The prompt is ‘**’ instead of the later ‘*’ because the meaning is slightly different: ‘\input’ need not be typed immediately after ‘**’.)

io.h

#define loc cur_input.loc_field // location of first unread character in buffer

Section 37

The following program does the required initialization without retrieving a possible command line. It should be clear how to modify this routine to deal with command lines, if the system permits them.

NOTE

This procedure is adapted so that the arguments of the command line are copied into buffer one by one (a single space is added between arguments). The copy is based on input_ln, and the code that follows it is derived from the original loop of init_terminal. Afterwards, buffer[loc] is the first nonblank character of the command line (in case the user typed tex " paper.tex", for example), and buffer[last − 1] is the last nonblank character. If no input file name was provided, we fall back on the original code, which calls input_ln with stdin as input (the order of the loop body is inverted to avoid a call to input_ln when the command line is not empty).

init_cleanup.c

// gets the terminal input started
bool init_terminal(int argc, char *argv[]) {
    int last_nonblank; // |last| with trailing blanks removed
    int i; // index into |argv|
    int j; // index into an argument of |argv|
    int n; // length of an argument

    // The code below does the same thing as |input_ln|
    last = first;
    last_nonblank = first;
    for(i = 1; i < argc; i++) {
        n = strlen(argv[i]);
        if (last >= max_buf_stack) {
            max_buf_stack = last + n + 1; // a single space is added between arguments
            if (max_buf_stack >= BUF_SIZE) {
                // << Report overflow of the input buffer, and abort, 35 >>
            }
        }
        for(j = 0; j < n; j++) {
            buffer[last] = XORD[(int)argv[i][j]];
            incr(last);
            if (buffer[last - 1] != ' ') {
                last_nonblank = last;
            }
        }
        buffer[last] = ' ';
        incr(last);
    }
    last = last_nonblank;
    
    // the (adapted) original loop
    while (true) {
        loc = first;
        while (loc < last && buffer[loc] == ' ') {
            incr(loc);
        }
        if (loc < last) {
            return true;
        }
        printf("Please type the name of your input file.\n");
        printf("**");
        update_terminal;
        if (!input_ln(stdin)) {
            // this shouldn't happen
            printf("\n!End of file on the terminal... why?");
            return false;
        }
    }
}
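
The argument-joining step can be studied on its own with an ordinary character array in place of buffer. The sketch below is an approximation under stated assumptions: the function join_args, its signature, and its convention of returning -1 are hypothetical, and it omits the XORD translation and the max_buf_stack bookkeeping of the real procedure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join argv[1..argc-1] with single spaces into |out|
   (capacity |cap|), dropping trailing blanks. Returns the offset of the
   first nonblank character, or -1 if the line is blank or does not fit. */
static int join_args(int argc, char *argv[], char *out, size_t cap) {
    size_t len = 0;
    size_t start = 0;
    int i;

    assert(out != NULL && cap > 0);
    for (i = 1; i < argc; i++) {
        size_t n = strlen(argv[i]);
        if (len + n + 1 >= cap) {
            return -1; /* would overflow; the real code reports an error */
        }
        memcpy(out + len, argv[i], n);
        len += n;
        out[len++] = ' '; /* a single space between arguments */
    }
    while (len > 0 && out[len - 1] == ' ') {
        len--; /* remove trailing blanks */
    }
    out[len] = '\0';
    while (start < len && out[start] == ' ') {
        start++; /* skip leading blanks, as the |loc| loop does */
    }
    return start < len ? (int)start : -1;
}
```

A result of -1 for a blank line corresponds to the case in which init_terminal falls through to prompting the user with ‘**’.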

TeX in C