Section 539: Font metric data

TeX gets its knowledge about fonts from font metric files, also called TFM files; the ‘T’ in ‘TFM’ stands for TeX, but other programs know about them too.

The information in a TFM file appears in a sequence of 8-bit bytes. Since the number of bytes is always a multiple of 4, we could also regard the file as a sequence of 32-bit words, but TeX uses the byte interpretation. The format of TFM files was designed by Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds of information in a compact but useful form.

⟨ Global variables 13 ⟩+≡

byte_file tfm_file;

Section 540

The first 24 bytes (6 words) of a TFM file contain twelve 16-bit integers that give the lengths of the various subsequent portions of the file. These twelve integers are, in order:

lf = length of the entire file, in words;
lh = length of the header data, in words;
bc = smallest character code in the font;
ec = largest character code in the font;
nw = number of words in the width table;
nh = number of words in the height table;
nd = number of words in the depth table;
ni = number of words in the italic correction table;
nl = number of words in the lig/kern table;
nk = number of words in the kern table;
ne = number of words in the extensible character table;
np = number of font parameter words.

They are all nonnegative and less than $2^{15}$. We must have bc − 1 ≤ ec ≤ 255, and

lf = 6 + lh + (ec − bc + 1) + nw + nh + nd + ni + nl + nk + ne + np.

Note that a font may contain as many as 256 characters (if bc = 0 and ec = 255), and as few as 0 characters (if bc = ec + 1).

Incidentally, when two or more 8-bit bytes are combined to form an integer of 16 or more bits, the most significant bytes appear first in the file. This is called BigEndian order.
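As an illustration, the size fields can be read and checked along the following lines. This is a minimal sketch, not TeX's own code: the `tfm_sizes` struct and the `read_be16` and `parse_tfm_sizes` helpers are hypothetical, and the whole file is assumed to be already in memory. The final check that `lf` words actually fit in the buffer is an added assumption that guards against truncated files.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the twelve size fields, in file order. */
typedef struct {
    int lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np;
} tfm_sizes;

/* Combine two bytes, most significant first (BigEndian). */
static int read_be16(const uint8_t *p) {
    return (p[0] << 8) | p[1];
}

/* Parse and check the first 24 bytes of a TFM file held in memory.
   Returns false on a short buffer or inconsistent lengths. */
bool parse_tfm_sizes(const uint8_t *buf, size_t len, tfm_sizes *out) {
    int v[12];
    if (len < 24) {
        return false;
    }
    for (int i = 0; i < 12; i++) {
        v[i] = read_be16(buf + 2 * i);
        if (v[i] >= (1 << 15)) {
            return false;            /* each field must be less than 2^15 */
        }
    }
    tfm_sizes s = { v[0], v[1], v[2], v[3], v[4], v[5],
                    v[6], v[7], v[8], v[9], v[10], v[11] };
    if (s.bc > s.ec + 1 || s.ec > 255) {
        return false;                /* need bc - 1 <= ec <= 255 */
    }
    if (s.lf != 6 + s.lh + (s.ec - s.bc + 1) + s.nw + s.nh + s.nd
                + s.ni + s.nl + s.nk + s.ne + s.np) {
        return false;                /* the lengths must add up to lf */
    }
    if ((size_t)s.lf * 4 > len) {
        return false;                /* file is shorter than it claims */
    }
    *out = s;
    return true;
}
```

For example, a font with no characters (bc = 1, ec = 0), lh = 2 and nw = nh = nd = ni = 1 has lf = 12, so a 48-byte buffer whose first 16 bytes encode those values passes the check.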

Section 541

The rest of the TFM file may be regarded as a sequence of ten data arrays having the informal specification

header: array [0 .. lh − 1] of stuff
char: array [bc .. ec] of char_info_word
width: array [0 .. nw − 1] of fix_word
height: array [0 .. nh − 1] of fix_word
depth: array [0 .. nd − 1] of fix_word
italic: array [0 .. ni − 1] of fix_word
lig: array [0 .. nl − 1] of lig_kern_command
kern: array [0 .. nk − 1] of fix_word
exten: array [0 .. ne − 1] of extensible_recipe
param: array [1 .. np] of fix_word

The most important data type used here is a fix_word, which is a 32-bit representation of a binary fraction. A fix_word is a signed quantity, with the two’s complement of the entire word used to represent negation. Of the 32 bits in a fix_word, exactly 12 are to the left of the binary point; thus, the largest fix_word value is $2048 - 2^{-20}$, and the smallest is $-2048$. We will see below, however, that all but two of the fix_word values must lie between −16 and +16.
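To make the representation concrete, a fix_word can be converted to floating point for inspection roughly as follows. `fix_word_to_double` is a hypothetical helper for illustration only; TeX itself never uses floating point here and instead converts fix_words into scaled integers.

```c
#include <stdint.h>

/* Interpret four bytes (most significant first) as a fix_word:
   a two's-complement 32-bit value with 20 fraction bits,
   so that 2^20 represents 1.0. */
double fix_word_to_double(const uint8_t bytes[4]) {
    uint32_t u = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16)
               | ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
    /* Undo the two's complement explicitly to avoid relying on a cast. */
    int64_t v = (u >= 0x80000000u) ? (int64_t)u - 4294967296LL : (int64_t)u;
    return (double)v / 1048576.0;   /* divide by 2^20 */
}
```

The bytes 00 10 00 00 give 1.0, FF F0 00 00 give −1.0, and the extreme patterns 7F FF FF FF and 80 00 00 00 give $2048 - 2^{-20}$ and $-2048$.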

Section 542

The first data array is a block of header information, which contains general facts about the font. The header must contain at least two words, header[0] and header[1], whose meaning is explained below. Additional header information of use to other software routines might also be included, but TeX82 does not need to know about such details. For example, 16 more words of header information are in use at the Xerox Palo Alto Research Center; the first ten specify the character coding scheme used (e.g., ‘XEROX text’ or ‘TeX math symbols’), the next five give the font identifier (e.g., ‘HELVETICA’ or ‘CMSY’), and the last gives the “face byte”. The program that converts DVI files to Xerox printing format gets this information by looking at the TFM file, which it needs to read anyway because of other information that is not explicitly repeated in DVI format.

header[0] is a 32-bit check sum that TeX will copy into the DVI output file. Later on when the DVI file is printed, possibly on another computer, the actual font that gets used is supposed to have a check sum that agrees with the one in the TFM file used by TeX. In this way, users will be warned about potential incompatibilities. (However, if the check sum is zero in either the font file or the TFM file, no check is made.) The actual relation between this check sum and the rest of the TFM file is not important; the check sum is simply an identification number with the property that incompatible fonts almost always have distinct check sums.
header[1] is a fix_word containing the design size of the font, in units of TeX points. This number must be at least 1.0; it is fairly arbitrary, but usually the design size is 10.0 for a “10 point” font, i.e., a font that was designed to look best at a 10-point size, whatever that really means. When a TeX user asks for a font ‘at $\delta$ pt’, the effect is to override the design size and replace it by $\delta$, and to multiply the x and y coordinates of the points in the font image by a factor of $\delta$ divided by the design size. All other dimensions in the TFM file are fix_word numbers in design-size units, with the exception of param[1] (which denotes the slant ratio). Thus, for example, the value of param[6], which defines the em unit, is often the fix_word value $2^{20} = 1.0$, since many fonts have a design size equal to one em. The other dimensions must be less than 16 design-size units in absolute value; thus, header[1] and param[1] are the only fix_word entries in the whole TFM file whose first byte might be something besides 0 or 255.

Section 543

Next comes the char_info array, which contains one char_info_word per character. Each word in this part of the file contains six fields packed into four bytes as follows.

first byte: width_index (8 bits)
second byte: height_index (4 bits) times 16, plus depth_index (4 bits)
third byte: italic_index (6 bits) times 4, plus tag (2 bits)
fourth byte: remainder (8 bits)

The actual width of a character is width[width_index], in design-size units; this is a device for compressing information, since many characters have the same width. Since it is quite common for many characters to have the same height, depth, or italic correction, the TFM format imposes a limit of 16 different heights, 16 different depths, and 64 different italic corrections.

The italic correction of a character has two different uses. (a) In ordinary text, the italic correction is added to the width only if the TeX user specifies ‘\/’ after the character. (b) In math formulas, the italic correction is always added to the width, except with respect to the positioning of subscripts.

Incidentally, the relation width[0] = height[0] = depth[0] = italic[0] = 0 should always hold, so that an index of zero implies a value of zero. The width_index should never be zero unless the character does not exist in the font, since a character is valid if and only if it lies between bc and ec and has a nonzero width_index.
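Unpacking the six fields of a char_info_word amounts to a few divisions and remainders on its four bytes. The struct and function names below are hypothetical and serve only to illustrate the packing.

```c
#include <stdint.h>

/* Hypothetical unpacked view of one char_info_word. */
typedef struct {
    unsigned width_index;   /* 8 bits */
    unsigned height_index;  /* 4 bits */
    unsigned depth_index;   /* 4 bits */
    unsigned italic_index;  /* 6 bits */
    unsigned tag;           /* 2 bits */
    unsigned remainder;     /* 8 bits */
} char_info_fields;

char_info_fields unpack_char_info(const uint8_t w[4]) {
    char_info_fields f;
    f.width_index  = w[0];      /* zero means the character does not exist */
    f.height_index = w[1] / 16; /* high nibble of the second byte */
    f.depth_index  = w[1] % 16; /* low nibble of the second byte */
    f.italic_index = w[2] / 4;  /* high six bits of the third byte */
    f.tag          = w[2] % 4;  /* low two bits of the third byte */
    f.remainder    = w[3];
    return f;
}
```

For the bytes 05 3A 0D 42 this yields width_index 5, height_index 3, depth_index 10, italic_index 3, tag 1, and remainder 0x42.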

Section 544

The tag field in a char_info_word has four values that explain how to interpret the remainder field.

tag = 0 (NO_TAG) means that remainder is unused.
tag = 1 (LIG_TAG) means that this character has a ligature/kerning program starting at position remainder in the lig_kern array.
tag = 2 (LIST_TAG) means that this character is part of a chain of characters of ascending sizes, and not the largest in the chain. The remainder field gives the character code of the next larger character.
tag = 3 (EXT_TAG) means that this character code represents an extensible character, i.e., a character that is built up of smaller pieces so that it can be made arbitrarily large. The pieces are specified in exten[remainder].

Characters with tag = 2 and tag = 3 are treated as characters with tag = 0 unless they are used in special circumstances in math formulas. For example, the \sum operation looks for a LIST_TAG, and the \left operation looks for both LIST_TAG and EXT_TAG.

constants.h

#define NO_TAG   0 // vanilla character
#define LIG_TAG  1 // character has a ligature/kerning program
#define LIST_TAG 2 // character has a successor in a charlist
#define EXT_TAG  3 // character is extensible

Section 545

The lig_kern array contains instructions in a simple programming language that explains what to do for special letter pairs. Each word in this array is a lig_kern_command of four bytes.

first byte: skip_byte, indicates that this is the final program step if the byte is 128 or more, otherwise the next step is obtained by skipping this number of intervening steps.
second byte: next_char, “if next_char follows the current character, then perform the operation and stop, otherwise continue.”
third byte: op_byte, indicates a ligature step if less than 128, a kern step otherwise.
fourth byte: remainder.

In a kern step, an additional space equal to kern[256*(op_byte − 128) + remainder] is inserted between the current character and next_char. This amount is often negative, so that the characters are brought closer together by kerning; but it might be positive.

There are eight kinds of ligature steps, having op_byte codes 4a + 2b + c where 0 $\leq$ a $\leq$ b + c and 0 $\leq$ b,c $\leq$ 1. The character whose code is remainder is inserted between the current character and next_char; then the current character is deleted if b = 0, and next_char is deleted if c = 0; then we pass over a characters to reach the next current character (which may have a ligature/kerning program of its own).
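The constraint 0 ≤ a ≤ b + c admits exactly eight op_byte values below 128: 0, 1, 2, 3, 5, 6, 7, and 11. A minimal decoder, with hypothetical names, makes the encoding explicit:

```c
#include <stdbool.h>

/* Hypothetical decoded form of a ligature op_byte = 4a + 2b + c. */
typedef struct {
    int a;  /* characters to pass over before the next current character */
    int b;  /* 0: delete the current character, 1: keep it */
    int c;  /* 0: delete next_char, 1: keep it */
} lig_op;

/* Decode a ligature op code. Returns false for kern steps (code >= 128)
   and for codes that violate 0 <= a <= b + c. */
bool decode_lig_op(int code, lig_op *out) {
    if (code < 0 || code >= 128) {
        return false;
    }
    lig_op op = { code / 4, (code / 2) % 2, code % 2 };
    if (op.a > op.b + op.c) {
        return false;
    }
    *out = op;
    return true;
}
```

For instance, code 11 decodes to a = 2, b = 1, c = 1: the ligature character is inserted, both neighbors are kept, and two characters are passed over.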

If the very first instruction of the lig_kern array has skip_byte = 255, the next_char byte is the so-called boundary character of this font; the value of next_char need not lie between bc and ec. If the very last instruction of the lig_kern array has skip_byte = 255, there is a special ligature/kerning program for a boundary character at the left, beginning at location 256*op_byte + remainder. The interpretation is that TeX puts implicit boundary characters before and after each consecutive string of characters from the same font. These implicit characters do not appear in the output, but they can affect ligatures and kerning.

If the very first instruction of a character’s lig_kern program has skip_byte $>$ 128, the program actually begins in location 256*op_byte + remainder. This feature allows access to large lig_kern arrays, because the first instruction must otherwise appear in a location $\leq$ 255.

Any instruction with skip_byte $>$ 128 in the lig_kern array must satisfy the condition

256 * op_byte + remainder $<$ nl.

If such an instruction is encountered during normal program execution, it denotes an unconditional halt; no ligature or kerning command is performed.

constants.h

#define STOP_FLAG 128 // value indicating 'STOP' in a lig/kern program
#define KERN_FLAG 128 // op code for a kern step

font_metric.h

// << Start file |font_metric.h|, 1381 >>

#define skip_byte(X) qqqq_b0((X))
#define next_char(X) qqqq_b1((X))
#define op_byte(X)   qqqq_b2((X))
#define rem_byte(X)  qqqq_b3((X))
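The skip_byte rules of this section can be summarized by a small interpreter over an in-memory copy of the lig_kern array. This is a sketch with hypothetical names (`lk_cmd`, `lk_first`, `lk_next`, `lk_find`); it indexes the array from 0 without any of TeX's base offsets and ignores boundary characters.

```c
#include <stdint.h>

/* Hypothetical in-memory lig_kern_command (four bytes, in file order). */
typedef struct {
    uint8_t skip_byte, next_char, op_byte, remainder;
} lk_cmd;

/* First instruction of a program whose char_info remainder is r:
   a first word with skip_byte > 128 redirects to 256*op_byte + remainder. */
int lk_first(const lk_cmd *lk, int r) {
    if (lk[r].skip_byte > 128) {
        return 256 * lk[r].op_byte + lk[r].remainder;
    }
    return r;
}

/* Step after instruction i, or -1 if i is final (skip_byte >= 128). */
int lk_next(const lk_cmd *lk, int i) {
    if (lk[i].skip_byte >= 128) {
        return -1;
    }
    return i + lk[i].skip_byte + 1;
}

/* Instruction that applies when `next` follows the current character,
   or -1 if there is none. */
int lk_find(const lk_cmd *lk, int r, int next) {
    for (int i = lk_first(lk, r); i >= 0; i = lk_next(lk, i)) {
        if (lk[i].skip_byte > 128) {
            return -1;              /* unconditional halt */
        }
        if (lk[i].next_char == next) {
            return i;
        }
    }
    return -1;
}
```

With a kern step for ‘b’ at location 0 (skip_byte 0), a final ligature step for ‘c’ at location 1 (skip_byte 128), and an indirection to location 0 at location 2, a search for ‘c’ starting from location 2 ends at location 1.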

Section 546

Extensible characters are specified by an extensible_recipe, which consists of four bytes called top, mid, bot, and rep (in this order). These bytes are the character codes of individual pieces used to build up a large symbol. If top, mid, or bot are zero, they are not present in the built-up result. For example, an extensible vertical line is like an extensible bracket, except that the top and bottom pieces are missing.

Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box if the piece isn’t present. Then the extensible characters have the form $TR^{k}MR^{k}B$ from top to bottom, for some $k \geq 0$, unless $M$ is absent; in the latter case we can have $TR^{k}B$ for both even and odd values of $k$. The width of the extensible character is the width of $R$; and the height-plus-depth is the sum of the individual height-plus-depths of the components used, since the pieces are butted together in a vertical list.

font_metric.h

#define ext_top(X) qqqq_b0((X)) // |top| piece in a recipe
#define ext_mid(X) qqqq_b1((X)) // |mid| piece in a recipe
#define ext_bot(X) qqqq_b2((X)) // |bot| piece in a recipe
#define ext_rep(X) qqqq_b3((X)) // |rep| piece in a recipe
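The height-plus-depth rule for $TR^{k}MR^{k}B$ and $TR^{k}B$ can be sketched as follows. The helper names are hypothetical, each argument is a piece's height plus depth, and an absent top or bottom piece should be passed as 0 (an empty box). `ext_min_k` shows one way to pick the smallest k that reaches a target size; it is an illustration, not necessarily the exact search TeX performs.

```c
/* Hypothetical helpers for the extensible recipe T R^k M R^k B. */
int ext_total(int top, int mid, int bot, int rep, int has_mid, int k) {
    if (has_mid) {
        return top + k * rep + mid + k * rep + bot;   /* T R^k M R^k B */
    }
    return top + k * rep + bot;                       /* T R^k B */
}

/* Smallest k whose built-up size reaches target. */
int ext_min_k(int top, int mid, int bot, int rep, int has_mid, int target) {
    int k = 0;
    if (rep <= 0) {
        return 0;   /* a non-growing repeater cannot reach the target */
    }
    while (ext_total(top, mid, bot, rep, has_mid, k) < target) {
        k++;
    }
    return k;
}
```

With a middle piece, each increment of k adds two copies of $R$. Without one, k grows one copy at a time, which is why both even and odd counts are possible.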

Section 547

The final portion of a TFM file is the param array, which is another sequence of fix_word values.

param[1] = slant is the amount of italic slant, which is used to help position accents. For example, slant = .25 means that when you go up one unit, you also go .25 units to the right. The slant is a pure number; it’s the only fix_word other than the design size itself that is not scaled by the design size.
param[2] = space is the normal spacing between words in text. Note that character ‘␣’ in the font need not have anything to do with blank spaces.
param[3] = space_stretch is the amount of glue stretching between words.
param[4] = space_shrink is the amount of glue shrinking between words.
param[5] = x_height is the size of one ex in the font; it is also the height of letters for which accents don’t have to be raised or lowered.
param[6] = quad is the size of one em in the font.
param[7] = extra_space is the amount added to param[2] at the ends of sentences.

If fewer than seven parameters are present, TeX sets the missing parameters to zero. Fonts used for math symbols are required to have additional parameter information, which is explained later.

constants.h

#define SLANT_CODE         1
#define SPACE_CODE         2
#define SPACE_STRETCH_CODE 3
#define SPACE_SHRINK_CODE  4
#define X_HEIGHT_CODE      5
#define QUAD_CODE          6
#define EXTRA_SPACE_CODE   7

Section 548

So that is what TFM files hold. Since TeX has to absorb such information about lots of fonts, it stores most of the data in a large array called font_info. Each item of font_info is a memory_word; the fix_word data gets converted into scaled entries, while everything else goes into words of type four_quarters.

When the user defines \font\f, say, TeX assigns an internal number to the user’s font \f. Adding this number to FONT_ID_BASE gives the eqtb location of a “frozen” control sequence that will always select the font.

⟨ Types in the outer block 18 ⟩+≡

typedef int internal_font_number; // |font| in a |CHAR_NODE|
typedef int font_index;           // index into |font_info|

Section 549

Here now is the (rather formidable) array of font arrays.

constants.h

#define NON_CHAR    256 // a |halfword| code that can't match a real character
#define NON_ADDRESS 0   // a spurious |bchar_label|

⟨ Global variables 13 ⟩+≡

memory_word font_info[FONT_MEM_SIZE + 1]; // the big collection of font data
font_index fmem_ptr;                      // first unused word of |font_info|
internal_font_number font_ptr;            // largest internal font number in use
memory_word font_check[FONT_MAX + 1];     // check sum
scaled font_size[FONT_MAX + 1];           // "at" size
scaled font_dsize[FONT_MAX + 1];          // "design" size
font_index font_params[FONT_MAX + 1];     // how many font parameters are present
str_number font_name[FONT_MAX + 1];       // name of the font
str_number font_area[FONT_MAX + 1];       // area of the font
eight_bits font_bc[FONT_MAX + 1];         // beginning (smallest) character code
eight_bits font_ec[FONT_MAX + 1];         // ending (largest) character code
pointer font_glue[FONT_MAX + 1];          // glue specification for interword space, |null| if not allocated
bool font_used[FONT_MAX + 1];             // has a character from this font actually appeared in the output?
int hyphen_char[FONT_MAX + 1];            // current \hyphenchar values
int skew_char[FONT_MAX + 1];              // current \skewchar values
font_index bchar_label[FONT_MAX + 1];     // start of |lig_kern| program for left boundary character, |NON_ADDRESS| if there is none
int font_bchar[FONT_MAX + 1];             // boundary character, |NON_CHAR| if there is none
int font_false_bchar[FONT_MAX + 1];       // |font_bchar| if it doesn't exist in the font, otherwise |NON_CHAR|

Section 550

Besides the arrays just enumerated, we have directory arrays that make it easy to get at the individual entries in font_info. For example, the char_info data for character c in font f will be in font_info[char_base[f] + c].qqqq; and if w is the width_index part of this word (the b0 field), the width of the character is font_info[width_base[f] + w].sc. (These formulas assume that MIN_QUARTERWORD has already been added to c and to w, since TeX stores its quarterwords that way.)

⟨ Global variables 13 ⟩+≡

int char_base[FONT_MAX + 1];     // base addresses for |char_info|
int width_base[FONT_MAX + 1];    // base addresses for widths
int height_base[FONT_MAX + 1];   // base addresses for heights
int depth_base[FONT_MAX + 1];    // base addresses for depths
int italic_base[FONT_MAX + 1];   // base addresses for italic corrections
int lig_kern_base[FONT_MAX + 1]; // base addresses for ligature/kerning programs
int kern_base[FONT_MAX + 1];     // base addresses for kerns
int exten_base[FONT_MAX + 1];    // base addresses for extensible recipes
int param_base[FONT_MAX + 1];    // base addresses for font parameters

Section 551

⟨ Set initial values of key variables 21 ⟩+≡

for(k = FONT_BASE; k <= FONT_MAX; k++) {
    font_used[k] = false;
}

Section 552

TeX always knows at least one font, namely the null font. It has no characters, and its seven parameters are all equal to zero.

NOTE

String "nullfont" is added to the pool.

⟨ Read the other strings 51 ⟩+≡

put_string("nullfont"); // NULLFONT_STRING: 265

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define NULLFONT_STRING 265

⟨ Initialize table entries (done by INITEX only) 164 ⟩+≡

font_ptr = NULL_FONT;
fmem_ptr = 7;
font_name[NULL_FONT] = NULLFONT_STRING;
font_area[NULL_FONT] = EMPTY_STRING;
hyphen_char[NULL_FONT] = '-';
skew_char[NULL_FONT] = -1;
bchar_label[NULL_FONT] = NON_ADDRESS;
font_bchar[NULL_FONT] = NON_CHAR;
font_false_bchar[NULL_FONT] = NON_CHAR;
font_bc[NULL_FONT] = 1;
font_ec[NULL_FONT] = 0;
font_size[NULL_FONT] = 0;
font_dsize[NULL_FONT] = 0;
char_base[NULL_FONT] = 0;
width_base[NULL_FONT] = 0;
height_base[NULL_FONT] = 0;
depth_base[NULL_FONT] = 0;
italic_base[NULL_FONT] = 0;
lig_kern_base[NULL_FONT] = 0;
kern_base[NULL_FONT] = 0;
exten_base[NULL_FONT] = 0;
font_glue[NULL_FONT] = null;
font_params[NULL_FONT] = 7;
param_base[NULL_FONT] = -1;
for(k = 0; k <= 6; k++) {
    font_info[k].sc = 0;
}

Section 553

NOTE

String “nullfont” is added a second time to the pool. Too bad.

⟨ Put each of TeX’s primitives into the hash table 226 ⟩+≡

primitive("nullfont", SET_FONT, NULL_FONT);
text(FROZEN_NULL_FONT) = str_ptr - 1;
eqtb[FROZEN_NULL_FONT] = eqtb[cur_val];

Section 554

Of course we want to define macros that suppress the detail of how font information is actually packed, so that we don’t have to write things like

font_info[width_base[f] + font_info[char_base[f] + c].qqqq.b0].sc

too often. The WEB definitions here make char_info(f)(c) the four_quarters word of font information corresponding to character c of font f. If q is such a word, char_width(f)(q) will be the character’s width; hence the long formula above is at least abbreviated to

char_width(f)(char_info(f)(c)).

Usually, of course, we will fetch q first and look at several of its fields at the same time.

The italic correction of a character will be denoted by char_italic(f)(q), so it is analogous to char_width. But we will get at the height and depth in a slightly different way, since we usually want to compute both height and depth if we want either one. The value of height_depth(q) will be the 8-bit quantity

b = height_index $\times$ 16 + depth_index,

and if b is such a byte we will write char_height(f)(b) and char_depth(f)(b) for the height and depth of the character c for which q = char_info(f)(c). Got that?

The tag field will be called char_tag(q); the remainder byte will be called rem_byte(q), using a macro that we have already defined above.

Access to a character’s width, height, depth, and tag fields is part of TeX’s inner loop, so we want these macros to produce code that is as fast as possible under the circumstances.

NOTE

char_height, etc., are defined with a macro that takes several arguments, so char_height(f)(b) will be used as char_height(f, b).

font_metric.h

#define char_info(X,Y)   font_info[char_base[(X)] + (Y)]
#define char_width(X,Y)  font_info[width_base[(X)] + qqqq_b0((Y))].sc
#define char_exists(X)   (qqqq_b0((X)) > MIN_QUARTERWORD)
#define char_italic(X,Y) font_info[italic_base[(X)] + qqqq_b2((Y))/4].sc
#define height_depth(X)  qqqq_b1((X))
#define char_height(X,Y) font_info[height_base[(X)] + (Y) / 16].sc
#define char_depth(X,Y)  font_info[depth_base[(X)] + (Y) % 16].sc
#define char_tag(X)      (qqqq_b2((X)) % 4)

Section 555

The global variable null_character is set up to be a word of char_info for a character that doesn’t exist. Such a word provides a convenient way to deal with erroneous situations.

⟨ Global variables 13 ⟩+≡

memory_word null_character; // nonexistent character information

Section 556

⟨ Set initial values of key variables 21 ⟩+≡

qqqq_b0(null_character) = MIN_QUARTERWORD;
qqqq_b1(null_character) = MIN_QUARTERWORD;
qqqq_b2(null_character) = MIN_QUARTERWORD;
qqqq_b3(null_character) = MIN_QUARTERWORD;

Section 557

Here are some macros that help process ligatures and kerns. We write char_kern(f)(j) to find the amount of kerning specified by kerning command j in font f. If j is the char_info for a character with a ligature/kern program, the first instruction of that program is either i = font_info[lig_kern_start(f)(j)] or font_info[lig_kern_restart(f)(i)], depending on whether or not skip_byte(i) $\leq$ STOP_FLAG.

The constant KERN_BASE_OFFSET should be simplified, for Pascal compilers that do not do local optimization.

constants.h

#define KERN_BASE_OFFSET (256*(128 + MIN_QUARTERWORD))

font_metric.h

#define char_kern(X,Y)        (font_info[kern_base[X] + 256*op_byte((Y)) + rem_byte((Y))].sc)
// beginning of lig/kern program:
#define lig_kern_start(X,Y)   (lig_kern_base[(X)] + rem_byte((Y))) 
#define lig_kern_restart(X,Y) (lig_kern_base[(X)] + 256*op_byte((Y)) + rem_byte((Y)) + 32768 - KERN_BASE_OFFSET)

Section 558

Font parameters are referred to as slant(f), space(f), etc.

font_metric.h

#define param(X,Y)       (font_info[(X) + param_base[(Y)]].sc)
#define slant(X)         param(SLANT_CODE,(X))         // slant to the right, per unit distance upward
#define space(X)         param(SPACE_CODE,(X))         // normal space between words
#define space_stretch(X) param(SPACE_STRETCH_CODE,(X)) // stretch between words
#define space_shrink(X)  param(SPACE_SHRINK_CODE,(X))  // shrink between words
#define x_height(X)      param(X_HEIGHT_CODE,(X))      // one ex
#define quad(X)          param(QUAD_CODE,(X))          // one em
#define extra_space(X)   param(EXTRA_SPACE_CODE,(X))   // additional space at end of sentence

NOTE

The destination variable has been added, so the code corresponds to a full line of the final code.

⟨ The em width for cur_font 558 ⟩≡

v = quad(cur_font);

Section 559

NOTE

The destination variable has been added, so the code corresponds to a full line of the final code.

⟨ The x-height for cur_font 559 ⟩≡

v = x_height(cur_font);

Section 560

TeX checks the information of a TFM file for validity as the file is being read in, so that no further checks will be needed when typesetting is going on. The somewhat tedious subroutine that does this is called read_font_info. It has four parameters: the user font identifier u, the file name and area strings nom and aire, and the “at” size s. If s is negative, it’s the negative of a scale factor to be applied to the design size; s = −1000 is the normal case. Otherwise s will be substituted for the design size; in this case, s must be positive and less than 2048 pt (i.e., it must be less than $2^{27}$ when considered as an integer).

The subroutine opens and closes a global file variable called tfm_file. It returns the value of the internal font number that was just loaded. If an error is detected, an error message is issued and no font information is stored; NULL_FONT is returned in this case.

font_metric.h

#define abort goto bad_tfm // do this when the `TFM` data is wrong

font_metric.c

// << Start file |font_metric.c|, 1382 >>

// input a `TFM` file
internal_font_number read_font_info(pointer u, str_number nom, str_number aire, scaled s) {
    font_index k; // index into |font_info|
    bool file_opened; // was |tfm_file| successfully opened?
    halfword lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np; // sizes of subfiles
    internal_font_number f; // the new font's number
    internal_font_number g; // the number to return
    eight_bits a, b, c, d; // byte variables
    memory_word qw;
    scaled sw; // accumulators
    int bch_label; // left boundary start location, or infinity
    int bchar; // boundary character, or 256
    scaled z; // the design size or the ``at'' size
    int alpha, beta; // auxiliary quantities used in fixed-point multiplication
    int temp; // temporary data for read_sixteen macro
    g = NULL_FONT;
    // << Read and check the font data; |abort| if the TFM file is malformed; if there's no room for this font, say so and |goto done|; otherwise |incr(font_ptr)| and |goto done|, 562 >>
bad_tfm:
    // << Report that the font won't be loaded, 561 >>
done:
    if (file_opened) {
        b_close(tfm_file);
    }
    return g;
}
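The rule for the size argument s can be sketched in isolation. `font_at_size` is a hypothetical helper; it uses 64-bit integer arithmetic with truncation in place of TeX's xn_over_d, which is intended to give the same result for the positive operands involved, and it performs none of the range checks on s.

```c
#include <stdint.h>

typedef int32_t scaled;  /* TeX dimension: 65536 scaled units = 1pt */

/* Hypothetical helper: the size at which a font is loaded.
   s == -1000: keep the design size (the normal case);
   s >= 0:     s is an explicit "at" size;
   s <  0:     -s is a scale factor in thousandths of the design size. */
scaled font_at_size(scaled design_size, scaled s) {
    if (s == -1000) {
        return design_size;
    }
    if (s >= 0) {
        return s;
    }
    /* 64-bit product avoids overflow; division truncates toward zero. */
    return (scaled)(((int64_t)design_size * -(int64_t)s) / 1000);
}
```

For a 10pt design size (655360 scaled units), ‘scaled 1200’ (s = −1200) and ‘at 12pt’ (s = 786432) both lead to 786432.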

Section 561

There are programs called TFtoPL and PLtoTF that convert between the TFM format and a symbolic property-list format that can be easily edited. These programs contain extensive diagnostic information, so TeX does not have to bother giving precise details about why it rejects a particular TFM file.

font_metric.h

#define start_font_error_message                  \
    do {                                          \
        print_err("Font ");                       \
        sprint_cs(u);                             \
        print_char('=');                          \
        print_file_name(nom, aire, EMPTY_STRING); \
        if (s >= 0) {                             \
            print(" at ");                        \
            print_scaled(s);                      \
            print("pt");                          \
        }                                         \
        else if (s != -1000) {                    \
            print(" scaled ");                    \
            print_int(-s);                        \
        }                                         \
    } while (0)

⟨ Report that the font won’t be loaded 561 ⟩≡

start_font_error_message;
if (file_opened) {
    print(" not loadable: Bad metric (TFM) file");
}
else {
    print(" not loadable: Metric (TFM) file not found");
}
help5("I wasn't able to read the size data for this font,")
    ("so I will ignore the font specification.")
    ("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")
    ("You might try inserting a different font spec;")
    ("e.g., type `I\\font<same font id>=<substitute font name>'.");
error();

Section 562

⟨ Read and check the font data; abort if the TFM file is malformed; if there’s no room for this font, say so and goto done; otherwise incr(font_ptr) and goto done 562 ⟩≡

// << Open |tfm_file| for input, 563 >>
// << Read the TFM size fields, 565 >>
// << Use size fields to allocate font information, 566 >>
// << Read the TFM header, 568 >>
// << Read character data, 569 >>
// << Read box dimensions, 571 >>
// << Read ligature/kern program, 573 >>
// << Read extensible character recipes, 574 >>
// << Read font parameters, 575 >>
// << Make final adjustments and |goto done|, 576 >>

Section 563

NOTE

String ".tfm" is added to the pool.

⟨ Read the other strings 51 ⟩+≡

put_string(".tfm"); // TFM_EXT: 266

⟨ Internal strings numbers in the pool 51 ⟩+≡

#define TFM_EXT 266

⟨ Open tfm_file for input 563 ⟩≡

file_opened = false;
if (aire == EMPTY_STRING) {
    pack_file_name(nom, TEX_FONT_AREA, TFM_EXT);
}
else {
    pack_file_name(nom, aire, TFM_EXT);
}
if (!b_open_in(&tfm_file)) {
    abort;
}
file_opened = true;

Section 564

Note: A malformed TFM file might be shorter than it claims to be; thus eof(tfm_file) might be true when read_font_info refers to tfm_file↑ or when it says get(tfm_file). If such circumstances cause system error messages, you will have to defeat them somehow, for example by defining fget to be ‘begin get(tfm_file); if eof(tfm_file) then abort; end’.

NOTE

fget is implemented with the fgetc function. An error occurs if the end of file is reached.

font_metric.h

#define fget(X)                \
    do {                       \
        (X) = fgetc(tfm_file); \
        if (feof(tfm_file)) {  \
            abort;             \
        }                      \
    } while (0)

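// Wrapped in do { ... } while (0) so that each macro behaves as a single
// statement, e.g. in an unbraced |if| branch.
#define read_sixteen(X)   \
    do {                  \
        fget(temp);       \
        if (temp > 127) { \
            abort;        \
        }                 \
        fget((X));        \
        (X) += temp*256;  \
    } while (0)

#define store_four_quarters(X) \
    do {                       \
        fget(a);               \
        qqqq_b0(qw) = a;       \
        fget(b);               \
        qqqq_b1(qw) = b;       \
        fget(c);               \
        qqqq_b2(qw) = c;       \
        fget(d);               \
        qqqq_b3(qw) = d;       \
        (X) = qw;              \
    } while (0)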

Section 565

⟨ Read the TFM size fields 565 ⟩≡

read_sixteen(lf);
read_sixteen(lh);
read_sixteen(bc);
read_sixteen(ec);
if (bc > ec + 1 || ec > 255) {
    abort;
}
if (bc > 255) {
    // |bc == 256| and |ec == 255|
    bc = 1;
    ec = 0;
}
read_sixteen(nw);
read_sixteen(nh);
read_sixteen(nd);
read_sixteen(ni);
read_sixteen(nl);
read_sixteen(nk);
read_sixteen(ne);
read_sixteen(np);
if (lf != 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne + np) {
    abort;
}
if (nw == 0 || nh == 0 || nd == 0 || ni == 0) {
    abort;
}

Section 566

The preliminary settings of the index-offset variables char_base, width_base, lig_kern_base, kern_base, and exten_base will be corrected later by subtracting MIN_QUARTERWORD from them; and we will subtract 1 from param_base too. It’s best to forget about such anomalies until later.

⟨ Use size fields to allocate font information 566 ⟩≡

lf = lf - 6 - lh; // |lf| words should be loaded into |font_info|
if (np < 7) {
    lf += 7 - np; // at least seven parameters will appear
}
if (font_ptr == FONT_MAX || fmem_ptr + lf > FONT_MEM_SIZE) {
    // << Apologize for not loading the font, |goto done|, 567 >>
}
f = font_ptr + 1;
char_base[f] = fmem_ptr - bc;
width_base[f] = char_base[f] + ec + 1;
height_base[f] = width_base[f] + nw;
depth_base[f] = height_base[f] + nh;
italic_base[f] = depth_base[f] + nd;
lig_kern_base[f] = italic_base[f] + ni;
kern_base[f] = lig_kern_base[f] + nl - KERN_BASE_OFFSET;
exten_base[f] = kern_base[f] + KERN_BASE_OFFSET + nk;
param_base[f] = exten_base[f] + ne;
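The base computation of this module can be isolated as a function to see the resulting layout. This sketch assumes MIN_QUARTERWORD = 0, so KERN_BASE_OFFSET = 256·128; `font_bases` and `compute_bases` are hypothetical names. Because kern steps have op_byte ≥ 128, subtracting KERN_BASE_OFFSET in advance makes kern_base + 256*op_byte + remainder land on the first kern when op_byte = 128 and remainder = 0.

```c
/* Assumes MIN_QUARTERWORD == 0. */
#define KERN_BASE_OFFSET (256 * 128)

/* Hypothetical record of one font's directory entries. */
typedef struct {
    int char_base, width_base, height_base, depth_base, italic_base,
        lig_kern_base, kern_base, exten_base, param_base;
} font_bases;

font_bases compute_bases(int fmem_ptr, int bc, int ec, int nw, int nh,
                         int nd, int ni, int nl, int nk, int ne) {
    font_bases b;
    b.char_base = fmem_ptr - bc;       /* char_info of c is at char_base + c */
    b.width_base = b.char_base + ec + 1;
    b.height_base = b.width_base + nw;
    b.depth_base = b.height_base + nh;
    b.italic_base = b.depth_base + nd;
    b.lig_kern_base = b.italic_base + ni;
    b.kern_base = b.lig_kern_base + nl - KERN_BASE_OFFSET;
    b.exten_base = b.kern_base + KERN_BASE_OFFSET + nk;
    b.param_base = b.exten_base + ne;  /* decremented later for 1-based params */
    return b;
}
```

For a font loaded at fmem_ptr = 7 with characters ‘A’ through ‘Z’, the char_info of ‘A’ lands at location 7 and the widths start right after the 26 char_info words.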

Section 567

⟨ Apologize for not loading the font, goto done 567 ⟩≡

start_font_error_message;
print(" not loaded: Not enough room left");
help4("I'm afraid I won't be able to make use of this font,")
    ("because my memory for character-size data is too small.")
    ("If you're really stuck, ask a wizard to enlarge me.")
    ("Or maybe try `I\\font<same font id>=<name of loaded font>'.");
error();
goto done;

Section 568

Only the first two words of the header are needed by TeX82.

⟨ Read the TFM header 568 ⟩≡

if (lh < 2) {
    abort;
}
store_four_quarters(font_check[f]);
read_sixteen(z); // this rejects a negative design size
fget(temp);
z = z*256 + temp;
fget(temp);
z = z*16 + (temp / 16);
if (z < UNITY) {
    abort;
}
while (lh > 2) {
    fget(temp);
    fget(temp);
    fget(temp);
    fget(temp);
    decr(lh); // ignore the rest of the header
}
font_dsize[f] = z;
if (s != -1000) {
    if (s >= 0) {
        z = s;
    }
    else {
        z = xn_over_d(z, -s, 1000);
    }
}
font_size[f] = z;
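The byte shuffling that builds z from header[1] drops the four lowest fraction bits of the fix_word, turning its 20 fraction bits into the 16 used by scaled values. A stand-alone sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* Convert the header[1] fix_word (design size in points) to scaled units.
   Returns -1 for a negative design size (first byte > 127) or for one
   below 1pt, mirroring the two abort conditions in the module. */
int32_t design_size_to_scaled(const uint8_t w[4]) {
    if (w[0] > 127) {
        return -1;
    }
    int32_t z = w[0] * 256 + w[1];   /* as read_sixteen does */
    z = z * 256 + w[2];
    z = z * 16 + w[3] / 16;          /* keep only the top 4 bits of the last byte */
    if (z < 65536) {                 /* must be at least UNITY (1.0) */
        return -1;
    }
    return z;
}
```

A 10pt design size is the fix_word 00 A0 00 00 ($10 \cdot 2^{20}$), which becomes $10 \cdot 2^{16} = 655360$ scaled units.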

Section 569

⟨ Read character data 569 ⟩≡

for(k = fmem_ptr; k <= width_base[f] - 1; k++) {
    store_four_quarters(font_info[k]);
    if (a >= nw || b/16 >= nh || b % 16 >= nd || c/4 >= ni) {
        abort;
    }
    switch (c % 4) {
    case LIG_TAG:
        if (d >= nl) {
            abort;
        }
        break;
    
    case EXT_TAG:
        if (d >= ne) {
            abort;
        }
        break;
    
    case LIST_TAG:
        // << Check for charlist cycle, 570 >>
        break;
    
    default:
        do_nothing; // (|NO_TAG|)
    }
}
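Each char_info word packs six fields into four bytes. The following is a minimal sketch of the unpacking and the range tests in this section. The struct and function names are illustrative, and the tag values 0 to 3 are the ones TeX uses for NO_TAG, LIG_TAG, LIST_TAG, and EXT_TAG.

```c
#include <assert.h>
#include <stdint.h>

/* Tag values as in TeX. */
enum { NO_TAG = 0, LIG_TAG = 1, LIST_TAG = 2, EXT_TAG = 3 };

/* Hypothetical unpacked form of one char_info word (a, b, c, d). */
typedef struct {
    int width_index, height_index, depth_index, italic_index, tag, remainder;
} char_info_fields;

static char_info_fields unpack_char_info(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    char_info_fields f;
    f.width_index  = a;      /* whole first byte */
    f.height_index = b / 16; /* high nybble of b */
    f.depth_index  = b % 16; /* low nybble of b */
    f.italic_index = c / 4;  /* high six bits of c */
    f.tag          = c % 4;  /* low two bits of c */
    f.remainder    = d;      /* meaning depends on the tag */
    return f;
}

/* Range tests of section 569; returns 0 where TeX would abort.
   LIST_TAG remainders are left to the cycle check of section 570. */
static int char_info_ok(char_info_fields f, int nw, int nh, int nd, int ni, int nl, int ne) {
    if (f.width_index >= nw || f.height_index >= nh ||
        f.depth_index >= nd || f.italic_index >= ni) return 0;
    if (f.tag == LIG_TAG && f.remainder >= nl) return 0;
    if (f.tag == EXT_TAG && f.remainder >= ne) return 0;
    return 1;
}
```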

Section 570

We want to make sure that there is no cycle of characters linked together by LIST_TAG entries, since such a cycle would get TeX into an endless loop. If such a cycle exists, the routine here detects it when processing the largest character code in the cycle.

font_metric.h

#define check_byte_range(X)         \
    do {                            \
        if ((X) < bc || (X) > ec) { \
            abort;                  \
        }                           \
    } while (0)

#define current_character_being_worked_on (k + bc - fmem_ptr)

⟨ Check for charlist cycle 570 ⟩≡

check_byte_range(d);
while (d < current_character_being_worked_on) {
    qw = char_info(f, d);
    // N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet
    if (char_tag(qw) != LIST_TAG) {
        goto not_found;
    }
    d = rem_byte(qw); // next character on the list
}
if (d == current_character_being_worked_on) {
    abort; // yes, there's a cycle
}
not_found: ; // a label must be followed by a statement in C
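The same largest-member strategy can be tried on a small table. This sketch replaces `char_info` with two hypothetical arrays, `tag[]` and `next[]`, indexed by character code minus bc. It returns nonzero wherever sections 569 and 570 would abort on a LIST_TAG entry.

```c
#include <assert.h>

/* Tag values as in TeX. */
enum { NO_TAG = 0, LIG_TAG = 1, LIST_TAG = 2, EXT_TAG = 3 };

/* Hypothetical model: tag[i] and next[i] describe character code bc + i.
   Returns 1 where TeX would abort, 0 otherwise. */
static int charlist_rejected(const int *tag, const int *next, int bc, int ec) {
    assert(bc - 1 <= ec);
    for (int cur = bc; cur <= ec; cur++) {      /* increasing code order */
        if (tag[cur - bc] != LIST_TAG) continue;
        int d = next[cur - bc];
        if (d < bc || d > ec) return 1;         /* check_byte_range(d) */
        while (d < cur) {                       /* follow smaller codes only */
            if (tag[d - bc] != LIST_TAG) break; /* chain ends: no cycle here */
            d = next[d - bc];
        }
        if (d == cur) return 1;                 /* back at cur: a cycle */
    }
    return 0;
}
```

Characters are processed in increasing order, so every link followed inside the inner loop belongs to a character that was already checked. Any cycle among those characters would have been caught at its own largest member, so the inner loop always terminates.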

Section 571

A fix_word whose four bytes are $(a, b, c, d)$ from left to right represents the number

$x = \begin{cases} b\cdot 2^{-4} + c\cdot 2^{-12} + d\cdot 2^{-20}, & \text{if } a = 0;\\ -16 + b\cdot 2^{-4} + c\cdot 2^{-12} + d\cdot 2^{-20}, & \text{if } a = 255. \end{cases}$

(No other choices of $a$ are allowed, since the magnitude of a number in design-size units must be less than 16.) We want to multiply this quantity by the integer $z$, which is known to be less than $2^{27}$. If $z < 2^{23}$, the individual multiplications $b\cdot z$, $c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide $z$ by 2, 4, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can compensate for this later. If $z$ has thereby been replaced by $z' = z/2^{e}$, let $\beta = 2^{4-e}$; we shall compute

$\lfloor (b + c\cdot 2^{-8} + d\cdot 2^{-16})\, z'/\beta \rfloor$

if $a = 0$, or the same quantity minus $\alpha = 2^{4+e} z'$ if $a = 255$. This calculation must be done exactly, in order to guarantee portability of TeX between computers.
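The choice of $\alpha$ and $\beta$ can be checked term by term. The halvings may drop low-order bits of $z$, so the product is taken with $2^{e} z'$. The last line uses the identity $\lfloor(\lfloor y\rfloor + m)/n\rfloor = \lfloor (y + m)/n\rfloor$ for integers $m$ and $n > 0$. This identity is why the three truncating divisions in store_scaled lose nothing.

```latex
\begin{aligned}
\bigl(b\cdot 2^{-4} + c\cdot 2^{-12} + d\cdot 2^{-20}\bigr)\,2^{e} z'
  &= \bigl(b + c\cdot 2^{-8} + d\cdot 2^{-16}\bigr)\,z'/\beta,
  && \beta = 2^{4-e},\\
-16\cdot 2^{e} z' &= -2^{4+e} z' = -\alpha,\\
\Bigl\lfloor \bigl(\lfloor (\lfloor d z'/2^{8}\rfloor + c z')/2^{8}\rfloor + b z'\bigr)/\beta \Bigr\rfloor
  &= \bigl\lfloor (b + c\cdot 2^{-8} + d\cdot 2^{-16})\, z'/\beta \bigr\rfloor .
\end{aligned}
```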

font_metric.h

#define store_scaled(X)                                      \
    do {                                                     \
        fget(a);                                             \
        fget(b);                                             \
        fget(c);                                             \
        fget(d);                                             \
        sw = (((((d*z) / 256) + (c*z)) / 256) + (b*z))/beta; \
        if (a == 0) {                                        \
            (X) = sw;                                        \
        }                                                    \
        else if (a == 255) {                                 \
            (X) = sw - alpha;                                \
        }                                                    \
        else {                                               \
            abort;                                           \
        }                                                    \
    } while (0)

⟨ Read box dimensions 571 ⟩≡

// << Replace |z| by |z'| and compute \alpha, \beta, 572 >>
for(k = width_base[f]; k <= lig_kern_base[f] - 1; k++) {
    store_scaled(font_info[k].sc);
}
if (font_info[width_base[f]].sc != 0) {
    abort; // width[0] must be zero
}
if (font_info[height_base[f]].sc != 0) {
    abort; // height[0] must be zero
}
if (font_info[depth_base[f]].sc != 0) {
    abort; // depth[0] must be zero
}
if (font_info[italic_base[f]].sc != 0) {
    abort; // italic[0] must be zero
}

Section 572

⟨ Replace z by z’ and compute \alpha, \beta 572 ⟩≡

alpha = 16;
while (z >= 0x800000) {
    z /= 2;
    alpha += alpha;
}
beta = 256 / alpha;
alpha *= z;
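Combining sections 571 and 572 gives a stand-alone sketch that can be checked against an exact 64-bit computation. The names `fix_scaler`, `make_scaler`, and `store_scaled_sketch` are hypothetical. The arithmetic matches `store_scaled` and the code above, assuming a 32-bit `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bundle of z', alpha, and beta from section 572. */
typedef struct { int32_t z, alpha, beta; } fix_scaler;

static fix_scaler make_scaler(int32_t z) {
    assert(z > 0 && z < (1 << 27));   /* otherwise beta would become 0 */
    fix_scaler s;
    int32_t alpha = 16;
    while (z >= 0x800000) {           /* keep z' < 2^23 so b*z' etc. fit */
        z /= 2;
        alpha += alpha;
    }
    s.beta = 256 / alpha;             /* 2^(4-e) */
    s.alpha = alpha * z;              /* 2^(4+e) * z' */
    s.z = z;                          /* z' */
    return s;
}

/* Same formula as store_scaled; returns -1 where TeX would abort. */
static int store_scaled_sketch(const fix_scaler *s, uint8_t a, uint8_t b,
                               uint8_t c, uint8_t d, int32_t *out) {
    int32_t z = s->z;
    int32_t sw = ((((d * z) / 256) + (c * z)) / 256 + (b * z)) / s->beta;
    if (a == 0) *out = sw;
    else if (a == 255) *out = sw - s->alpha;
    else return -1;
    return 0;
}
```

For z = 200pt = 13107200, the loop halves once (e = 1), which gives β = 8 and α = 32·6553600. The bytes (0, 1, 2, 3) then scale to 825637, which is ⌊66051 · 13107200 / 2^20⌋.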

Section 573

font_metric.h

#define check_existence(X)                              \
    do {                                                \
        check_byte_range((X));                          \
        qw = char_info(f, (X)); /* N.B.: not |qi(X)| */ \
        if (!char_exists(qw)) {                         \
            abort;                                      \
        }                                               \
    } while (0)

⟨ Read ligature/kern program 573 ⟩≡

bch_label = 0x7fff;
bchar = 256;
if (nl > 0) {
    for(k = lig_kern_base[f]; k <= kern_base[f] + KERN_BASE_OFFSET - 1; k++) {
        store_four_quarters(font_info[k]);
        if (a > 128) {
            if (256*c + d >= nl) {
                abort;
            }
            if (a == 255 && k == lig_kern_base[f]) {
                bchar = b;
            }
        }
        else {
            if (b != bchar) {
                check_existence(b);
            }
            if (c < 128) {
                check_existence(d); // check ligature
            }
            else if (256*(c - 128) + d >= nk) {
                abort; // check kern
            }
            if (a < 128 && k - lig_kern_base[f] + a + 1 >= nl) {
                abort;
            }
        }
    }
    if (a == 255) {
        bch_label = 256*c + d;
    }
}
for(k = kern_base[f] + KERN_BASE_OFFSET; k <= exten_base[f] - 1; k++) {
    store_scaled(font_info[k].sc);
}
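In the TFM format, the four bytes of a lig/kern step are (skip_byte, next_char, op_byte, remainder). The following sketch uses hypothetical names to separate the three kinds of step that the checks above distinguish, and it repeats their range tests. It omits `check_existence`, which needs the character table, and it does not decode which ligature variant an op_byte below 128 selects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the kinds of step checked in section 573. */
typedef enum { LK_POINTER, LK_LIGATURE, LK_KERN } lk_kind;

typedef struct {
    lk_kind kind;
    int target;     /* POINTER: lig/kern index; KERN: kern index; LIGATURE: char */
    int final_step; /* ligature/kern steps: skip_byte == 128 ends the program */
} lk_step;

/* Classify a step from skip_byte a, op_byte c, remainder d. */
static lk_step classify_lig_kern(uint8_t a, uint8_t c, uint8_t d) {
    lk_step s;
    if (a > 128) {                       /* points to 256*c + d */
        s.kind = LK_POINTER;
        s.target = 256 * c + d;
        s.final_step = 0;
    } else {
        s.kind = (c >= 128) ? LK_KERN : LK_LIGATURE;
        s.target = (c >= 128) ? 256 * (c - 128) + d : d;
        s.final_step = (a == 128);
    }
    return s;
}

/* Range tests for the step at offset k in the lig/kern array;
   returns 0 where TeX would abort. */
static int lig_kern_step_ok(int k, uint8_t a, uint8_t c, uint8_t d, int nl, int nk) {
    lk_step s = classify_lig_kern(a, c, d);
    if (s.kind == LK_POINTER) return s.target < nl;
    if (s.kind == LK_KERN && s.target >= nk) return 0;
    if (a < 128 && k + a + 1 >= nl) return 0;  /* next step must exist */
    return 1;
}
```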

Section 574

⟨ Read extensible character recipes 574 ⟩≡

for(k = exten_base[f]; k <= param_base[f] - 1; k++) {
    store_four_quarters(font_info[k]);
    if (a != 0) {
        check_existence(a);
    }
    if (b != 0) {
        check_existence(b);
    }
    if (c != 0) {
        check_existence(c);
    }
    check_existence(d);
}
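An extensible recipe lists its pieces as (top, mid, bot, rep). A zero in one of the first three bytes means that piece is absent, while rep is always checked. In the sketch below, the `ext_recipe` struct and the per-code `exists[]` table are hypothetical stand-ins for `font_info` and `char_exists`.

```c
#include <assert.h>

/* Hypothetical decoded recipe: bytes (a, b, c, d) = (top, mid, bot, rep). */
typedef struct { int top, mid, bot, rep; } ext_recipe;

/* exists[i] is nonzero when character bc + i exists (hypothetical table).
   Returns 0 where section 574 would abort. */
static int ext_recipe_ok(ext_recipe r, const unsigned char *exists, int bc, int ec) {
    int parts[4] = { r.top, r.mid, r.bot, r.rep };
    for (int i = 0; i < 4; i++) {
        if (i < 3 && parts[i] == 0) continue;   /* optional piece omitted */
        if (parts[i] < bc || parts[i] > ec) return 0;  /* check_byte_range */
        if (!exists[parts[i] - bc]) return 0;          /* char_exists */
    }
    return 1;
}
```

One consequence of this encoding is that character code 0 can never serve as a top, middle, or bottom piece.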

Section 575

We check to see that the TFM file doesn’t end prematurely; but no error message is given for files having more than lf words.

NOTE

The end-of-file check has been removed; only a commented-out trace of it remains in the code below.

⟨ Read font parameters 575 ⟩≡

for(k = 1; k <= np; k++) {
    if (k == 1) {
        // the |slant| parameter is a pure number
        fget(sw);
        if (sw > 127) {
            sw -= 256;
        }
        fget(temp);
        sw = sw*256 + temp;
        fget(temp);
        sw = sw*256 + temp;
        fget(temp);
        font_info[param_base[f]].sc = sw*16 + temp/16;
    }
    else {
        store_scaled(font_info[param_base[f] + k - 1].sc);
    }
}
// if (eof(tfm_file)) abort;
for(k = np + 1; k <= 7; k++) {
    font_info[param_base[f] + k - 1].sc = 0;
}
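The slant is the one parameter that is not multiplied by the font size. Its fix_word is only shifted from 2^-20 to 2^-16 units, with the first byte read as two's complement. The sketch below isolates that branch, and `decode_slant` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the k == 1 branch of section 575 on four raw bytes. */
static int32_t decode_slant(const uint8_t w[4]) {
    int32_t sw = w[0];
    if (sw > 127) sw -= 256;     /* first byte is signed */
    sw = sw * 256 + w[1];
    sw = sw * 256 + w[2];
    return sw * 16 + w[3] / 16;  /* drop the 4 least significant bits */
}
```

For example, a slant of -0.25 is stored as FF FC 00 00 and decodes to -16384, which is -0.25 in scaled units.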

Section 576

Now to wrap it up, we have checked all the necessary things about the TFM file, and all we need to do is put the finishing touches on the data for the new font.

NOTE

The adjust macro has been removed, since MIN_QUARTERWORD is zero; its former calls remain below as comments.

⟨ Make final adjustments and goto done 576 ⟩≡

if (np >= 7) {
    font_params[f] = np;
}
else {
    font_params[f] = 7;
}
hyphen_char[f] = default_hyphen_char;
skew_char[f] = default_skew_char;
if (bch_label < nl) {
    bchar_label[f] = bch_label + lig_kern_base[f];
}
else {
    bchar_label[f] = NON_ADDRESS;
}
font_bchar[f] = bchar;
font_false_bchar[f] = bchar;
if (bchar <= ec && bchar >= bc) {
    qw = char_info(f, bchar); // N.B.: not |qi(bchar)|
    if (char_exists(qw)) {
        font_false_bchar[f] = NON_CHAR;
    }
}
font_name[f] = nom;
font_area[f] = aire;
font_bc[f] = bc;
font_ec[f] = ec;
font_glue[f] = null;
// adjust(char_base);
// adjust(width_base);
// adjust(lig_kern_base);
// adjust(kern_base);
// adjust(exten_base);
decr(param_base[f]);
fmem_ptr += lf;
font_ptr = f;
g = f;
goto done;

Section 577

Before we forget about the format of these tables, let’s deal with two of TeX’s basic scanning routines related to font information.

⟨ Declare procedures that scan font-related stuff 577 ⟩≡

void scan_font_ident() {
    internal_font_number f;
    halfword m;
    // << Get the next non-blank non-call token, 406 >>
    if (cur_cmd == DEF_FONT) {
        f = cur_font;
    }
    else if (cur_cmd == SET_FONT) {
        f = cur_chr;
    }
    else if (cur_cmd == DEF_FAMILY) {
        m = cur_chr;
        scan_four_bit_int();
        f = equiv(m + cur_val);
    }
    else {
        print_err("Missing font identifier");
        help2("I was looking for a control sequence whose")
            ("current meaning has been defined by \\font.");
        back_error();
        f = NULL_FONT;
    }
    cur_val = f;
}

Section 578

The following routine is used to implement ‘\fontdimen n f’. The boolean parameter writing is set true if the calling program intends to change the parameter value.

⟨ Declare procedures that scan font-related stuff 577 ⟩+≡

// sets |cur_val| to |font_info| location
void find_font_dimen(bool writing) {
    internal_font_number f;
    int n; // the parameter number
    scan_int();
    n = cur_val;
    scan_font_ident();
    f = cur_val;
    if (n <= 0) {
        cur_val = fmem_ptr;
    }
    else {
        if (writing && n <= SPACE_SHRINK_CODE && n >= SPACE_CODE && font_glue[f] != null) {
            delete_glue_ref(font_glue[f]);
            font_glue[f] = null; // force a new glue spec to be built
        }
        if (n > font_params[f]) {
            if (f < font_ptr) {
                cur_val = fmem_ptr;
            }
            else {
                // << Increase the number of parameters in the last font, 580 >>
            }
        }
        else {
            cur_val = n + param_base[f];
        }
    }
    // << Issue an error message if |cur_val = fmem_ptr|, 579 >>
}

Section 579

⟨ Issue an error message if cur_val = fmem_ptr 579 ⟩≡

if (cur_val == fmem_ptr) {
    print_err("Font ");
    print_esc_strnumber(font_id_text(f));
    print(" has only ");
    print_int(font_params[f]);
    print(" fontdimen parameters");
    help2("To increase the number of font parameters, you must")
        ("use \\fontdimen immediately after the \\font is loaded.");
    error();
}

Section 580

⟨ Increase the number of parameters in the last font 580 ⟩≡

do {
    if (fmem_ptr == FONT_MEM_SIZE) {
        overflow("font memory", FONT_MEM_SIZE);
    }
    font_info[fmem_ptr].sc = 0;
    incr(fmem_ptr);
    incr(font_params[f]);
} while (n != font_params[f]);
cur_val = fmem_ptr - 1; // this equals |param_base[f] + font_params[f]|
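Fonts are laid out in load order, and each font's parameters come last in its block. For this reason only the most recently loaded font can grow at `fmem_ptr`. The toy model below illustrates this behavior. All `toy_` names and sizes are hypothetical, the model leaves out the `font_glue` reset, and it returns -1 where TeX would leave `cur_val = fmem_ptr` or call `overflow`.

```c
#include <assert.h>

#define TOY_MEM_SIZE 64
#define TOY_FONTS 4

/* Hypothetical miniature of font memory: parameters of font f occupy
   toy_param_base[f] + 1 .. toy_param_base[f] + toy_font_params[f]. */
static int toy_info[TOY_MEM_SIZE];
static int toy_fmem_ptr = 0;
static int toy_font_ptr = 0;   /* most recently loaded font */
static int toy_param_base[TOY_FONTS];
static int toy_font_params[TOY_FONTS];

/* Index of parameter n of font f, growing the last font if needed. */
static int toy_find_font_dimen(int f, int n) {
    if (n <= 0) return -1;
    if (n <= toy_font_params[f]) return toy_param_base[f] + n;
    if (f < toy_font_ptr) return -1;             /* earlier fonts cannot grow */
    do {
        if (toy_fmem_ptr == TOY_MEM_SIZE) return -1;  /* TeX: overflow */
        toy_info[toy_fmem_ptr++] = 0;            /* new parameters start at 0 */
        toy_font_params[f]++;
    } while (n != toy_font_params[f]);
    return toy_fmem_ptr - 1;                     /* param_base + font_params */
}
```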

Section 581

When TeX wants to typeset a character that doesn’t exist, the character node is not created; thus the output routine can assume that characters exist when it sees them. The following procedure prints a warning message unless the user has suppressed it.

font_metric.c

void char_warning(internal_font_number f, eight_bits c) {
    if (tracing_lost_chars > 0) {
        begin_diagnostic();
        print_nl("Missing character: There is no ");
        print_strnumber(c);
        print(" in font ");
        slow_print(font_name[f]);
        print_char('!');
        end_diagnostic(false);
    }
}

Section 582

Here is a function that returns a pointer to a character node for a given character in a given font. If that character doesn’t exist, null is returned instead.

font_metric.c

pointer new_character(internal_font_number f, eight_bits c) {
    pointer p; // newly allocated node
    if (font_bc[f] <= c && font_ec[f] >= c && char_exists(char_info(f, c))) {
        p = get_avail();
        font(p) = f;
        character(p) = c;
        return p;
    }
    char_warning(f, c);
    return null;
}

TeX in C