Section 813: Breaking paragraphs into lines

We come now to what is probably the most interesting algorithm of $T E X$ : the mechanism for choosing the “best possible” breakpoints that yield the individual lines of a paragraph. $T E X$ ’s line-breaking algorithm takes a given horizontal list and converts it to a sequence of boxes that are appended to the current vertical list. In the course of doing this, it creates a special data structure containing three kinds of records that are not used elsewhere in $T E X$ . Such nodes are created while a paragraph is being processed, and they are destroyed afterwards; thus, the other parts of $T E X$ do not need to know anything about how line-breaking is done.

The method used here is based on an approach devised by Michael F. Plass and the author in 1977, subsequently generalized and improved by the same two people in 1980. A detailed discussion appears in Software—Practice and Experience 11 (1981), 1119–1184, where it is shown that the line-breaking problem can be regarded as a special case of the problem of computing the shortest path in an acyclic network. The cited paper includes numerous examples and describes the history of line breaking as it has been practiced by printers through the ages. The present implementation adds two new ideas to the algorithm of 1980: Memory space requirements are considerably reduced by using smaller records for inactive nodes than for active ones, and arithmetic overflow is avoided by using “delta distances” instead of keeping track of the total distance from the beginning of the paragraph to the current point.

Section 814

The line_break procedure should be invoked only in horizontal mode; it leaves that mode and places its output into the current vlist of the enclosing vertical mode (or internal vertical mode). There is one explicit parameter: final_widow_penalty is the amount of additional penalty to be inserted before the final line of the paragraph.

There are also a number of implicit parameters: The hlist to be broken starts at link(head), and it is nonempty. The value of prev_graf in the enclosing semantic level tells where the paragraph should begin in the sequence of line numbers, in case hanging indentation or \parshape is in use; prev_graf is zero unless this paragraph is being continued after a displayed formula. Other implicit parameters, such as the par_shape_ptr and various penalties to use for hyphenation, etc., appear in eqtb.

After line_break has acted, it will have updated the current vlist and the value of prev_graf. Furthermore, the global variable just_box will point to the final box created by line_break, so that the width of this line can be ascertained when it is necessary to decide whether to use above_display_skip or above_display_short_skip before a displayed formula.

⟨ Global variables 13 ⟩+≡

pointer just_box; // the |HLIST_NODE| for the last line of the new paragraph

Section 815

Since line_break is a rather lengthy procedure—sort of a small world unto itself—we must build it up little by little, somewhat more cautiously than we have done with the simpler procedures of $T E X$ . Here is the general outline.

line_break.c

// << Start file |line_break.c|, 1382 >>

// << Declare subprocedures for |line_break|, 826 >>

void line_break(int final_widow_penalty) {
    // << Local variables for line breaking, 862 >>
    pack_begin_line = mode_line; // this is for over/underfull box messages

    // << Get ready to start line breaking, 816 >>

    // << Find optimal breakpoints, 863 >>
    
    // << Break the paragraph at the chosen breakpoints, justify the resulting lines to the correct widths, and append them to the current vertical list, 876 >>
    
    // << Clean up the memory by removing the break nodes, 865 >>
    pack_begin_line = 0;
}

Section 816

The first task is to move the list from head to TEMP_HEAD and go into the enclosing semantic level. We also append the \parfillskip glue to the end of the paragraph, removing a space (or other glue node) if it was there, since spaces usually precede blank lines and instances of ‘$$’. The par_fill_skip is preceded by an infinite penalty, so it will never be considered as a potential breakpoint.

This code assumes that a GLUE_NODE and a PENALTY_NODE occupy the same number of mem words.

⟨ Get ready to start line breaking 816 ⟩≡

link(TEMP_HEAD) = link(head);
if (is_char_node(tail)) {
    tail_append(new_penalty(INF_PENALTY));
}
else if (type(tail) != GLUE_NODE) {
    tail_append(new_penalty(INF_PENALTY));
}
else {
    type(tail) = PENALTY_NODE;
    delete_glue_ref(glue_ptr(tail));
    flush_node_list(leader_ptr(tail));
    penalty(tail) = INF_PENALTY;
}
link(tail) = new_param_glue(PAR_FILL_SKIP_CODE);
init_cur_lang = prev_graf % 0x10000;
init_l_hyf = prev_graf / 0x400000;
init_r_hyf = (prev_graf / 0x10000) % 64;
pop_nest();

Section 817

When looking for optimal line breaks, $T E X$ creates a “break node” for each break that is feasible, in the sense that there is a way to end a line at the given place without requiring any line to stretch more than a given tolerance. A break node is characterized by three things: the position of the break (which is a pointer to a GLUE_NODE, MATH_NODE, PENALTY_NODE, or DISC_NODE); the ordinal number of the line that will follow this breakpoint; and the fitness classification of the line that has just ended, i.e., TIGHT_FIT, DECENT_FIT, LOOSE_FIT, or VERY_LOOSE_FIT.

constants.h

#define TIGHT_FIT      3 // fitness classification for lines shrinking 0.5 to 1.0 of their shrinkability
#define LOOSE_FIT      1 // fitness classification for lines stretching 0.5 to 1.0 of their stretchability
#define VERY_LOOSE_FIT 0 // fitness classification for lines stretching more than their stretchability
#define DECENT_FIT     2 // fitness classification for all other lines

Section 818

The algorithm essentially determines the best possible way to achieve each feasible combination of position, line, and fitness. Thus, it answers questions like, “What is the best way to break the opening part of the paragraph so that the fourth line is a tight line ending at such-and-such a place?” However, the fact that all lines are to be the same length after a certain point makes it possible to regard all sufficiently large line numbers as equivalent, when the looseness parameter is zero, and this makes it possible for the algorithm to save space and time.

An “active node” and a “passive node” are created in mem for each feasible breakpoint that needs to be considered. Active nodes are three words long and passive nodes are two words long. We need active nodes only for breakpoints near the place in the paragraph that is currently being examined, so they are recycled within a comparatively short time after they are created.

Section 819

An active node for a given breakpoint contains six fields:

link points to the next node in the list of active nodes; the last active node has link = LAST_ACTIVE.
break_node points to the passive node associated with this breakpoint.
line_number is the number of the line that follows this breakpoint.
fitness is the fitness classification of the line ending at this breakpoint.
type is either HYPHENATED or UNHYPHENATED, depending on whether this breakpoint is a DISC_NODE.
total_demerits is the minimum possible sum of demerits over all lines leading from the beginning of the paragraph to this breakpoint.

The value of link(ACTIVE) points to the first active node on a linked list of all currently active nodes. This list is in order by line_number, except that nodes with line_number $>$ easy_line may be in any order relative to each other.

constants.h

#define ACTIVE_NODE_SIZE 3      // number of words in active nodes
#define UNHYPHENATED     0      // the |type| of a normal active break node
#define HYPHENATED       1      // the |type| of an active node that breaks at a |DISC_NODE|
#define LAST_ACTIVE      ACTIVE // the active list ends where it begins

breaker.h

// << Start file |breaker.h|, 1381 >>

#define fitness           subtype              // |VERY_LOOSE_FIT .. TIGHT_FIT| on final line for this break
#define break_node        rlink                // pointer to the corresponding passive node
#define line_number       llink                // line that begins at this breakpoint
#define total_demerits(X) mem[(X) + 2].integer // the quantity that TeX minimizes

Section 820

⟨ Initialize the special list heads and constant nodes 790 ⟩+≡

type(LAST_ACTIVE) = HYPHENATED;
line_number(LAST_ACTIVE) = MAX_HALFWORD;
subtype(LAST_ACTIVE) = 0; // the |subtype| is never examined by the algorithm

Section 821

The passive node for a given breakpoint contains only four fields:

link points to the passive node created just before this one, if any, otherwise it is null.
cur_break points to the position of this breakpoint in the horizontal list for the paragraph being broken.
prev_break points to the passive node that should precede this one in an optimal path to this breakpoint.
serial is equal to n if this passive node is the nth one created during the current pass. (This field is used only when printing out detailed statistics about the line-breaking calculations.)

There is a global variable called passive that points to the most recently created passive node. Another global variable, printed_node, is used to help print out the paragraph when detailed information about the line-breaking computation is being displayed.

constants.h

#define PASSIVE_NODE_SIZE 2 // number of words in passive nodes

breaker.h

#define cur_break  rlink // in passive node, points to position of this breakpoint
#define prev_break llink // points to passive node that should precede this one
#define serial     info  // serial number for symbolic identification

⟨ Global variables 13 ⟩+≡

pointer passive;      // most recent node on passive list
pointer printed_node; // most recent node that has been printed
halfword pass_number; // the number of passive nodes allocated on this pass

Section 822

The active list also contains “delta” nodes that help the algorithm compute the badness of individual lines. Such nodes appear only between two active nodes, and they have type = DELTA_NODE. If p and r are active nodes and if q is a delta node between them, so that link(p) = q and link(q) = r, then q tells the space difference between lines in the horizontal list that start after breakpoint p and lines that start after breakpoint r. In other words, if we know the length of the line that starts after p and ends at our current position, then the corresponding length of the line that starts after r is obtained by adding the amounts in node q. A delta node contains six scaled numbers, since it must record the net change in glue stretchability with respect to all orders of infinity. The natural width difference appears in mem[q + 1].sc; the stretch differences in units of pt, fil, fill, and filll appear in mem[q + 2 .. q + 5].sc; and the shrink difference appears in mem[q + 6].sc. The subtype field of a delta node is not used.

constants.h

#define DELTA_NODE_SIZE 7 // number of words in a delta node
#define DELTA_NODE      2 // |type| field in a delta node

Section 823

As the algorithm runs, it maintains a set of six delta-like registers for the length of the line following the first active breakpoint to the current position in the given hlist. When it makes a pass through the active list, it also maintains a similar set of six registers for the length following the active breakpoint of current interest. A third set holds the length of an empty line (namely, the sum of \leftskip and \rightskip); and a fourth set is used to create new delta nodes.

When we pass a delta node we want to do operations like

for k ← 1 to 6 do cur_active_width[k] ← cur_active_width[k] + mem[q + k].sc;

and we want to do this without the overhead of for loops. The do_all_six macro makes such six-tuples convenient.

breaker.h

#define do_all_six(X) X(1); X(2); X(3); X(4); X(5); X(6)

⟨ Global variables 13 ⟩+≡

// distance from first active node to |cur_p|
scaled active_width0[6];
scaled *active_width = active_width0 - 1;

// distance from current active node
scaled cur_active_width0[6];
scaled *cur_active_width = cur_active_width0 - 1;

// length of an "empty" line
scaled background0[6];
scaled *background = background0 - 1;

// length being computed after current break
scaled break_width0[6];
scaled *break_width = break_width0 - 1;

Section 824

Let’s state the principles of the delta nodes more precisely and concisely, so that the following programs will be less obscure. For each legal breakpoint p in the paragraph, we define two quantities $α (p)$ and $β (p)$ such that the length of material in a line from breakpoint $p$ to breakpoint $q$ is $γ + β (q) - α (p)$ , for some fixed $γ$ . Intuitively, $α (p)$ and $β (q)$ are the total length of material from the beginning of the paragraph to a point “after” a break at $p$ and to a point “before” a break at $q$ ; and $γ$ is the width of an empty line, namely the length contributed by \leftskip and \rightskip.

Suppose, for example, that the paragraph consists entirely of alternating boxes and glue skips; let the boxes have widths $x_{1} \dots x_{n}$ and let the skips have widths $y_{1} \dots y_{n}$ , so that the paragraph can be represented by $x_{1} y_{1} \dots x_{n} y_{n}$ . Let $p_{i}$ be the legal breakpoint at $y_{i}$ ; then $α (p_{i}) = x_{1} + y_{1} + \dots + x_{i} + y_{i}$ , and $β (p_{i}) = x_{1} + y_{1} + \dots + x_{i}$ . To check this, note that the length of material from $p_{2}$ to $p_{5}$ , say, is $γ + x_{3} + y_{3} + x_{4} + y_{4} + x_{5} = γ + β (p_{5}) - α (p_{2})$ .

The quantities $α$ , $β$ , $γ$ involve glue stretchability and shrinkability as well as a natural width. If we were to compute $α (p)$ and $β (p)$ for each $p$ , we would need multiple precision arithmetic, and the multiprecise numbers would have to be kept in the active nodes. $T E X$ avoids this problem by working entirely with relative differences or “deltas”. Suppose, for example, that the active list contains $a_{1} δ_{1} a_{2} δ_{2} a_{3}$ , where the $a$ ’s are active breakpoints and the $δ$ ’s are delta nodes. Then $δ_{1} = α (a_{1}) - α (a_{2})$ and $δ_{2} = α (a_{2}) - α (a_{3})$ . If the line breaking algorithm is currently positioned at some other breakpoint $p$ , the active_width array contains the value $γ + β (p) - α (a_{1})$ . If we are scanning through the list of active nodes and considering a tentative line that runs from $a_{2}$ to $p$ , say, the cur_active_width array will contain the value $γ + β (p) - α (a_{2})$ . Thus, when we move from $a_{2}$ to $a_{3}$ , we want to add $α (a_{2}) - α (a_{3})$ to cur_active_width; and this is just $δ_{2}$ , which appears in the active list between $a_{2}$ and $a_{3}$ . The background array contains $γ$ . The break_width array will be used to calculate values of new delta nodes when the active list is being updated.

Section 825

Glue nodes in a horizontal list that is being paragraphed are not supposed to include “infinite” shrinkability; that is why the algorithm maintains four registers for stretching but only one for shrinking. If the user tries to introduce infinite shrinkability, the shrinkability will be reset to finite and an error message will be issued. A boolean variable no_shrink_error_yet prevents this error message from appearing more than once per paragraph.

breaker.h

#define check_shrinkage(X)                                     \
    do {                                                       \
        if (shrink_order((X)) != NORMAL && shrink((X)) != 0) { \
            (X) = finite_shrink((X));                          \
        }                                                      \
    } while (0)

⟨ Global variables 13 ⟩+≡

bool no_shrink_error_yet; // have we complained about infinite shrinkage?

Section 826

⟨ Declare subprocedures for line_break 826 ⟩≡

// recovers from infinite shrinkage
pointer finite_shrink(pointer p) {
    pointer q; // new glue specification
    if (no_shrink_error_yet) {
        no_shrink_error_yet = false;
#ifdef STAT
        if (tracing_paragraphs > 0) {
            end_diagnostic(true);
        }
#endif
        print_err("Infinite glue shrinkage found in a paragraph");
        help5("The paragraph just ended includes some glue that has")
            ("infinite shrinkability, e.g., `\\hskip 0pt minus 1fil'.")
            ("Such glue doesn't belong there---it allows a paragraph")
            ("of any length to fit on one line. But it's safe to proceed,")
            ("since the offensive shrinkability has been made finite.");
        error();
#ifdef STAT
        if (tracing_paragraphs > 0) {
            begin_diagnostic();
        }
#endif
    }
    q = new_spec(p);
    shrink_order(q) = NORMAL;
    delete_glue_ref(p);
    return q;
}

Section 827

⟨ Get ready to start line breaking 816 ⟩+≡

no_shrink_error_yet = true;
check_shrinkage(left_skip);
check_shrinkage(right_skip);
q = left_skip;
r = right_skip;
background[1] = width(q) + width(r);
background[2] = 0;
background[3] = 0;
background[4] = 0;
background[5] = 0;
background[2 + stretch_order(q)] = stretch(q);
background[2 + stretch_order(r)] += stretch(r);
background[6] = shrink(q) + shrink(r);

Section 828

A pointer variable cur_p runs through the given horizontal list as we look for breakpoints. This variable is global, since it is used both by line_break and by its subprocedure try_break.

Another global variable called threshold is used to determine the feasibility of individual lines: Breakpoints are feasible if there is a way to reach them without creating lines whose badness exceeds threshold. (The badness is compared to threshold before penalties are added, so that penalty values do not affect the feasibility of breakpoints, except that no break is allowed when the penalty is 10000 or more.) If threshold is 10000 or more, all legal breaks are considered feasible, since the badness function specified above never returns a value greater than 10000.

Up to three passes might be made through the paragraph in an attempt to find at least one set of feasible breakpoints. On the first pass, we have threshold = pretolerance and second_pass = final_pass = false. If this pass fails to find a feasible solution, threshold is set to tolerance, second_pass is set true, and an attempt is made to hyphenate as many words as possible. If that fails too, we add emergency_stretch to the background stretchability and set final_pass = true.

⟨ Global variables 13 ⟩+≡

pointer cur_p;    // the current breakpoint under consideration
bool second_pass; // is this our second attempt to break this paragraph?
bool final_pass;  // is this our final attempt to break this paragraph?
int threshold;    // maximum badness on feasible lines

Section 829

The heart of the line-breaking procedure is ‘try_break’, a subroutine that tests if the current breakpoint cur_p is feasible, by running through the active list to see what lines of text can be made from active nodes to cur_p. If feasible breaks are possible, new break nodes are created. If cur_p is too far from an active node, that node is deactivated.

The parameter pi to try_break is the penalty associated with a break at cur_p; we have pi = EJECT_PENALTY if the break is forced, and pi = INF_PENALTY if the break is illegal.

The other parameter, break_type, is set to HYPHENATED or UNHYPHENATED, depending on whether or not the current break is at a DISC_NODE. The end of a paragraph is also regarded as ‘HYPHENATED’; this case is distinguishable by the condition cur_p = null.

breaker.h

#define copy_to_cur_active(X) cur_active_width[(X)] = active_width[(X)]

⟨ Declare subprocedures for line_break 826 ⟩+≡

void try_break(int pi, small_number break_type) {
    pointer r;      // runs through the active list
    pointer prev_r; // stays a step behind |r|
    halfword old_l; // maximum line number in current equivalence class of lines
    bool no_break_yet; // have we found a feasible break at |cur_p|?
    // << Other local variables for |try_break|, 830 >>
    
    // << Make sure that |pi| is in the proper range, 831 >>
    no_break_yet = true;
    prev_r = ACTIVE;
    old_l = 0;
    do_all_six(copy_to_cur_active);
    while(true) {
continue_lbl:
        r = link(prev_r);
        
        // << If node |r| is of type |DELTA_NODE|, update |cur_active_width|, set |prev_r| and |prev_prev_r|, then |goto continue|, 832 >>

        // << If a line number class has ended, create new active nodes for the best feasible breaks in that class; then |return| if |r = LAST_ACTIVE|, otherwise compute the new |line_width|, 835 >>
        
        // << Consider the demerits for a line from |r| to |cur_p|; deactivate node |r| if it should no longer be active; then |goto continue| if a line from |r| to |cur_p| is infeasible, otherwise record a new feasible break, 851 >>
    }
end:
#ifdef STAT
    // << Update the value of |printed_node| for symbolic displays, 858 >>
#endif
}

Section 830

⟨ Other local variables for try_break 830 ⟩≡

pointer prev_prev_r = null; // a step behind |prev_r|, if |type(prev_r) = DELTA_NODE|
pointer s;                  // runs through nodes ahead of |cur_p|
pointer q;                  // points to a new node being created
pointer v;                  // points to a glue specification or a node ahead of |cur_p|
int t;                      // node count, if |cur_p| is a discretionary node
internal_font_number f;     // used in character width calculation
halfword l;                 // line number of current active node
bool node_r_stays_active;   // should node |r| remain in the active list?
scaled line_width = 0;      // the current line will be justified to this width
int fit_class;              // possible fitness class of test line
halfword b;                 // badness of test line
int d;                      // demerits of test line
bool artificial_demerits;   // has |d| been forced to zero?
#ifdef STAT
pointer save_link;          // temporarily holds value of |link(cur_p)|
#endif
scaled shortfall;           // used in badness calculations

Section 831

⟨ Make sure that pi is in the proper range 831 ⟩≡

if (abs(pi) >= INF_PENALTY) {
    if (pi > 0) {
        goto end; // this breakpoint is inhibited by infinite penalty
    }
    else {
        pi = EJECT_PENALTY; // this breakpoint will be forced
    }
}

Section 832

The following code uses the fact that type(LAST_ACTIVE) $\neq =$ DELTA_NODE.

breaker.h

#define update_width(X) cur_active_width[(X)] += mem[r + (X)].sc

⟨ If node r is of type DELTA_NODE, update cur_active_width, set prev_r and prev_prev_r, then goto continue 832 ⟩≡

if (type(r) == DELTA_NODE) {
    do_all_six(update_width);
    prev_prev_r = prev_r;
    prev_r = r;
    goto continue_lbl;
}

Section 833

As we consider various ways to end a line at cur_p, in a given line number class, we keep track of the best total demerits known, in an array with one entry for each of the fitness classifications. For example, minimal_demerits[TIGHT_FIT] contains the fewest total demerits of feasible line breaks ending at cur_p with a TIGHT_FIT line; best_place[TIGHT_FIT] points to the passive node for the break before cur_p that achieves such an optimum; and best_pl_line[TIGHT_FIT] is the line_number field in the active node corresponding to best_place[TIGHT_FIT]. When no feasible break sequence is known, the minimal_demerits entries will be equal to AWFUL_BAD, which is $2^{30} - 1$ . Another variable, minimum_demerits, keeps track of the smallest value in the minimal_demerits array.

constants.h

#define AWFUL_BAD 0x3fffffff // more than a billion demerits

⟨ Global variables 13 ⟩+≡

int minimal_demerits[4];  // best total demerits known for current line class and position, given the fitness
int minimum_demerits;     // best total demerits known for current line class and position
pointer best_place[4];    // how to achieve |minimal_demerits|
halfword best_pl_line[4]; // corresponding line number

Section 834

⟨ Get ready to start line breaking 816 ⟩+≡

minimum_demerits = AWFUL_BAD;
minimal_demerits[TIGHT_FIT] = AWFUL_BAD;
minimal_demerits[DECENT_FIT] = AWFUL_BAD;
minimal_demerits[LOOSE_FIT] = AWFUL_BAD;
minimal_demerits[VERY_LOOSE_FIT] = AWFUL_BAD;

Section 835

The first part of the following code is part of $T E X$ ’s inner loop, so we don’t want to waste any time. The current active node, namely node r, contains the line number that will be considered next. At the end of the list we have arranged the data structure so that r = LAST_ACTIVE and line_number(LAST_ACTIVE) $>$ old_l.

⟨ If a line number class has ended, create new active nodes for the best feasible breaks in that class; then return if r = LAST_ACTIVE, otherwise compute the new line_width 835 ⟩≡

l = line_number(r);
if (l > old_l) {
    // now we are no longer in the inner loop
    if (minimum_demerits < AWFUL_BAD
        && (old_l != easy_line || r == LAST_ACTIVE))
    {
        // << Create new active nodes for the best feasible breaks just found, 836 >>
    }
    if (r == LAST_ACTIVE) {
        goto end;
    }
    // << Compute the new line width, 850 >>
}

Section 836

It is not necessary to create new active nodes having minimal_demerits greater than minimum_demerits + abs(adj_demerits), since such active nodes will never be chosen in the final paragraph breaks. This observation allows us to omit a substantial number of feasible breakpoints from further consideration.

⟨ Create new active nodes for the best feasible breaks just found 836 ⟩≡

if (no_break_yet) {
    // << Compute the values of |break_width|, 837 >>
}
// << Insert a delta node to prepare for breaks at |cur_p|, 843 >>
if (abs(adj_demerits) >= AWFUL_BAD - minimum_demerits) {
    minimum_demerits = AWFUL_BAD - 1;
}
else {
    minimum_demerits += abs(adj_demerits);
}
for(fit_class = VERY_LOOSE_FIT; fit_class <= TIGHT_FIT; fit_class++) {
    if (minimal_demerits[fit_class] <= minimum_demerits) {
        // << Insert a new active node from |best_place[fit_class]| to |cur_p|, 845 >>
    }
    minimal_demerits[fit_class] = AWFUL_BAD;
}
minimum_demerits = AWFUL_BAD;
// << Insert a delta node to prepare for the next active node, 844 >>

Section 837

When we insert a new active node for a break at cur_p, suppose this new node is to be placed just before active node $a$ ; then we essentially want to insert ‘ $δ$ , cur_p $δ^{'}$ ’ before $a$ , where $δ = α (a) - α ($ cur_p $)$ and $δ^{'} = α ($ cur_p $) - α (a)$ in the notation explained above. The cur_active_width array now holds $γ + β ($ cur_p $) - α (a)$ ; so $δ$ can be obtained by subtracting cur_active_width from the quantity $γ + β ($ cur_p $) - α ($ cur_p $)$ . The latter quantity can be regarded as the length of a line “from cur_p to cur_p”; we call it the break_width at cur_p.

The break_width is usually negative, since it consists of the background (which is normally zero) minus the width of nodes following cur_p that are eliminated after a break. If, for example, node cur_p is a glue node, the width of this glue is subtracted from the background; and we also look ahead to eliminate all subsequent glue and penalty and kern and math nodes, subtracting their widths as well.

Kern nodes do not disappear at a line break unless they are EXPLICIT.

breaker.h

#define set_break_width_to_background(X) break_width[(X)] = background[(X)]

⟨ Compute the values of break_width 837 ⟩≡

no_break_yet = false;
do_all_six(set_break_width_to_background);
s = cur_p;
if (break_type > UNHYPHENATED && cur_p != null) {
    // << Compute the discretionary |break_width| values, 840 >>
}
while (s != null) {
    if (is_char_node(s)) {
        goto done;
    }
    switch (type(s)) {
    case GLUE_NODE:
        // << Subtract glue from |break_width|, 838 >>
        break;
    
    case PENALTY_NODE:
        do_nothing;
        break;
    
    case MATH_NODE:
        break_width[1] -= width(s);
        break;
    
    case KERN_NODE:
        if (subtype(s) != EXPLICIT) {
            goto done;
        }
        else {
            break_width[1] -= width(s);
        }
        break;
    
    default:
        goto done;
    }
    s = link(s);
}
done:

Section 838

⟨ Subtract glue from break_width 838 ⟩≡

v = glue_ptr(s);
break_width[1] -= width(v);
break_width[2 + stretch_order(v)] -= stretch(v);
break_width[6] -= shrink(v);

Section 839

When cur_p is a discretionary break, the length of a line “from cur_p to cur_p” has to be defined properly so that the other calculations work out. Suppose that the pre-break text at cur_p has length $l_{0}$ , the post-break text has length $l_{1}$ , and the replacement text has length $l$ . Suppose also that $q$ is the node following the replacement text. Then length of a line from cur_p to $q$ will be computed as $γ + β (q) - α ($ cur_p $)$ , where $β (q) = β ($ cur_p $) - l_{0} + l$ . The actual length will be the background plus $l_{1}$ , so the length from cur_p to cur_p should be $γ + l_{0} + l_{1} - l$ . If the post-break text of the discretionary is empty, a break may also discard $q$ ; in that unusual case we subtract the length of $q$ and any other nodes that will be discarded after the discretionary break.

The value of $l_{0}$ need not be computed, since line_break will put it into the global variable disc_width before calling try_break.

⟨ Global variables 13 ⟩+≡

scaled disc_width; // the length of discretionary material preceding a break

Section 840

⟨ Compute the discretionary break_width values 840 ⟩≡

t = replace_count(cur_p);
v = cur_p;
s = post_break(cur_p);
while (t > 0) {
    decr(t);
    v = link(v);
    // << Subtract the width of node |v| from |break_width|, 841 >>
}
while (s != null) {
    // << Add the width of node |s| to |break_width|, 842 >>
    s = link(s);
}
break_width[1] += disc_width;
if (post_break(cur_p) == null) {
    s = link(v); // nodes may be discardable after the break
}

Section 841

Replacement texts and discretionary texts are supposed to contain only character nodes, kern nodes, ligature nodes, and box or rule nodes.

⟨ Subtract the width of node v from break_width 841 ⟩≡

if (is_char_node(v)) {
    f = font(v);
    break_width[1] -= char_width(f, char_info(f, character(v)));
}
else {
    switch (type(v)) {
    case LIGATURE_NODE:
        f = font(lig_char(v));
        break_width[1] -= char_width(f, char_info(f, character(lig_char(v))));
        break;
    
    case HLIST_NODE:
    case VLIST_NODE:
    case RULE_NODE:
    case KERN_NODE:
        break_width[1] -= width(v);
        break;
    
    default:
        confusion("disc1");
    }
}

Section 842

⟨ Add the width of node s to break_width 842 ⟩≡

if (is_char_node(s)) {
    f = font(s);
    break_width[1] += char_width(f, char_info(f, character(s)));
}
else {
    switch (type(s)) {
    case LIGATURE_NODE:
        f = font(lig_char(s));
        break_width[1] += char_width(f, char_info(f, character(lig_char(s))));
        break;

    case HLIST_NODE:
    case VLIST_NODE:
    case RULE_NODE:
    case KERN_NODE:
        break_width[1] += width(s);
        break;
    
    default:
        confusion("disc2");
    }
}

Section 843

We use the fact that type(ACTIVE) $\neq =$ DELTA_NODE.

breaker.h

#define convert_to_break_width(X)   mem[prev_r + (X)].sc += (-cur_active_width[(X)] + break_width[(X)])
#define store_break_width(X)        active_width[(X)] = break_width[(X)]
#define new_delta_to_break_width(X) mem[q + (X)].sc = break_width[(X)] - cur_active_width[(X)]

⟨ Insert a delta node to prepare for breaks at cur_p 843 ⟩≡

if (type(prev_r) == DELTA_NODE) {
    // modify an existing delta node
    do_all_six(convert_to_break_width);
}
else if (prev_r == ACTIVE) {
    // no delta node needed at the beginning
    do_all_six(store_break_width);
}
else {
    q = get_node(DELTA_NODE_SIZE);
    link(q) = r;
    type(q) = DELTA_NODE;
    subtype(q) = 0; // the |subtype| is not used
    do_all_six(new_delta_to_break_width);
    link(prev_r) = q;
    prev_prev_r = prev_r;
    prev_r = q;
}

Section 844

When the following code is performed, we will have just inserted at least one active node before r, so type(prev_r) $\neq =$ DELTA_NODE.

breaker.h

#define new_delta_from_break_width(X) mem[q + (X)].sc = cur_active_width[(X)] - break_width[(X)]

⟨ Insert a delta node to prepare for the next active node 844 ⟩≡

if (r != LAST_ACTIVE) {
    q = get_node(DELTA_NODE_SIZE);
    link(q) = r;
    type(q) = DELTA_NODE;
    subtype(q) = 0; // the |subtype| is not used
    do_all_six(new_delta_from_break_width);
    link(prev_r) = q;
    prev_prev_r = prev_r;
    prev_r = q;
}

Section 845

When we create an active node, we also create the corresponding passive node.

⟨ Insert a new active node from best_place[fit_class] to cur_p 845 ⟩≡

q = get_node(PASSIVE_NODE_SIZE);
link(q) = passive;
passive = q;
cur_break(q) = cur_p;
#ifdef STAT
incr(pass_number);
serial(q) = pass_number;
#endif
prev_break(q) = best_place[fit_class];
q = get_node(ACTIVE_NODE_SIZE);
break_node(q) = passive;
line_number(q) = best_pl_line[fit_class] + 1;
fitness(q) = fit_class;
type(q) = break_type;
total_demerits(q) = minimal_demerits[fit_class];
link(q) = r;
link(prev_r) = q;
prev_r = q;
#ifdef STAT
if (tracing_paragraphs > 0) {
    // << Print a symbolic description of the new break node, 846 >>
}
#endif

Section 846

⟨ Print a symbolic description of the new break node 846 ⟩≡

print_nl("@@");
print_int(serial(passive));
print(": line ");
print_int(line_number(q) - 1);
print_char('.');
print_int(fit_class);
if (break_type == HYPHENATED) {
    print_char('-');
}
print(" t=");
print_int(total_demerits(q));
print(" -> @@");
if (prev_break(passive) == null) {
    print_char('0');
}
else {
    print_int(serial(prev_break(passive)));
}

Section 847

The length of lines depends on whether the user has specified \parshape or \hangindent. If par_shape_ptr is not null, it points to a (2n + 1)-word record in mem, where the info in the first word contains the value of n, and the other 2n words contain the left margins and line lengths for the first n lines of the paragraph; the specifications for line n apply to all subsequent lines. If par_shape_ptr = null, the shape of the paragraph depends on the value of n = hang_after; if n $\geq$ 0, hanging indentation takes place on lines n + 1, n + 2, $\dots$ , otherwise it takes place on lines 1, $\dots$ , |n|. When hanging indentation is active, the left margin is hang_indent, if hang_indent $\geq$ 0, else it is 0; the line length is hsize − |hang_indent|. The normal setting is par_shape_ptr = null, hang_after = 1, and hang_indent = 0. Note that if hang_indent = 0, the value of hang_after is irrelevant.

⟨ Global variables 13 ⟩+≡

halfword easy_line;         // line numbers |> easy_line| are equivalent in break nodes
halfword last_special_line; // line numbers |> last_special_line| all have the same width
scaled first_width;         // the width of all lines |<= last_special_line|, if no \parshape has been specified
scaled second_width;        // the width of all lines |> last_special_line|
scaled first_indent;        // left margin to go with |first_width|
scaled second_indent;       // left margin to go with |second_width|

Section 848

We compute the values of easy_line and the other local variables relating to line length when the line_break procedure is initializing itself.

⟨ Get ready to start line breaking 816 ⟩+≡

if (par_shape_ptr == null) {
    if (hang_indent == 0) {
        last_special_line = 0;
        second_width = hsize;
        second_indent = 0;
    }
    else {
        // << Set line length parameters in preparation for hanging indentation, 849 >>
    }
}
else {
    last_special_line = info(par_shape_ptr) - 1;
    second_width = mem[par_shape_ptr + 2*(last_special_line + 1)].sc;
    second_indent = mem[par_shape_ptr + 2*last_special_line + 1].sc;
}
if (looseness == 0) {
    easy_line = last_special_line;
}
else {
    easy_line = MAX_HALFWORD;
}

Section 849

⟨ Set line length parameters in preparation for hanging indentation 849 ⟩≡

last_special_line = abs(hang_after);
if (hang_after < 0) {
    first_width = hsize - abs(hang_indent);
    if (hang_indent >= 0) {
        first_indent = hang_indent;
    }
    else {
        first_indent = 0;
    }
    second_width = hsize;
    second_indent = 0;
}
else {
    first_width = hsize;
    first_indent = 0;
    second_width = hsize - abs(hang_indent);
    if (hang_indent >= 0) {
        second_indent = hang_indent;
    }
    else {
        second_indent = 0;
    }
}

Section 850

When we come to the following code, we have just encountered the first active node r whose line_number field contains $l$ . Thus we want to compute the length of the $l$ -th line of the current paragraph. Furthermore, we want to set old_l to the last number in the class of line numbers equivalent to $l$ .

⟨ Compute the new line width 850 ⟩≡

if (l > easy_line) {
    line_width = second_width;
    old_l = MAX_HALFWORD - 1;
}
else {
    old_l = l;
    if (l > last_special_line) {
        line_width = second_width;
    }
    else if (par_shape_ptr == null) {
        line_width = first_width;
    }
    else {
        line_width = mem[par_shape_ptr + 2*l].sc;
    }
}

Section 851

The remaining part of try_break deals with the calculation of demerits for a break from r to cur_p.

The first thing to do is calculate the badness, b. This value will always be between zero and INF_BAD + 1; the latter value occurs only in the case of lines from r to cur_p that cannot shrink enough to fit the necessary width. In such cases, node r will be deactivated. We also deactivate node r when a break at cur_p is forced, since future breaks must go through a forced break.

⟨ Consider the demerits for a line from r to cur_p; deactivate node r if it should no longer be active; then goto continue if a line from r to cur_p is infeasible, otherwise record a new feasible break 851 ⟩≡

artificial_demerits = false;
shortfall = line_width - cur_active_width[1]; // we're this much too short
if (shortfall > 0) {
    // << Set the value of |b| to the badness for stretching the line, and compute the corresponding |fit_class|, 852 >>
}
else{
    // << Set the value of |b| to the badness for shrinking the line, and compute the corresponding |fit_class|, 853 >>
}
if (b > INF_BAD || pi == EJECT_PENALTY) {
    // << Prepare to deactivate node |r|, and |goto deactivate| unless there is a reason to consider lines of text from |r| to |cur_p|, 854 >>
}
else {
    prev_r = r;
    if (b > threshold) {
        goto continue_lbl;
    }
    node_r_stays_active = true;
}
// << Record a new feasible break, 855 >>
if (node_r_stays_active) {
    goto continue_lbl; // |prev_r| has been set to |r|
}
deactivate:
// << Deactivate node |r|, 860 >>

Section 852

When a line must stretch, the available stretchability can be found in the subarray cur_active_width[2 .. 5], in units of points, fil, fill, and filll.

The present section is part of $T E X$ ’s inner loop, and it is most often performed when the badness is infinite; therefore it is worth while to make a quick test for large width excess and small stretchability, before calling the badness subroutine.

⟨ Set the value of b to the badness for stretching the line, and compute the corresponding fit_class 852 ⟩≡

if (cur_active_width[3] != 0
    || cur_active_width[4] != 0
    || cur_active_width[5] != 0)
{
    b = 0;
    fit_class = DECENT_FIT; // infinite stretch
}
else if (shortfall > 7230584 && cur_active_width[2] < 1663497) {
    b = INF_BAD;
    fit_class = VERY_LOOSE_FIT;
}
else {
    b = badness(shortfall, cur_active_width[2]);
    if (b > 12) {
        if (b > 99) {
            fit_class = VERY_LOOSE_FIT;
        }
        else {
            fit_class = LOOSE_FIT;
        }
    }
    else {
        fit_class = DECENT_FIT;
    }
}

Section 853

Shrinkability is never infinite in a paragraph; we can shrink the line from r to cur_p by at most cur_active_width[6].

⟨ Set the value of b to the badness for shrinking the line, and compute the corresponding fit_class 853 ⟩≡

if (-shortfall > cur_active_width[6]) {
    b = INF_BAD + 1;
}
else {
    b = badness(-shortfall, cur_active_width[6]);
}
if (b > 12) {
    fit_class = TIGHT_FIT;
}
else {
    fit_class = DECENT_FIT;
}

Section 854

During the final pass, we dare not lose all active nodes, lest we lose touch with the line breaks already found. The code shown here makes sure that such a catastrophe does not happen, by permitting overfull boxes as a last resort. This particular part of $T E X$ was a source of several subtle bugs before the correct program logic was finally discovered; readers who seek to “improve” $T E X$ should therefore think thrice before daring to make any changes here.

⟨ Prepare to deactivate node r, and goto deactivate unless there is a reason to consider lines of text from r to cur_p 854 ⟩≡

if (final_pass
    && minimum_demerits == AWFUL_BAD
    && link(r) == LAST_ACTIVE
    && prev_r == ACTIVE)
{
    artificial_demerits = true; // set demerits zero, this break is forced
}
else if (b > threshold) {
    goto deactivate;
}
node_r_stays_active = false;

Section 855

When we get to this part of the code, the line from r to cur_p is feasible, its badness is b, and its fitness classification is fit_class. We don’t want to make an active node for this break yet, but we will compute the total demerits and record them in the minimal_demerits array, if such a break is the current champion among all ways to get to cur_p in a given line-number class and fitness class.

⟨ Record a new feasible break 855 ⟩≡

if (artificial_demerits) {
    d = 0;
}
else {
    // << Compute the demerits, |d|, from |r| to |cur_p|, 859 >>
}
#ifdef STAT
if (tracing_paragraphs > 0) {
    // << Print a symbolic description of this feasible break, 856 >>
}
#endif
d += total_demerits(r); // this is the minimum total demerits from the beginning to |cur_p| via |r|
if (d <= minimal_demerits[fit_class]) {
    minimal_demerits[fit_class] = d;
    best_place[fit_class] = break_node(r);
    best_pl_line[fit_class] = l;
    if (d < minimum_demerits) {
        minimum_demerits = d;
    }
}

Section 856

⟨ Print a symbolic description of this feasible break 856 ⟩≡

if (printed_node != cur_p) {
    // << Print the list between |printed_node| and |cur_p|, then set |printed_node = cur_p|, 857 >>
}
print_nl("@");
if (cur_p == null) {
    print_esc("par");
}
else if (type(cur_p) != GLUE_NODE) {
    if (type(cur_p) == PENALTY_NODE) {
        print_esc("penalty");
    }
    else if (type(cur_p) == DISC_NODE) {
        print_esc("discretionary");
    }
    else if (type(cur_p) == KERN_NODE) {
        print_esc("kern");
    }
    else {
        print_esc("math");
    }
}
print(" via @@");
if (break_node(r) == null) {
    print_char('0');
}
else {
    print_int(serial(break_node(r)));
}
print(" b=");
if (b > INF_BAD) {
    print_char('*');
}
else {
    print_int(b);
}
print(" p=");
print_int(pi);
print(" d=");
if (artificial_demerits) {
    print_char('*');
}
else {
    print_int(d);
}

Section 857

⟨ Print the list between printed_node and cur_p, then set printed_node = cur_p 857 ⟩≡

print_nl("");
if (cur_p == null) {
    short_display(link(printed_node));
}
else {
    save_link = link(cur_p);
    link(cur_p) = null;
    print_nl("");
    short_display(link(printed_node));
    link(cur_p) = save_link;
}
printed_node = cur_p;

Section 858

When the data for a discretionary break is being displayed, we will have printed the pre_break and post_break lists; we want to skip over the third list, so that the discretionary data will not appear twice. The following code is performed at the very end of try_break.

⟨ Update the value of printed_node for symbolic displays 858 ⟩≡

if (cur_p == printed_node
    && cur_p != null
    && type(cur_p) == DISC_NODE)
{
    t = replace_count(cur_p);
    while (t > 0) {
        decr(t);
        printed_node = link(printed_node);
    }
}

Section 859

⟨ Compute the demerits, d, from r to cur_p 859 ⟩≡

d = line_penalty + b;
if (abs(d) >= 10000) {
    d = 100000000;
}
else {
    d *= d;
}
if (pi != 0) {
    if (pi > 0) {
        d += pi*pi;
    }
    else if (pi > EJECT_PENALTY) {
        d -= pi*pi;
    }
}
if (break_type == HYPHENATED && type(r) == HYPHENATED) {
    if (cur_p != null) {
        d += double_hyphen_demerits;
    }
    else {
        d += final_hyphen_demerits;
    }
}
if (abs(fit_class - fitness(r)) > 1) {
    d += adj_demerits;
}

Section 860

When an active node disappears, we must delete an adjacent delta node if the active node was at the beginning or the end of the active list, or if it was surrounded by delta nodes. We also must preserve the property that cur_active_width represents the length of material from link(prev_r) to cur_p.

breaker.h

#define combine_two_deltas(X) mem[prev_r + (X)].sc += mem[r + (X)].sc
#define downdate_width(X)     cur_active_width[(X)] -= mem[prev_r + (X)].sc

⟨ Deactivate node r 860 ⟩≡

link(prev_r) = link(r);
free_node(r, ACTIVE_NODE_SIZE);
if (prev_r == ACTIVE) {
    // << Update the active widths, since the first active node has been deleted, 861 >>
}
else if (type(prev_r) == DELTA_NODE) {
    r = link(prev_r);
    if (r == LAST_ACTIVE) {
        do_all_six(downdate_width);
        link(prev_prev_r) = LAST_ACTIVE;
        free_node(prev_r, DELTA_NODE_SIZE);
        prev_r = prev_prev_r;
    }
    else if (type(r) == DELTA_NODE) {
        do_all_six(update_width);
        do_all_six(combine_two_deltas);
        link(prev_r) = link(r);
        free_node(r, DELTA_NODE_SIZE);
    }
}

Section 861

The following code uses the fact that type(LAST_ACTIVE) $\neq =$ DELTA_NODE. If the active list has just become empty, we do not need to update the active_width array, since it will be initialized when an active node is next inserted.

breaker.h

#define update_active(X) active_width[(X)] += mem[r + (X)].sc

⟨ Update the active widths, since the first active node has been deleted 861 ⟩≡

r = link(ACTIVE);
if (type(r) == DELTA_NODE) {
    do_all_six(update_active);
    do_all_six(copy_to_cur_active);
    link(ACTIVE) = link(r);
    free_node(r, DELTA_NODE_SIZE);
}

TeX in C