Section 862: Breaking paragraphs into lines, continued

So far we have gotten a little way into the line_break routine, having covered its important try_break subroutine. Now let’s consider the rest of the process.

The main loop of line_break traverses the given hlist, starting at link(TEMP_HEAD), and calls try_break at each legal breakpoint. A variable called auto_breaking is set to true except within math formulas, since glue nodes are not legal breakpoints when they appear in formulas.

The current node of interest in the hlist is pointed to by cur_p. Another variable, prev_p, is usually one step behind cur_p, but the real meaning of prev_p is this: If type(cur_p) = GLUE_NODE then cur_p is a legal breakpoint if and only if auto_breaking is true and prev_p does not point to a glue node, penalty node, explicit kern node, or math node.

The following declarations provide for a few other local variables that are used in special calculations.

⟨ Local variables for line breaking 862 ⟩≡

bool auto_breaking;      // is node |cur_p| outside a formula?
pointer prev_p;          // helps to determine when glue nodes are breakpoints
pointer q, r, s, prev_s; // miscellaneous nodes of temporary interest
internal_font_number f;  // used when calculating character widths

Section 863

The ‘loop’ in the following code is performed at most thrice per call of line_break, since it is actually a pass over the entire paragraph.

⟨ Find optimal breakpoints 863 ⟩≡

threshold = pretolerance;
if (threshold >= 0) {
#ifdef STAT
    if (tracing_paragraphs > 0) {
        begin_diagnostic();
        print_nl("@firstpass");
    } 
#endif
    second_pass = false;
    final_pass = false;
}
else {
    threshold = tolerance;
    second_pass = true;
    final_pass = (emergency_stretch <= 0);
 
#ifdef STAT
    if (tracing_paragraphs > 0) {
        begin_diagnostic();
    }
#endif
}
while(true) {
    if (threshold > INF_BAD) {
        threshold = INF_BAD;
    }
    if (second_pass) {
        // << Initialize for hyphenating a paragraph, 891 >>
    }
    // << Create an active breakpoint representing the beginning of the paragraph, 864 >>
    cur_p = link(TEMP_HEAD);
    auto_breaking = true;
    prev_p = cur_p; // glue at beginning is not a legal breakpoint
    while (cur_p != null && link(ACTIVE) != LAST_ACTIVE) {
        // << Call |try_break| if |cur_p| is a legal breakpoint; on the second pass, also try to hyphenate the next word, if |cur_p| is a glue node; then advance |cur_p| to the next node of the paragraph that could possibly be a legal breakpoint, 866 >>
    }
    if (cur_p == null) {
        // << Try the final line break at the end of the paragraph, and |goto done| if the desired breakpoints have been found, 873 >>
    }
    // << Clean up the memory by removing the break nodes, 865 >>
    if (!second_pass) {
#ifdef STAT
        if (tracing_paragraphs > 0) {
            print_nl("@secondpass");
        }
#endif
        threshold = tolerance;
        second_pass = true;
        final_pass = (emergency_stretch <= 0);
    } // if at first you don't succeed, \dots
    else {
#ifdef STAT
        if (tracing_paragraphs > 0) {
            print_nl("@emergencypass");
        }
#endif
        background[2] += emergency_stretch;
        final_pass = true;
    }
}
done:
#ifdef STAT
if (tracing_paragraphs > 0) {
    end_diagnostic(true);
    normalize_selector();
}
#endif

Section 864

The active node that represents the starting point does not need a corresponding passive node.

breaker.h
#define store_background(X) active_width[(X)] = background[(X)]

⟨ Create an active breakpoint representing the beginning of the paragraph 864 ⟩≡

q = get_node(ACTIVE_NODE_SIZE);
type(q) = UNHYPHENATED;
fitness(q) = DECENT_FIT;
link(q) = LAST_ACTIVE;
break_node(q) = null;
line_number(q) = prev_graf + 1;
total_demerits(q) = 0;
link(ACTIVE) = q;
do_all_six(store_background);
passive = null;
printed_node = TEMP_HEAD;
pass_number = 0;
font_in_short_display = NULL_FONT;

Section 865

⟨ Clean up the memory by removing the break nodes 865 ⟩≡

q = link(ACTIVE);
while (q != LAST_ACTIVE) {
    cur_p = link(q);
    if (type(q) == DELTA_NODE) {
        free_node(q, DELTA_NODE_SIZE);
    }
    else {
        free_node(q, ACTIVE_NODE_SIZE);
    }
    q = cur_p;
}
q = passive;
while (q != null) {
    cur_p = link(q);
    free_node(q, PASSIVE_NODE_SIZE);
    q = cur_p;
}

Section 866

Here is the main switch in the line_break routine, where legal breaks are determined. As we move through the hlist, we need to keep the active_width array up to date, so that the badness of individual lines is readily calculated by try_break. It is convenient to use the short name act_width for the component of active width that represents real width as opposed to glue.

breaker.h
#define act_width active_width[1] // length from first active node to current node
#define kern_break                             \
    do {                                       \
        if (!is_char_node(link(cur_p))         \
            && auto_breaking                   \
            && type(link(cur_p)) == GLUE_NODE) \
        {                                      \
            try_break(0, UNHYPHENATED);        \
        }                                      \
        act_width += width(cur_p);             \
    } while (0)

⟨ Call try_break if cur_p is a legal breakpoint; on the second pass, also try to hyphenate the next word, if cur_p is a glue node; then advance cur_p to the next node of the paragraph that could possibly be a legal breakpoint 866 ⟩≡

if (is_char_node(cur_p)) {
    // << Advance |cur_p| to the node following the present string of characters, 867 >>
}
switch (type(cur_p)) {
case HLIST_NODE:
case VLIST_NODE:
case RULE_NODE:
    act_width += width(cur_p);
    break;

case WHATSIT_NODE:
    // << Advance past a whatsit node in the |line_break| loop, 1362 >>
    break;

case GLUE_NODE:
    // << If node |cur_p| is a legal breakpoint, call |try_break|; then update the active widths by including the glue in |glue_ptr(cur_p)|, 868 >>
    if (second_pass && auto_breaking) {
        // << Try to hyphenate the following word, 894 >>
    }
    break;

case KERN_NODE:
    if (subtype(cur_p) == EXPLICIT) {
        kern_break;
    }
    else {
        act_width += width(cur_p);
    }
    break;

case LIGATURE_NODE:
    f = font(lig_char(cur_p));
    act_width += char_width(f, char_info(f, character(lig_char(cur_p))));
    break;

case DISC_NODE:
    // << Try to break after a discretionary fragment, then |goto done5|, 869 >>

case MATH_NODE:
    auto_breaking = (subtype(cur_p) == AFTER);
    kern_break;
    break;

case PENALTY_NODE:
    try_break(penalty(cur_p), UNHYPHENATED);
    break;

case MARK_NODE:
case INS_NODE:
case ADJUST_NODE:
    do_nothing;
    break;

default:
    confusion("paragraph");
}
prev_p = cur_p;
cur_p = link(cur_p);
done5:

Section 867

The code that passes over the characters of words in a paragraph is part of ’s inner loop, so it has been streamlined for speed. We use the fact that ‘\parfillskip’ glue appears at the end of each paragraph; it is therefore unnecessary to check if link(cur_p) = null when cur_p is a character node.

⟨ Advance cur_p to the node following the present string of characters 867 ⟩≡

prev_p = cur_p;
do {
    f = font(cur_p);
    act_width += char_width(f, char_info(f, character(cur_p)));
    cur_p = link(cur_p);
} while (is_char_node(cur_p));

Section 868

When node cur_p is a glue node, we look at prev_p to see whether or not a breakpoint is legal at cur_p, as explained above.

⟨ If node cur_p is a legal breakpoint, call try_break; then update the active widths by including the glue in glue_ptr(cur_p) 868 ⟩≡

if (auto_breaking) {
    if (is_char_node(prev_p)
        || precedes_break(prev_p)
        || (type(prev_p) == KERN_NODE && subtype(prev_p) != EXPLICIT))
    {
        try_break(0, UNHYPHENATED);
    }
}
check_shrinkage(glue_ptr(cur_p));
q = glue_ptr(cur_p);
act_width += width(q);
active_width[2 + stretch_order(q)] += stretch(q);
active_width[6] += shrink(q);

Section 869

The following code knows that discretionary texts contain only character nodes, kern nodes, box nodes, rule nodes, and ligature nodes.

⟨ Try to break after a discretionary fragment, then goto done5 869 ⟩≡

s = pre_break(cur_p);
disc_width = 0;
if (s == null) {
    try_break(ex_hyphen_penalty, HYPHENATED);
}
else {
    do {
        // << Add the width of node |s| to |disc_width|, 870 >>
        s = link(s);
    } while (s != null);
    act_width += disc_width;
    try_break(hyphen_penalty, HYPHENATED);
    act_width -= disc_width;
}
r = replace_count(cur_p);
s = link(cur_p);
while (r > 0) {
    // << Add the width of node |s| to |act_width|, 871 >>
    decr(r);
    s = link(s);
}
prev_p = cur_p;
cur_p = s;
goto done5;

Section 870

⟨ Add the width of node s to disc_width 870 ⟩≡

if (is_char_node(s)) {
    f = font(s);
    disc_width += char_width(f, char_info(f, character(s)));
}
else {
    switch (type(s)) {
    case LIGATURE_NODE:
        f = font(lig_char(s));
        disc_width += char_width(f, char_info(f, character(lig_char(s)))); 
        break;
    
    case HLIST_NODE:
    case VLIST_NODE:
    case RULE_NODE:
    case KERN_NODE:
        disc_width += width(s);
        break;
    
    default:
        confusion("disc3");
    }
}

Section 871

⟨ Add the width of node s to act_width 871 ⟩≡

if (is_char_node(s)) {
    f = font(s);
    act_width += char_width(f, char_info(f, character(s)));
}
else {
    switch (type(s)) {
    case LIGATURE_NODE:
        f = font(lig_char(s));
        act_width += char_width(f, char_info(f, character(lig_char(s))));
        break;
    
    case HLIST_NODE:
    case VLIST_NODE:
    case RULE_NODE:
    case KERN_NODE:
        act_width += width(s);
        break;
    
    default:
        confusion("disc4");
    }
}

Section 872

The forced line break at the paragraph’s end will reduce the list of breakpoints so that all active nodes represent breaks at cur_p = null. On the first pass, we insist on finding an active node that has the correct “looseness”. On the final pass, there will be at least one active node, and we will match the desired looseness as well as we can.

The global variable best_bet will be set to the active node for the best way to break the paragraph, and a few other variables are used to help determine what is best.

⟨ Global variables 13 ⟩+≡

pointer best_bet;     // use this passive node and its predecessors
int fewest_demerits;  // the demerits associated with |best_bet|
halfword best_line;   // line number following the last line of the new paragraph
int actual_looseness; // the difference between |line_number(best_bet)| and the optimum |best_line|
int line_diff;        // the difference between the current line number and the optimum |best_line|

Section 873

⟨ Try the final line break at the end of the paragraph, and goto done if the desired breakpoints have been found 873 ⟩≡

try_break(EJECT_PENALTY, HYPHENATED);
if (link(ACTIVE) != LAST_ACTIVE) {
    // << Find an active node with fewest demerits, 874 >>
    if (looseness == 0) {
        goto done;
    }
    // << Find the best active node for the desired looseness, 875 >>
    if (actual_looseness == looseness || final_pass) {
        goto done;
    }
}

Section 874

⟨ Find an active node with fewest demerits 874 ⟩≡

r = link(ACTIVE);
fewest_demerits = AWFUL_BAD;
do {
    if (type(r) != DELTA_NODE && total_demerits(r) < fewest_demerits) {
        fewest_demerits = total_demerits(r);
        best_bet = r;
    }
    r = link(r);
} while (r != LAST_ACTIVE);
best_line = line_number(best_bet);

Section 875

The adjustment for a desired looseness is a slightly more complicated version of the loop just considered. Note that if a paragraph is broken into segments by displayed equations, each segment will be subject to the looseness calculation, independently of the other segments.

⟨ Find the best active node for the desired looseness 875 ⟩≡

r = link(ACTIVE);
actual_looseness = 0;
do {
    if (type(r) != DELTA_NODE) {
        line_diff = line_number(r) - best_line;
        if ((line_diff < actual_looseness && looseness <= line_diff)
            || (line_diff > actual_looseness && looseness >= line_diff))
        {
            best_bet = r;
            actual_looseness = line_diff;
            fewest_demerits = total_demerits(r);
        }
        else if (line_diff == actual_looseness && total_demerits(r) < fewest_demerits) {
            best_bet = r;
            fewest_demerits = total_demerits(r);
        }
    }
    r = link(r);
} while (r != LAST_ACTIVE);
best_line = line_number(best_bet);

Section 876

Once the best sequence of breakpoints has been found (hurray), we call on the procedure post_line_break to finish the remainder of the work. (By introducing this subprocedure, we are able to keep line_break from getting extremely long.)

⟨ Break the paragraph at the chosen breakpoints, justify the resulting lines to the correct widths, and append them to the current vertical list 876 ⟩≡

post_line_break(final_widow_penalty);

Section 877

The total number of lines that will be set by post_line_break is best_line − prev_graf − 1. The last breakpoint is specified by break_node(best_bet), and this passive node points to the other breakpoints via the prev_break links. The finishing-up phase starts by linking the relevant passive nodes in forward order, changing prev_break to next_break. (The next_break fields actually reside in the same memory space as the prev_break fields did, but we give them a new name because of their new significance.) Then the lines are justified, one by one.

breaker.h
#define next_break prev_break // new name for |prev_break| after links are reversed

⟨ Declare subprocedures for line_break 826 ⟩+≡

void post_line_break(int final_widow_penalty) {
    pointer q, r, s;      // temporary registers for list manipulation
    bool disc_break;      // was the current break at a discretionary node?
    bool post_disc_break; // and did it have a nonempty post-break part?
    scaled cur_width;     // width of line number |cur_line|
    scaled cur_indent;    // left margin of line number |cur_line|
    quarterword t;        // used for replacement counts in discretionary nodes
    int pen;              // use when calculating penalties between lines
    halfword cur_line;    // the current line number being justified
    
    // << Reverse the links of the relevant passive nodes, setting |cur_p| to the first breakpoint, 878 >>
    cur_line = prev_graf + 1;
    do {
        // << Justify the line ending at breakpoint |cur_p|, and append it to the current vertical list, together with associated penalties and other insertions, 880 >>
        incr(cur_line);
        cur_p = next_break(cur_p);
        if (cur_p != null && !post_disc_break) {
            // << Prune unwanted nodes at the beginning of the next line, 879 >>
        }
    } while (cur_p != null);
    if (cur_line != best_line || link(TEMP_HEAD) != null) {
        confusion("line breaking");
    }
    prev_graf = best_line - 1;
}

Section 878

The job of reversing links in a list is conveniently regarded as the job of taking items off one stack and putting them on another. In this case we take them off a stack pointed to by q and having prev_break fields; we put them on a stack pointed to by cur_p and having next_break fields. Node r is the passive node being moved from stack to stack.

⟨ Reverse the links of the relevant passive nodes, setting cur_p to the first breakpoint 878 ⟩≡

q = break_node(best_bet);
cur_p = null;
do {
    r = q;
    q = prev_break(q);
    next_break(r) = cur_p;
    cur_p = r;
} while (q != null);

Section 879

Glue and penalty and kern and math nodes are deleted at the beginning of a line, except in the anomalous case that the node to be deleted is actually one of the chosen breakpoints. Otherwise the pruning done here is designed to match the lookahead computation in try_break, where the break_width values are computed for non-discretionary breakpoints.

⟨ Prune unwanted nodes at the beginning of the next line 879 ⟩≡

r = TEMP_HEAD;
while(true) {
    q = link(r);
    if (q == cur_break(cur_p) // |cur_break(cur_p)| is the next breakpoint
        // now |q| cannot be |null|   
        || is_char_node(q)
        || non_discardable(q)
        || (type(q) == KERN_NODE && subtype(q) != EXPLICIT))
    {
        break; // goto done1
    }    
    r = q; // now |type(q) = GLUE_NODE|, |KERN_NODE|, |MATH_NODE|, or |PENALTY_NODE|
}
//done1:
if (r != TEMP_HEAD) {
    link(r) = null;
    flush_node_list(link(TEMP_HEAD));
    link(TEMP_HEAD) = q;
}

Section 880

The current line to be justified appears in a horizontal list starting at link(TEMP_HEAD) and ending at cur_break(cur_p). If cur_break(cur_p) is a glue node, we reset the glue to equal the right_skip glue; otherwise we append the right_skip glue at the right. If cur_break(cur_p) is a discretionary node, we modify the list so that the discretionary break is compulsory, and we set disc_break to true. We also append the left_skip glue at the left of the line, unless it is zero.

⟨ Justify the line ending at breakpoint cur_p, and append it to the current vertical list, together with associated penalties and other insertions 880 ⟩≡

// << Modify the end of the line to reflect the nature of the break and to include \rightskip; also set the proper value of |disc_break|, 881 >>

// << Put the \leftskip glue at the left and detach this line, 887 >>

// << Call the packaging subroutine, setting |just_box| to the justified box, 889 >>

// << Append the new box to the current vertical list, followed by the list of special nodes taken out of the box by the packager, 888 >>

// << Append a penalty node, if a nonzero penalty is appropriate, 890 >>

Section 881

At the end of the following code, q will point to the final node on the list about to be justified.

⟨ Modify the end of the line to reflect the nature of the break and to include \rightskip; also set the proper value of disc_break 881 ⟩≡

q = cur_break(cur_p);
disc_break = false;
post_disc_break = false;
if (q != null) {
    // |q| cannot be a |CHAR_NODE|
    if (type(q) == GLUE_NODE) {
        delete_glue_ref(glue_ptr(q));
        glue_ptr(q) = right_skip;
        subtype(q) = RIGHT_SKIP_CODE + 1;
        add_glue_ref(right_skip);
        goto done;
    }
    else {
        if (type(q) == DISC_NODE) {
            // << Change discretionary to compulsory and set |disc_break = true|, 882 >>
        }
        else if (type(q) == MATH_NODE || type(q) == KERN_NODE) {
            width(q) = 0;
        }
    }
}
else {
    q = TEMP_HEAD;
    while (link(q) != null) {
        q = link(q);
    }
}
// << Put the \rightskip glue after node |q|, 886 >>
done:

Section 882

⟨ Change discretionary to compulsory and set disc_break = true 882 ⟩≡

t = replace_count(q);
// << Destroy the |t| nodes following |q|, and make |r| point to the following node, 883 >>
if (post_break(q) != null) {
    // << Transplant the post-break list, 884 >>
}
if (pre_break(q) != null) {
    // << Transplant the pre-break list, 885 >>
}
link(q) = r;
disc_break = true;

Section 883

⟨ Destroy the t nodes following q, and make r point to the following node 883 ⟩≡

if (t == 0) {
    r = link(q);
}
else {
    r = q;
    while (t > 1) {
        r = link(r);
        decr(t);
    }
    s = link(r);
    r = link(s);
    link(s) = null;
    flush_node_list(link(q));
    replace_count(q) = 0;
}

Section 884

We move the post-break list from inside node q to the main list by reattaching it just before the present node r, then resetting r.

⟨ Transplant the post-break list 884 ⟩≡

s = post_break(q);
while (link(s) != null) {
    s = link(s);
}
link(s) = r;
r = post_break(q);
post_break(q) = null;
post_disc_break = true;

Section 885

We move the pre-break list from inside node q to the main list by reattaching it just after the present node q, then resetting q.

⟨ Transplant the pre-break list 885 ⟩≡

s = pre_break(q);
link(q) = s;
while (link(s) != null) {
    s = link(s);
}
pre_break(q) = null;
q = s;

Section 886

⟨ Put the \rightskip glue after node q 886 ⟩≡

r = new_param_glue(RIGHT_SKIP_CODE);
link(r) = link(q);
link(q) = r;
q = r;

Section 887

The following code begins with q at the end of the list to be justified. It ends with q at the beginning of that list, and with link(TEMP_HEAD) pointing to the remainder of the paragraph, if any.

⟨ Put the \leftskip glue at the left and detach this line 887 ⟩≡

r = link(q);
link(q) = null;
q = link(TEMP_HEAD);
link(TEMP_HEAD) = r;
if (left_skip != ZERO_GLUE) {
    r = new_param_glue(LEFT_SKIP_CODE);
    link(r) = q;
    q = r;
}

Section 888

⟨ Append the new box to the current vertical list, followed by the list of special nodes taken out of the box by the packager 888 ⟩≡

append_to_vlist(just_box);
if (ADJUST_HEAD != adjust_tail) {
    link(tail) = link(ADJUST_HEAD);
    tail = adjust_tail;
}
adjust_tail = null;

Section 889

Now q points to the hlist that represents the current line of the paragraph. We need to compute the appropriate line width, pack the line into a box of this size, and shift the box by the appropriate amount of indentation.

⟨ Call the packaging subroutine, setting just_box to the justified box 889 ⟩≡

if (cur_line > last_special_line) {
    cur_width = second_width;
    cur_indent = second_indent;
}
else if (par_shape_ptr == null) {
    cur_width = first_width;
    cur_indent = first_indent;
}
else {
    cur_width = mem[par_shape_ptr + 2*cur_line].sc;
    cur_indent = mem[par_shape_ptr + 2*cur_line - 1].sc;
}
adjust_tail = ADJUST_HEAD;
just_box = hpack(q, cur_width, EXACTLY);
shift_amount(just_box) = cur_indent;

Section 890

Penalties between the lines of a paragraph come from club and widow lines, from the inter_line_penalty parameter, and from lines that end at discretionary breaks. Breaking between lines of a two-line paragraph gets both club-line and widow-line penalties. The local variable pen will be set to the sum of all relevant penalties for the current line, except that the final line is never penalized.

⟨ Append a penalty node, if a nonzero penalty is appropriate 890 ⟩≡

if (cur_line + 1 != best_line) {
    pen = inter_line_penalty;
    if (cur_line == prev_graf + 1) {
        pen += club_penalty;
    }
    if (cur_line + 2 == best_line) {
        pen += final_widow_penalty;
    }
    if (disc_break) {
        pen += broken_penalty;
    }
    if (pen != 0) {
        r = new_penalty(pen);
        link(tail) = r;
        tail = r;
    }
}