Section 900: Post-hyphenation
If a hyphen may be inserted between hc[j] and hc[j + 1], the hyphenation procedure will set hyf[j] to some small odd number. But before we look at ’s hyphenation procedure, which is independent of the rest of the line-breaking algorithm, let us consider what we will do with the hyphens it finds, since it is better to work on this part of the program before forgetting what ha and hb, etc., are all about.
⟨ Global variables 13 ⟩+≡
quarterword hyf[65]; // odd values indicate discretionary hyphens
pointer init_list; // list of punctuation characters preceding the word
bool init_lig; // does |init_list| represent a ligature?
bool init_lft; // if so, did the ligature involve a left boundary?
Section 901
⟨ Local variables for hyphenation 901 ⟩≡
int i, j, l; // indices into |hc| or |hu|
pointer q, r, s; // temporary registers for list manipulation
halfword bchar; // boundary character of hyphenated word, or |NON_CHAR|
Section 902
will never insert a hyphen that has fewer than \lefthyphenmin
letters before it or fewer than \righthyphenmin
after it; hence, a short word has comparatively little chance of being hyphenated.
If no hyphens have been found, we can save time by not having to make any changes to the paragraph.
⟨ If no hyphens were found, return 902 ⟩≡
for(j = l_hyf; j <= hn - r_hyf; j++) {
if (odd(hyf[j])) {
goto found1;
}
}
return;
found1:
Section 903
If hyphens are in fact going to be inserted, first deletes the subsequence of nodes between ha and hb. An attempt is made to preserve the effect that implicit boundary characters and punctuation marks had on ligatures inside the hyphenated word, by storing a left boundary or preceding character in hu[0] and by storing a possible right boundary in bchar. We set j ← 0 if hu[0] is to be part of the reconstruction; otherwise j ← 1. The variable s will point to the tail of the current hlist, and q will point to the node following hb, so that things can be hooked up after we reconstitute the hyphenated word.
⟨ Replace nodes ha..hb by a sequence of nodes that includes the discretionary hyphens 903 ⟩≡
q = link(hb);
link(hb) = null;
r = link(ha);
link(ha) = null;
bchar = hyf_bchar;
if (is_char_node(ha)) {
if (font(ha) != hf) {
goto found2;
}
else {
init_list = ha;
init_lig = false;
hu[0] = character(ha);
}
}
else if (type(ha) == LIGATURE_NODE) {
if (font(lig_char(ha)) != hf) {
goto found2;
}
else {
init_list = lig_ptr(ha);
init_lig = true;
init_lft = (subtype(ha) > 1);
hu[0] = character(lig_char(ha));
if (init_list == null && init_lft) {
hu[0] = 256;
init_lig = false;
} // in this case a ligature will be reconstructed from scratch
free_node(ha, SMALL_NODE_SIZE);
}
}
else {
// no punctuation found; look for left boundary
if (!is_char_node(r) && type(r) == LIGATURE_NODE && subtype(r) > 1) {
goto found2;
}
j = 1;
s = ha;
init_list = null;
goto common_ending;
}
s = cur_p; // we have |cur_p != ha| because |type(cur_p) = GLUE_NODE|
while (link(s) != ha) {
s = link(s);
}
j = 0;
goto common_ending;
found2:
s = ha;
j = 0;
hu[0] = 256;
init_lig = false;
init_list = null;
common_ending:
flush_node_list(r);
// << Reconstitute nodes for the hyphenated word, inserting discretionary hyphens, 913 >>
flush_list(init_list);
Section 904
We must now face the fact that the battle is not over, even though the hyphens have been found: The process of reconstituting a word can be nontrivial because ligatures might change when a hyphen is present. The TeXbook discusses the difficulties of the word “difficult”, and the discretionary material surrounding a hyphen can be considerably more complex than that. Suppose is a word in a font for which the only ligatures are , , , and . If this word permits hyphenation between and , the two patterns with and without hyphenation are and . Thus the insertion of a hyphen might cause effects to ripple arbitrarily far into the rest of the word. A further complication arises if additional hyphens appear together with such rippling, e.g., if the word in the example just given could also be hyphenated between and ; avoids this by simply ignoring the additional hyphens in such weird cases.
Still further complications arise in the presence of ligatures that do not delete the original characters.
When punctuation precedes the word being hyphenated, ’s method is not perfect under all possible scenarios, because punctuation marks and letters can propagate information back and forth.
For example, suppose the original pre-hyphenation pair changes to via a |=:
ligature, which changes to via a =:|
ligature;
if and , the reconstitution procedure isn’t smart enough to obtain again.
In such cases the font designer should include a ligature that goes from to .
Section 905
The processing is facilitated by a subroutine called reconstitute. Given a string of characters , there is a smallest index m j such that the “translation” of by ligatures and kerning has the form followed by the translation of , where is some nonempty sequence of character, ligature, and kern nodes. We call a “cut prefix” of . For example, if , and if the font contains ‘fl’ as a ligature and a kern between ‘fl’ and ‘y’, then m = 2, t = 2, and will be a ligature node for ‘fl’ followed by an appropriate kern node . In the most common case, forms no ligature with and we simply have m = j, . If m n we can repeat the procedure on until the entire translation has been found.
The reconstitute function returns the integer and puts the nodes into a linked list starting at link(HOLD_HEAD), getting the input from the hu array. If , we consider to be an implicit left boundary character; in this case j must be strictly less than n. There is a parameter bchar, which is either 256 or an implicit right boundary character assumed to be present just following . (The value hu[n + 1] is never explicitly examined, but the algorithm imagines that bchar is there.)
If there exists an index k in the range j k m such that hyf[k] is odd and such that the result of reconstitute would have been different if had been hchar, then reconstitute sets hyphen_passed to the smallest such k. Otherwise it sets hyphen_passed to zero.
A special convention is used in the case j = 0: Then we assume that the translation of hu[0] appears in a special list of charnodes starting at init_list; moreover, if init_lig is true, then hu[0] will be a ligature character, involving a left boundary if init_lft is true. This facility is provided for cases when a hyphenated word is preceded by punctuation (like single or double quotes) that might affect the translation of the beginning of the word.
⟨ Global variables 13 ⟩+≡
small_number hyphen_passed; // first hyphen in a ligature, if any
Section 906
⟨ Declare the function called reconstitute 906 ⟩≡
small_number reconstitute(small_number j, small_number n, halfword bchar, halfword hchar) {
pointer p; // temporary register for list manipulation
pointer t; // a node being appended to
memory_word q; // character information or a lig/kern instruction
halfword cur_rh; // hyphen character for ligature testing
halfword test_char; // hyphen or other character for ligature testing
scaled w; // amount of kerning
font_index k; // position of current lig/kern instruction
hyphen_passed = 0;
t = HOLD_HEAD;
w = 0;
link(HOLD_HEAD) = null;
// at this point |ligature_present = lft_hit = rt_hit = false|
// << Set up data structures with the cursor following position |j|, 908 >>
continue_lbl:
// << If there's a ligature or kern at the cursor position, update the data structures, possibly advancing |j|; continue until the cursor moves, 909 >>
// << Append a ligature and/or kern to the translation; |goto continue| if the stack of inserted ligatures is nonempty, 910 >>
return j;
}
Section 907
The reconstitution procedure shares many of the global data structures by which has processed the words before they were hyphenated. There is an implied “cursor” between characters cur_l and cur_r; these characters will be tested for possible ligature activity. If ligature_present then cur_l is a ligature character formed from the original characters following cur_q in the current translation list. There is a “ligature stack” between the cursor and character j + 1, consisting of pseudo-ligature nodes linked together by their link fields. This stack is normally empty unless a ligature command has created a new character that will need to be processed later. A pseudo-ligature is a special node having a character field that represents a potential ligature and a lig_ptr field that points to a CHAR_NODE or is null. We have
⟨ Global variables 13 ⟩+≡
halfword cur_l, cur_r; // characters before and after the cursor
pointer cur_q; // where a ligature should be detached
pointer lig_stack; // unfinished business to the right of the cursor
bool ligature_present; // should a ligature node be made for |cur_l|?
bool lft_hit, rt_hit; // did we hit a ligature with a boundary character?
Section 908
#define append_charnode_to_t(X) \
link(t) = get_avail(); \
t = link(t); \
font(t) = hf; \
character(t) = (X)
#define set_cur_r \
do { \
if (j < n) { \
cur_r = hu[j + 1]; \
} \
else { \
cur_r = bchar; \
} \
if (odd(hyf[j])) { \
cur_rh = hchar; \
} \
else { \
cur_rh = NON_CHAR; \
} \
} while (0)
⟨ Set up data structures with the cursor following position j 908 ⟩≡
cur_l = hu[j];
cur_q = t;
if (j == 0) {
ligature_present = init_lig;
p = init_list;
if (ligature_present) {
lft_hit = init_lft;
}
while (p > null) {
append_charnode_to_t(character(p));
p = link(p);
}
}
else if (cur_l < NON_CHAR) {
append_charnode_to_t(cur_l);
}
lig_stack = null;
set_cur_r;
Section 909
We may want to look at the lig/kern program twice, once for a hyphen and once for a normal letter. (The hyphen might appear after the letter in the program, so we’d better not try to look for both at once.)
⟨ If there’s a ligature or kern at the cursor position, update the data structures, possibly advancing j; continue until the cursor moves 909 ⟩≡
if (cur_l == NON_CHAR) {
k = bchar_label[hf];
if (k == NON_ADDRESS) {
goto done;
}
else {
q = font_info[k];
}
}
else {
q = char_info(hf, cur_l);
if (char_tag(q) != LIG_TAG) {
goto done;
}
k = lig_kern_start(hf, q);
q = font_info[k];
if (skip_byte(q) > STOP_FLAG) {
k = lig_kern_restart(hf, q);
q = font_info[k];
}
} // now |k| is the starting address of the lig/kern program
if (cur_rh < NON_CHAR) {
test_char = cur_rh;
}
else {
test_char = cur_r;
}
while(true) {
if (next_char(q) == test_char && skip_byte(q) <= STOP_FLAG) {
if (cur_rh < NON_CHAR) {
hyphen_passed = j;
hchar = NON_CHAR;
cur_rh = NON_CHAR;
goto continue_lbl;
}
else {
if (hchar < NON_CHAR && odd(hyf[j])) {
hyphen_passed = j;
hchar = NON_CHAR;
}
if (op_byte(q) < KERN_FLAG) {
// << Carry out a ligature replacement, updating the cursor structure and possibly advancing |j|; |goto continue| if the cursor doesn't advance, otherwise |goto done|, 911 >>
}
w = char_kern(hf, q);
goto done; // this kern will be inserted below
}
}
if (skip_byte(q) >= STOP_FLAG) {
if (cur_rh == NON_CHAR) {
goto done;
}
else {
cur_rh = NON_CHAR;
goto continue_lbl;
}
}
k += skip_byte(q) + 1;
q = font_info[k];
}
done:
Section 910
#define wrap_lig(X) \
do { \
if (ligature_present) { \
p = new_ligature(hf, cur_l, link(cur_q)); \
if (lft_hit) { \
subtype(p) = 2; \
lft_hit = false; \
} \
if ((X) && lig_stack == null) { \
incr(subtype(p)); \
rt_hit = false; \
} \
link(cur_q) = p; \
t = p; \
ligature_present = false; \
} \
} while (0)
#define pop_lig_stack \
do { \
if (lig_ptr(lig_stack) > null) { \
/* this is a charnode for |hu[j + 1]| */ \
link(t) = lig_ptr(lig_stack); \
t = link(t); \
incr(j); \
} \
p = lig_stack; \
lig_stack = link(p); \
free_node(p, SMALL_NODE_SIZE); \
if (lig_stack == null) { \
set_cur_r; \
} \
else { \
cur_r = character(lig_stack); \
} \
} while (0)
// if |lig_stack| isn't |null| we have |cur_rh = NON_CHAR|
⟨ Append a ligature and/or kern to the translation; goto continue if the stack of inserted ligatures is nonempty 910 ⟩≡
wrap_lig(rt_hit);
if (w != 0) {
link(t) = new_kern(w);
t = link(t);
w = 0;
}
if (lig_stack > null) {
cur_q = t;
cur_l = character(lig_stack);
ligature_present = true;
pop_lig_stack;
goto continue_lbl;
}
Section 911
⟨ Carry out a ligature replacement, updating the cursor structure and possibly advancing j; goto continue if the cursor doesn’t advance, otherwise goto done 911 ⟩≡
if (cur_l == NON_CHAR) {
lft_hit = true;
}
if (j == n && lig_stack == null) {
rt_hit = true;
}
check_interrupt; // allow a way out in case there's an infinite ligature loop
switch (op_byte(q)) {
case 1:
case 5:
cur_l = rem_byte(q); // =:|, =:|>
ligature_present = true;
break;
case 2:
case 6:
cur_r = rem_byte(q); // |=:, |=:>
if (lig_stack > null) {
character(lig_stack) = cur_r;
}
else {
lig_stack = new_lig_item(cur_r);
if (j == n) {
bchar = NON_CHAR;
}
else {
p = get_avail();
lig_ptr(lig_stack) = p;
character(p) = hu[j + 1];
font(p) = hf;
}
}
break;
case 3:
cur_r = rem_byte(q); // |=:|}
p = lig_stack;
lig_stack = new_lig_item(cur_r);
link(lig_stack) = p;
break;
case 7:
case 11:
wrap_lig(false); // |=:|>, |=:|>>
cur_q = t;
cur_l = rem_byte(q);
ligature_present = true;
break;
default:
cur_l = rem_byte(q);
ligature_present = true; // =:
if (lig_stack > null) {
pop_lig_stack;
}
else if (j == n) {
goto done;
}
else {
append_charnode_to_t(cur_r);
incr(j);
set_cur_r;
}
}
if (op_byte(q) > 4 && op_byte(q) != 7) {
goto done;
}
goto continue_lbl;
Section 912
Okay, we’re ready to insert the potential hyphenations that were found. When the following program is executed, we want to append the word hu[1 .. hn] after node ha, and node q should be appended to the result. During this process, the variable i will be a temporary index into hu; the variable j will be an index to our current position in hu; the variable l will be the counterpart of j, in a discretionary branch; the variable r will point to new nodes being created; and we need a few new local variables:
⟨ Local variables for hyphenation 901 ⟩+≡
pointer major_tail, minor_tail; // the end of lists in the main and discretionary branches being reconstructed
ASCII_code c = 0; // character temporarily replaced by a hyphen
int c_loc; // where that character came from
int r_count; // replacement count for discretionary
pointer hyf_node; // the hyphen, if it exists
Section 913
When the following code is performed, hyf[0] and hyf[hn] will be zero.
⟨ Reconstitute nodes for the hyphenated word, inserting discretionary hyphens 913 ⟩≡
do {
l = j;
j = reconstitute(j, hn, bchar, hyf_char) + 1;
if (hyphen_passed == 0) {
link(s) = link(HOLD_HEAD);
while (link(s) > null) {
s = link(s);
}
if (odd(hyf[j - 1])) {
l = j;
hyphen_passed = j - 1;
link(HOLD_HEAD) = null;
}
}
if (hyphen_passed > 0) {
// << Create and append a discretionary node as an alternative to the unhyphenated word, and continue to develop both branches until they become equivalent, 914 >>
}
} while (j <= hn);
link(s) = q;
Section 914
In this repeat loop we will insert another discretionary if hyf[j − 1] is odd, when both branches of the previous discretionary end at position j − 1. Strictly speaking, we aren’t justified in doing this, because we don’t know that a hyphen after j − 1 is truly independent of those branches. But in almost all applications we would rather not lose a potentially valuable hyphenation point. (Consider the word ‘difficult’, where the letter ‘c’ is in position j.)
#define advance_major_tail major_tail = link(major_tail); incr(r_count)
⟨ Create and append a discretionary node as an alternative to the unhyphenated word, and continue to develop both branches until they become equivalent 914 ⟩≡
do {
r = get_node(SMALL_NODE_SIZE);
link(r) = link(HOLD_HEAD);
type(r) = DISC_NODE;
major_tail = r;
r_count = 0;
while (link(major_tail) > null) {
advance_major_tail;
}
i = hyphen_passed;
hyf[i] = 0;
// << Put the characters |hu[l..i]| and a hyphen into |pre_break(r)|, 915 >>
// << Put the characters |hu[i + 1..]| into |post_break(r)|, appending to this list and to |major_tail| until synchronization has been achieved, 916 >>
// << Move pointer |s| to the end of the current list, and set |replace_count(r)| appropriately, 918 >>
hyphen_passed = j - 1;
link(HOLD_HEAD) = null;
} while (odd(hyf[j - 1]));
Section 915
The new hyphen might combine with the previous character via ligature or kern. At this point we have l − 1 i j and i hn.
⟨ Put the characters hu[l..i] and a hyphen into pre_break(r) 915 ⟩≡
minor_tail = null;
pre_break(r) = null;
hyf_node = new_character(hf, hyf_char);
if (hyf_node != null) {
incr(i);
c = hu[i];
hu[i] = hyf_char;
free_avail(hyf_node);
}
while (l <= i) {
l = reconstitute(l, i, font_bchar[hf], NON_CHAR) + 1;
if (link(HOLD_HEAD) > null) {
if (minor_tail == null) {
pre_break(r) = link(HOLD_HEAD);
}
else {
link(minor_tail) = link(HOLD_HEAD);
}
minor_tail = link(HOLD_HEAD);
while (link(minor_tail) > null) {
minor_tail = link(minor_tail);
}
}
}
if (hyf_node != null) {
hu[i] = c; // restore the character in the hyphen position
l = i;
decr(i);
}
Section 916
The synchronization algorithm begins with l = i + 1 j.
⟨ Put the characters hu[i + 1..] into post_break(r), appending to this list and to major_tail until synchronization has been achieved 916 ⟩≡
minor_tail = null;
post_break(r) = null;
c_loc = 0;
if (bchar_label[hf] != NON_ADDRESS) {
// put left boundary at beginning of new line
decr(l);
c = hu[l];
c_loc = l;
hu[l] = 256;
}
while (l < j) {
do {
l = reconstitute(l, hn, bchar, NON_CHAR) + 1;
if (c_loc > 0) {
hu[c_loc] = c;
c_loc = 0;
}
if (link(HOLD_HEAD) > null) {
if (minor_tail == null) {
post_break(r) = link(HOLD_HEAD);
}
else {
link(minor_tail) = link(HOLD_HEAD);
}
minor_tail = link(HOLD_HEAD);
while (link(minor_tail) > null) {
minor_tail = link(minor_tail);
}
}
} while (l < j);
while (l > j) {
// << Append characters of |hu[j..]| to |major_tail|, advancing |j|, 917 >>
}
}
Section 917
⟨ Append characters of hu[j..] to major_tail, advancing j 917 ⟩≡
j = reconstitute(j, hn, bchar, NON_CHAR) + 1;
link(major_tail) = link(HOLD_HEAD);
while (link(major_tail) > null) {
advance_major_tail;
}
Section 918
Ligature insertion can cause a word to grow exponentially in size. Therefore we must test the size of r_count here, even though the hyphenated text was at most 63 characters long.
⟨ Move pointer s to the end of the current list, and set replace_count(r) appropriately 918 ⟩≡
if (r_count > 127) {
// we have to forget the discretionary hyphen
link(s) = link(r);
link(r) = null;
flush_node_list(r);
}
else {
link(s) = r;
replace_count(r) = r_count;
}
s = major_tail;