Reviewing hyphenation

Revision as of 22:16, 4 February 2009 by Huttarl (created the page)

The Problem

Hyphenation is an issue for us because we use a 2-column layout in which the columns are not very wide. We've done a lot of tweaking of the settings for hyphenation and interword spacing, and the result seems pretty good. In particular, there are not many cases of consecutive lines that end with hyphens, and not many cases where a word is hyphenated across a right-hand page break. The few cases that remain we have been fixing manually, using \hbox{...} to prevent hyphenation at the trouble spot.

But the hyphenation is by nature somewhat volatile, so whenever we change something we would like to be able to easily recheck the hyphenation. And our book is over 1200 pages, so it would be very helpful to have tools to make the checking more efficient.

Potential Solutions

PDF viewers

One tool we found was the "evince" PDF viewer in Linux, which highlights all search results at once. So you can search for "-", and it will highlight all hyphens, which makes it easier to scan the PDF visually for hyphenation problems.

Still, this approach has its limitations... our layout domain experts don't have Linux machines, and I haven't found a PDF viewer for Windows that can highlight all search results at once.

(We are still looking into Okular, which is available for Windows at http://windows.kde.org)

A ConTeXt solution

Another approach we wondered about was having TeX highlight the hyphenations, e.g. by changing the background color to yellow or red when outputting a word that is dynamically broken/hyphenated. (Rather like the way we have TeX output red grid lines to help with debugging layout.) I think we would also want to highlight static hyphens that occur at the end of a line, as in "Niger- Congo," because they have a similar visual impact, possibly using a different color.

This would be an ideal solution, I think, but we don't know how to have TeX detect when a word gets dynamically hyphenated. (I made some inquiries on the NTG list to this effect. The response was that it would not be difficult to implement this in MkIV, but that it could not be done in MkII. And we were not free to move to MkIV.)

Adobe Acrobat / Javascript

Another possibility is using JavaScript in Adobe Acrobat Pro to automatically find and highlight end-of-line (and end-of-page) hyphens. This is the approach with which we had the most success. The features and limitations are described below, and the JavaScript code is attached.

Features

  • In Acrobat Pro, load a PDF and select "Highlight Hyphens" from the Tools menu to begin the highlighting. The first part of each word that is line-broken with a hyphen is highlighted.
  • The javascript console window shows progress.
  • The console reports the number of hyphens (actually, words line-broken with a hyphen) on each page.
  • The resulting highlighted PDF can be saved including the highlights.
  • The saved, highlighted PDF can be viewed with highlights using Adobe Reader (does not require Acrobat Pro).

Limitations

  • Slow. A representative test ran at about 0.07 pages per second (14 seconds per page!). At that rate our book of over 1200 pages would take about 5 hours.
  • The resulting PDF file grows by about 25%.
  • Sometimes the highlighting function stops with an error ("Internal error" / "General error") after about 30 pages. We don't know why, but it might be avoidable by processing only a limited number of pages at a time.
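
The run-time estimate and the idea of processing a limited number of pages at a time can be sketched together. This is only an illustration, not part of the attached scripts: the function name planBatches and the batch size of 25 are assumptions, and the 14 seconds per page is simply the figure from the test mentioned above.

```javascript
// Hypothetical helper (not part of the attached scripts): split a document
// into page batches and estimate the total run time.
// Page numbers are 0-based and each batch's "end" is inclusive.
function planBatches(totalPages, batchSize, secondsPerPage) {
  if (secondsPerPage === undefined) {
    secondsPerPage = 14; // figure measured in the test described above
  }
  if (totalPages < 0 || batchSize < 1) {
    throw new RangeError("totalPages must be >= 0 and batchSize >= 1");
  }
  var batches = [];
  for (var start = 0; start < totalPages; start += batchSize) {
    var end = Math.min(start + batchSize, totalPages) - 1;
    batches.push({ start: start, end: end });
  }
  return {
    batches: batches,
    estimatedHours: (totalPages * secondsPerPage) / 3600
  };
}

// Example: a 1200-page book in batches of 25 pages.
var plan = planBatches(1200, 25);
console.log(plan.batches.length + " batches, about " +
            plan.estimatedHours.toFixed(1) + " hours");
// prints "48 batches, about 4.7 hours"
```

Keeping each batch below the roughly 30 pages at which the error tended to appear is a guess at a workaround; whether it actually avoids the error is untested here.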

The code

The two attached javascript files are placed in the Acrobat javascripts folder, e.g. C:\Program Files\Adobe\Acrobat 9.0\Acrobat\Javascripts, and then Acrobat is restarted. add-hyphen-menu.js adds a menu item for "Highlight Hyphens..." on the Tools menu. findAndAnnot.js defines the function that finds line-broken words and highlights the first "quad" of each.
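
For readers without the attachments, the following is a rough sketch of what such scripts might look like. It is not the attached code. It assumes that Acrobat reports a word broken across two lines as a single word with more than one quad, which is what highlighting "the first quad" suggests. The function name highlightBrokenWords is made up for illustration; getPageNumWords, getPageNthWordQuads, addAnnot and app.addMenuItem are Acrobat JavaScript API calls.

```javascript
// Sketch only, not the attached findAndAnnot.js.
// Assumption: a line-broken word comes back from getPageNthWordQuads()
// with more than one quad (one per line fragment).

// Acrobat's console has println(); fall back to log() elsewhere.
function report(msg) {
  if (typeof console.println === "function") {
    console.println(msg);
  } else {
    console.log(msg);
  }
}

// Highlight the first quad of every multi-quad word on pages
// firstPage..lastPage (0-based, inclusive). Returns the total count.
function highlightBrokenWords(doc, firstPage, lastPage) {
  var total = 0;
  for (var p = firstPage; p <= lastPage; p++) {
    var count = 0;
    var numWords = doc.getPageNumWords(p);
    for (var i = 0; i < numWords; i++) {
      var quads = doc.getPageNthWordQuads(p, i);
      if (quads.length > 1) {
        doc.addAnnot({
          page: p,
          type: "Highlight",
          quads: [quads[0]],            // first part of the word only
          strokeColor: ["RGB", 1, 1, 0] // yellow
        });
        count++;
      }
    }
    report("Page " + (p + 1) + ": " + count + " line-broken word(s)");
    total += count;
  }
  return total;
}

// Menu registration, in the spirit of add-hyphen-menu.js.
// Runs only inside Acrobat, where the global "app" object exists.
if (typeof app !== "undefined" && app.addMenuItem) {
  app.addMenuItem({
    cName: "Highlight Hyphens...",
    cParent: "Tools",
    cExec: "highlightBrokenWords(event.target, 0, event.target.numPages - 1);"
  });
}
```

Passing the document and page range as parameters makes it easy to run the function over a limited number of pages at a time, and to try it outside Acrobat with a stand-in object that provides the same three methods. A multi-quad word is not guaranteed to be a hyphenated one, so the counts from this sketch should be treated as candidates to review.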