These programs use various data files to specify font encodings, auxiliary information for a font, and other things. Some of these data files are distributed in the directory `data'; others must be constructed on a font-by-font basis.
If the environment variable FONTUTIL_LIB is set, data files are looked up along the path it specifies, using the same algorithm as is used for font searching (see section Font searching). Otherwise, the default path set in the top-level Makefile is used.
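As a rough illustration of the lookup just described, here is a minimal sketch in Python. It assumes the path is a list of directories separated by the platform's usual path separator and tried in order; the full font-searching algorithm may do more than this. The function name find_data_file and the default_path argument are hypothetical, standing in for whatever default the Makefile supplies.

```python
import os

def find_data_file(name, default_path, environ=os.environ):
    """Return the first match for NAME along the data-file path, or None.

    Assumption: the path is a separator-delimited list of directories.
    An empty FONTUTIL_LIB is treated here as if it were unset.
    """
    path = environ.get("FONTUTIL_LIB") or default_path
    for directory in path.split(os.pathsep):
        if not directory:
            # Assumption: empty path components are simply skipped.
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_data_file("encoding.map", "/some/default/dir") consults FONTUTIL_LIB first and falls back to the given directory only when the variable is unset.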
The following sections (in other chapters of the manual) also describe file formats:
For the sake of brevity, we do not spell out every abbreviation (typically of file format names) in the manual every time we use it. This section collects and defines all the common abbreviations we use.
eexec-encrypted Type 1 font.
Data files read by these programs are text files that share certain syntax elements:
Whitespace characters (as defined by the C facility isspace) are ignored at the beginning of a line.
A line can be as long as you want.
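The common syntax can be sketched as a small line filter. This is only an illustration, not the code the programs use: the function name read_data_lines is ours, it relies on the `%' comment convention mentioned under character names, and stripping trailing whitespace is an assumption made for convenience.

```python
def read_data_lines(text):
    """Yield the meaningful lines of a data file."""
    for raw in text.splitlines():
        # A `%' starts a comment that runs to the end of the line.
        line = raw.split("%", 1)[0]
        # Leading whitespace is ignored; trailing whitespace is also
        # dropped here (an assumption, not stated in the manual).
        line = line.strip()
        if line:
            # Blank and comment-only lines never reach the caller.
            yield line

sample = "% a comment\n\n   first entry   % trailing comment\nsecond entry\n"
entries = list(read_data_lines(sample))
```

With the sample above, entries holds the two strings "first entry" and "second entry".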
The encoding of a font specifies the mapping from character codes (an integer, typically between zero and 255) to the characters themselves; e.g., does a character with code 92 wind up printing as a backslash (as it does under the ASCII encoding) or as a double left quote (as it does under the most common TeX font encoding)? Put another way, the encoding is the arrangement of the characters in the font.
It is sad but true that no single encoding has been widely adopted, even for basic text fonts. (Text fonts and, say, math fonts or symbol fonts will clearly have different encodings.) Every typesetting program and/or font source seems to come up with a new encoding; GNU is no exception (see below). Therefore, when you decide on the encoding for the fonts you create, you should choose whatever is most convenient for the typesetting programs you intend to run it with. (Decent typesetting systems would make it trivial to set font encodings; unfortunately, almost nothing is decent in that regard!)
The encoding file format we invented is a font-format-independent
representation of an encoding. Encoding files are "data files" which
have the basic syntax elements described above (see section Common file syntax). They are usually named with the extension `.enc'.
The first nonblank non-comment line in an encoding file is a string to put into TFM files as the "coding scheme" to describe the encoding; some common coding schemes are `TeX text', `TeX math symbol', `Adobe standard'. Case is irrelevant; that is, any programs which use the coding scheme should pay no attention to its case.
Thereafter, each nonblank non-comment line defines the character for the corresponding code: the first such line defines the character with code zero, the next with code one, and so on.
Each character consists of a name, optionally followed by ligature information. (All fonts using the same encoding should have the same ligatures, it seems to us.)
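To make the layout concrete, the following sketch reads an encoding file into a coding scheme and a list of names indexed by character code. It is an illustration under stated assumptions: parse_encoding is a hypothetical name, the sample encoding is made up, and any ligature information after the name is ignored here.

```python
def parse_encoding(text):
    """Return (coding_scheme, names); names[code] is the character name."""
    lines = []
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and outer blanks
        if line:
            lines.append(line)
    if not lines:
        raise ValueError("no coding scheme line found")
    coding_scheme = lines[0]
    # The n-th remaining line defines code n; its first word is the name.
    names = [line.split()[0] for line in lines[1:]]
    return coding_scheme, names

# A tiny, made-up encoding used only for illustration.
sample = """\
% Toy encoding
Toy text
.notdef
a
eacute      % lowercase e with an acute accent
"""
scheme, names = parse_encoding(sample)
```

Here scheme is "Toy text", and names[2] is "eacute" because that line is the third character definition (code two).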
The character name in an encoding file is an arbitrary sequence of nonblank characters (except it can't include a `%', since that starts a comment). Conventionally, it consists of only lowercase letters, except where an uppercase letter is actually involved. (For example, `eacute' is a lowercase `e' with an acute accent; `Eacute' is an uppercase `E' with an acute accent.)
If a character code has no equivalent character in the font, i.e., the
font table has a "blank spot", you should use the name .notdef
for that code. This is the only name you can usefully give more than
once. If any other name is used more than once, the results are
undefined.
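Since repeated names other than `.notdef' give undefined results, it can be worth checking an encoding for them before use. The helper below is a sketch of such a check; duplicate_names is a hypothetical name, not something the programs provide.

```python
from collections import Counter

def duplicate_names(names):
    """Return the sorted names, other than `.notdef', that occur more than once."""
    counts = Counter(name for name in names if name != ".notdef")
    return sorted(name for name, count in counts.items() if count > 1)

# `.notdef' may repeat freely; `a' may not.
problems = duplicate_names([".notdef", "a", ".notdef", "b", "a"])
```

In this example problems contains only "a".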
To avoid unnecessary proliferation of character names, you should use names from existing `.enc' files where possible. All the `.enc' files we have created are distributed in the `data' directory.
The ligature information for a character in an encoding file is optional. More than one ligature specification may be given. Each specification looks like:
lig second-char =: lig-char
This means that a ligature character lig-char should be present in the font for the current character (the one being defined on this line of the encoding file) followed by second-char. You give second-char and lig-char as character codes (see section Specifying character codes). For example, in most text encodings (which involve Latin characters), some variation on the following line will be present:
f lig f =: 013 lig i =: 014 lig l =: 015
This will produce a ligature in the font such that when a typesetting program sees the two-character sequence `ff' in the input, it replaces those two characters in the output with the single character at position octal 13 (presumably the `ff' ligature) of the font; when it sees `fi', the character at position octal 14 is output; when it sees `fl', the character at position octal 15 is output.
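The ligature syntax can be sketched as follows. The conventions in char_code are assumptions made for this illustration only (a lone non-digit character stands for its own code, a leading `0' means octal, a leading `0x' means hexadecimal, anything else is decimal); the authoritative rules are those in section Specifying character codes. Both function names are hypothetical.

```python
def char_code(token):
    """Convert a character-code token to an integer (assumed conventions)."""
    if len(token) == 1 and not token.isdigit():
        return ord(token)              # a lone character stands for itself
    if token.lower().startswith("0x"):
        return int(token, 16)          # hexadecimal
    if token.startswith("0") and len(token) > 1:
        return int(token, 8)           # octal
    return int(token, 10)              # decimal

def parse_char_line(line):
    """Split one character line into (name, [(second_char, lig_char), ...])."""
    tokens = line.split()
    name, rest = tokens[0], tokens[1:]
    if len(rest) % 4 != 0:
        raise ValueError("malformed ligature specification: " + line)
    ligatures = []
    for i in range(0, len(rest), 4):
        keyword, second, arrow, lig = rest[i:i + 4]
        if keyword != "lig" or arrow != "=:":
            raise ValueError("malformed ligature specification: " + line)
        ligatures.append((char_code(second), char_code(lig)))
    return name, ligatures

name, ligatures = parse_char_line("f lig f =: 013 lig i =: 014 lig l =: 015")
```

For the example line, name is "f" and ligatures pairs the codes of `f', `i' and `l' (102, 105, 108) with decimal 11, 12 and 13, that is, octal 13, 14 and 15.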
Metafont version 2 allows a more general ligature scheme; if there is a demand for it, it wouldn't be hard to add.
When we started making fonts for the GNU project, we had to decide on some font encoding. We hoped to use an existing one, but none that we found seemed suitable: the TeX font encodings, including the "Cork encoding" described in TUGboat 11#4, lacked many standard PostScript characters; conversely, the standard PostScript encodings lacked useful TeX characters. Since we knew that Ghostscript and TeX would be the two main applications using the fonts, we thought it unacceptable to favor one at the expense of the other.
Therefore, we invented two new encodings. The first one, "GNU Latin text" (distributed in `data/gnulatin.enc'), is based on ISO Latin 1, and is close to a superset of both the basic TeX text encoding and the Adobe standard text encoding. We felt it was best to use ISO Latin 1 as the foundation, since some existing systems actually use ISO Latin 1 instead of ASCII. We also left the first eight positions open, so particular fonts could add more ligatures or other unusual characters.
The second, "GNU Latin text complement" (distributed in `data/gnulcomp.enc'), includes the remaining pre-accented characters from the Cork encoding, the PostScript expert encoding, swash characters, small caps, etc.
When a program reads a TFM file, it's given an arbitrary string (at best) for the coding scheme. To be useful, it needs to find the corresponding encoding file. We couldn't think of any way to name our `.enc' files that would allow the filename to be guessed automatically. Therefore, we invented another data file which maps the TFM coding scheme strings to our `.enc' filenames.
This file is distributed as `data/encoding.map'. See section Common file syntax, for a description of the common syntax elements.
Each nonblank non-comment line in `encoding.map' has two entries: the first word (contiguous nonblank characters) is the `.enc' filename; the rest of the line, after ignoring whitespace, is the string in the TFM file. This should be the same string that appears on the first line of the `.enc' file (see section Encoding files).
Programs should ignore case when using the coding scheme string.
Here is the coding scheme map file we distribute:
adobestd   Adobe standard
ascii      ASCII
dvips      dvips
dvips      TeX text + adobestandardencoding
gnulatin   GNU Latin text
gnulcomp   GNU Latin text complement
psymbol    PostScript Symbol
texlatin   Extended TeX Latin
textext    TeX text
zdingbat   Zapf Dingbats
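A program might consult the map along these lines. This is a sketch, not the distributed implementation: the function names are hypothetical, only three entries of the map are reproduced, and appending `.enc' to the first word is an assumption based on the entries being listed without an extension.

```python
MAP_EXCERPT = """\
% Three entries taken from the distributed map.
gnulatin   GNU Latin text
gnulcomp   GNU Latin text complement
textext    TeX text
"""

def parse_encoding_map(text):
    """Return a dict from lowercased coding scheme to encoding file basename."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)    # first word, then the rest of the line
        if len(parts) != 2:
            raise ValueError("map entry needs a filename and a scheme: " + line)
        basename, scheme = parts
        # Case is ignored; runs of whitespace are collapsed (an assumption).
        table[" ".join(scheme.split()).lower()] = basename
    return table

def enc_file_for(table, coding_scheme):
    """Return the `.enc' filename for a TFM coding scheme, or None if unknown."""
    basename = table.get(" ".join(coding_scheme.split()).lower())
    return None if basename is None else basename + ".enc"

table = parse_encoding_map(MAP_EXCERPT)
found = enc_file_for(table, "TEX TEXT")
```

Because case is ignored, found is "textext.enc" even though the TFM string was given in uppercase; an unknown coding scheme yields None.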