Go to the first, previous, next, last section, table of contents.

File formats

These programs use various data files to specify font encodings, auxliary information for a font, and other things. Some of these data files are distributed in the directory `data'; others must be constructed on a font-by-font basis.

If the environment variable FONTUTIL_LIB is set, data files are looked up along the path it specifies, using the same algorithm as is used for font searching (see section Font searching). Otherwise, the default path is set in the top-level Makefile.

The following sections (in other chapters of the manual) also describe file formats:

section BZR files.
section CCC files.
section CMI files.
section IFI files.

File format abbreviations: The alphabet soup of font formats.
Common file syntax: Some elements of auxiliary files are constant.
Encoding files: The character code-to-shape mapping.
Coding scheme map file: The coding scheme string-to-filename mapping.

File format abbreviations

For the sake of brevity, we do not spell out every abbreviation (typically of file format names) in the manual every time we use it. This section collects and defines all the common abbreviations we use.

BPL: The `Bezier property list' format output by BZRto and read by BPLtoBZR. This is a transliteration of the binary BZR format into human-readable (and -editable) text. See section BPL files.
BZR: The `Bezier' outline format output by Limn and read by BZRto. We invented this format ourselves. See section BZR files.
CCC: The `cookie-cutter character' (er, `composite character construction') files read by BZRto to add pre-accented and other such characters to a font. See section CCC files.
CMI: The `character metric information' files read by Charspace to add side bearings to a font. See section CMI files.
GF: The `generic font' bitmap format output by Metafont (and by most of these programs). See the sources for Metafont or one of the other TeX font utility programs (GFtoPK, etc.) for the definition.
DVI: The `device independent' format output by TeX, GFtoDVI, etc. Many "DVI driver" programs have been written to translate DVI format to something that can actually be printed or previewed. See sources for TeX or DVItype for the definition.
EPS: The `Encapsulated PostScript' format output by many programs, including Imageto (see section Viewing an image) and Fontconvert (see section Fontconvert output options). An EPS file differs from a plain PostScript file in that it contains information about the PostScript image it produces: its bounding box, for example. (This information is contained in comments, since PostScript has no good way to express such information directly.)
IFI: The `image font information' files read by Imageto when making a font from an image. See section IFI files.
GSF: The `Ghostscript font' format output by BZRto and the `bdftops' program in the Ghostscript distribution. This is nothing more than the Adobe Type 1 font format, unencrypted. The Adobe Type 1 format is defined in a book published by Adobe. (Many PostScript interpreters cannot read unencrypted Type 1 fonts, despite the fact that the definition says encryption is not required. Ghostscript can read both encrypted and unencrypted Type 1 fonts.)
IMG: The `image' format used by some GEM (a window system sometimes used under DOS) programs; specifically, by the program which drives our scanner.
MF: The `Meta-Font' programming language for designing typefaces invented by Donald Knuth. His Metafontbook is the only manual written to date (that we know of).
PBM: The `portable bitmap' format used by the PBMplus programs, Ghostscript, Imageto, etc. It was invented by Jef Poskanzer (we believe), the author of PBMplus.
PFA: The `printer font ASCII' format in which Type 1 PostScript fonts are sometimes distributed. This format uses the ASCII hexadecimal characters `0' to `9' and `a' to `f' (and/or `A' to `F') to represent an eexec-encrypted Type 1 font.
PFB: The `printer font binary' format in which Type 1 PostScript fonts are sometimes distributed. This format is most commonly used on DOS systems. (Personally, we find the existence of this format truly despicable, as one of the major advantages of PostScript is its being defined entirely in terms of plain text files (in Level 1 PostScript, anyway). Having an unportable binary font format completely defeats this.)
PK: The `packed font' bitmap format output by GFtoPK. PK format has (for all practical purposes) the same information as GF format, and does a better job of packing: typically a font in PK format will be one-half to two-thirds of the size of the same font in GF format. It was invented by Tom Rokicki as part of the TeX project. See the GFtoPK source for the definition.
PL: The `property list' format output by TFtoPL. This is a transliteration of the binary TFM format into human-readable (and -editable) text. Some of these programs output a PL file and call PLtoTF to make a TFM from it. (For technical reasons it is easier to do this than to output a TFM file directly.) See the PLtoTF source for the details.
TFM: The `TeX font metric' format output by Metafont, PLtoTF, and other programs, and read by TeX. TFM files include only character dimension information (widths, heights, depths, and italic corrections), kerns, ligatures, and font parameters; in particular, there is no information about the character shapes. See the TeX or Metafont source for the definition.

Common file syntax

Data files read by these programs are text files that share certain syntax elements:

Comments begin with a `%' character and continue to the end of the line. The content of comments is entirely ignored.
Blank lines are allowed, and ignored. Whitespace characters (as defined by the C facility isspace) are ignored at the beginning of a line.
Any character except ASCII NUL--character zero--is acceptable in data files. (We would allow NULs, too, at the expense of complicating the code, if we knew of any useful purpose for them.)

A line can be as long as you want.

Encoding files

The encoding of a font specifies the mapping from character codes (an integer, typically between zero and 255) to the characters themselves; e.g., does a character with code 92 wind up printing as a backslash (as it does under the ASCII encoding) or as a double left quote (as it does under the most common TeX font encoding)? Put another way, the encoding is the arrangement of the characters in the font.

It is sad but true that no single encoding has been widely adopted, even for basic text fonts. (Text fonts and, say, math fonts or symbol fonts will clearly have different encodings.) Every typesetting program and/or font source seems to come up with a new encoding; GNU is no exception (see below). Therefore, when you decide on the encoding for the fonts you create, you should choose whatever is most convenient for the typesetting programs you intend to run it with. (Decent typesetting systems would make it trivial to set font encodings; unfortunately, almost nothing is decent in that regard!)

The encoding file format we invented is a font-format-independent representation of an encoding. Encoding files are "data files" which have the basic syntax elements described above (see section Common file syntax). They are usually named with the extension .enc.

The first nonblank non-comment line in an encoding file is a string to put into TFM files as the "coding scheme" to describe the encoding; some common coding schemes are `TeX text', `TeX math symbol', `Adobe standard'. Case is irrelevant; that is, any programs which use the coding scheme should pay no attention to its case.

Thereafter, each nonblank non-comment line defines the character for the corresponding code: the first such line defines the character with code zero, the next with code one, and so on.

Each character consists of a name, optionally followed by ligature information. (All fonts using the same encoding should have the same ligatures, it seems to us.)

Character names: How to write character names.
Ligature definitions: How to define ligatures.
GNU encodings: Why we invented new encodings for GNU.

Character names

The character name in an encoding file is an arbitrary sequence of nonblank characters (except it can't include a %, since that starts a comment). Conventionally, it consists of only lowercase letters, except where an uppercase letter is actually involved. (For example, eacute is a lowercase e with an acute accent; Eacute is an uppercase E with an acute accent.

If a character code has no equivalent character in the font, i.e., the font table has a "blank spot", you should use the name .notdef for that code. This is the only name you can usefully give more than once. If any other name is used more than once, the results are undefined.

To avoid unnecessary proliferation of character names, you should use names from existing `.enc' files where possible. All the `.enc' files we have created are distributed in the `data' directory.

Ligature definitions

The ligature information for a character in an encoding file is optional. More than one ligature specification may be given. Each specification looks like:

lig second-char =: lig-char

This means that a ligature character lig-char should be present in the font for the current character (the one being defined on this line of the encoding file) followed by second-char. You give second-char and lig-char as character codes (see section Specifying character codes). For example, in most text encodings (which involve Latin characters), some variation on the following line will be present:

f       lig f =: 013  lig i =: 014  lig l =: 015

This will produce a ligature in the font such that when a typesetting program sees the two character sequence `ff' in the input, it replaces those two characters in the output with the single character at position octal 13 (presumably the `fi' ligature) of the font; when it sees `fi', the character at position octal 14 is output; when it sees `fl', the character at position octal 15 is output.

Metafont version 2 allows a more general ligature scheme; if there is a demand for it, it wouldn't be hard to add.

GNU encodings

When we started making fonts for the GNU project, we had to decide on some font encoding. We hoped to use an existing one, but none that we found seemed suitable: the TeX font encodings, including the "Cork encoding" described in TUGboat 11#4, lacked many standard PostScript characters; conversely, the standard PostScript encodings lacked useful TeX characters. Since we knew that Ghostscript and TeX would be the two main applications using the fonts, we thought it unacceptable to favor one at the expense of the other.

Therefore, we invented two new encodings. The first one, "GNU Latin text" (distributed in `data/gnulatin.enc'), is based on ISO Latin 1, and is close to a superset of both the basic TeX text encoding and the Adobe standard text encoding. We felt it was best to use ISO Latin 1 as the foundation, since some existing systems actually use ISO Latin 1 instead of ASCII. We also left the first eight positions open, so particular fonts could add more ligatures or other unusual characters.

The second, "GNU Latin text complement" (distributed in `data/gnulcomp.enc'), includes the remaining pre-accented characters from the Cork encoding, the PostScript expert encoding, swash characters, small caps, etc.

Coding scheme map file

When a program reads a TFM file, it's given an arbitrary string (at best) for the coding scheme. To be useful, it needs to find the corresponding encoding file. We couldn't think of any way to name our `.enc' files that would allow the filename to be guessed automatically. Therefore, we invented another data file which maps the TFM coding scheme strings to our `.enc' filenames.

This file is distributed as `data/encoding.map'. See section Common file syntax, for a description of the common syntax elements.

Each nonblank non-comment line in `encoding.map' has two entries: the first word (contiguous nonblank characters) is the `.enc' filename; the rest of the line, after ignoring whitespace, is the string in the TFM file. This should be the same string that appears on the first line of the `.enc' file (see section Encoding files).

Programs should ignore case when using the coding scheme string.

Here is the coding scheme map file we distribute:

adobestd 	Adobe standard
ascii		ASCII
dvips		dvips
dvips		TeX text + adobestandardencoding
gnulatin	GNU Latin text
gnulcomp 	GNU Latin text complement
psymbol 	PostScript Symbol
texlatin	Extended TeX Latin
textext		TeX text
zdingbat	Zapf Dingbats

Go to the first, previous, next, last section, table of contents.