Imageto converts an image file (currently either in portable bitmap format (PBM) or GEM's IMG format) to either a bitmap font or an Encapsulated PostScript file (EPSF). An image file is simply a large bitmap.
If the output is a font, it can be constructed either by outputting a constant number of scanlines from the image as each "character" or (more usually) by extracting the "real" characters from the image.
The current selection of input formats is rather arbitrary. We implemented the IMG format because that is what our scanner outputs, and the PBM format because Ghostscript can output it (see section GSrenderfont). Other formats could easily be added.
Usually there are two prerequisites to extracting a usable font from an image file. First, looking at the image, so you can see what you've got. Second, preparing the IFI file describing the contents of the image: the character codes to output, any baseline adjustment (as for, e.g., `j'), and how many pieces each character has. Each is a separate invocation of Imageto; the first time with either the `-strips' or `-epsf' option, the second time with neither.
In the second step, Imageto considers the input image as a series of image rows. Each image row consists of all the scanlines between a nonblank scanline and the next entirely blank scanline. (A scanline is a single horizontal row of pixels in the image.) Within each image row, Imageto looks top-to-bottom, left-to-right, for bounding boxes: closed contours, i.e., an area whose edge you can trace with a pencil without lifting it.
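The search for bounding boxes within one image row can be pictured with a short Python sketch. It is only an illustration of the idea, not Imageto's own code: the function name `bounding_boxes', the sample bitmap, and the choice to treat diagonally touching pixels as connected are all assumptions made for this example.

```python
from collections import deque


def bounding_boxes(bitmap):
    """Return (left, top, right, bottom) for each connected blob of 'x' pixels.

    Assumption: pixels that touch diagonally count as connected.
    """
    height = len(bitmap)
    width = max((len(row) for row in bitmap), default=0)
    grid = [row.ljust(width) for row in bitmap]
    seen = set()
    boxes = []
    # Scan top-to-bottom, left-to-right for a pixel not yet assigned to a blob.
    for y in range(height):
        for x in range(width):
            if grid[y][x] != "x" or (x, y) in seen:
                continue
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen.add((x, y))
            # Flood-fill the blob, widening its box as new pixels are reached.
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and grid[ny][nx] == "x"
                                and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# A hypothetical image row holding an `i' (dot plus stem) and an `l'.
IMAGE_ROW = [
    "   x",
    "x  x",
    "   x",
    "x  x",
    "x  x",
]
print(bounding_boxes(IMAGE_ROW))
# [(3, 0, 3, 4), (0, 1, 0, 1), (0, 3, 0, 4)]
```

The `i' yields two boxes (its dot and its stem), which is why the IFI file has to say how many pieces each character has.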
For example, in the following image Imageto would find two image rows, the first running from scanline 1 to scanline 7, the second consisting of only scanline 10. There are six bounding boxes in the first image row and only one in the second. (This example also shows some typical problems in scanned images: the baseline of the `m' is not aligned with those of the `i', `j', and `l'; a meaningless black line is present; the `i' and `j' overlap.)
   01234567890123456789
 0
 1         x
 2    x x  x
 3         x
 4    x x  x  xxxxx
 5    x x  x  x x x
 6      x     x x x
 7    xx
 8
 9
10 xxxxxxxxxxxxxxx
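The division into image rows described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Imageto's implementation: the function name `image_rows' is hypothetical, and the pixel columns in the bitmap are only one possible rendition of the example. What matters here is which scanlines are blank.

```python
def image_rows(scanlines):
    """Return (first, last) scanline numbers for each run of nonblank scanlines."""
    rows = []
    start = None
    for index, line in enumerate(scanlines):
        blank = not line.strip()
        if not blank and start is None:
            start = index                     # a new image row begins here
        elif blank and start is not None:
            rows.append((start, index - 1))   # a blank scanline closes the row
            start = None
    if start is not None:                     # the image ended inside a row
        rows.append((start, len(scanlines) - 1))
    return rows


# Scanlines 0 through 10 of the example (column positions are illustrative).
EXAMPLE = [
    "",
    "        x",
    "   x x  x",
    "        x",
    "   x x  x  xxxxx",
    "   x x  x  x x x",
    "     x     x x x",
    "   xx",
    "",
    "",
    "xxxxxxxxxxxxxxx",
]
print(image_rows(EXAMPLE))  # [(1, 7), (10, 10)]
```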
Typically, the first step in extracting a font from an image is to see exactly what is in the image. (Clearly, this is unnecessary if you already know what your image file contains.)
The simplest way to get a look at the image file, if you have Ghostscript or some other suitable PostScript interpreter, is to convert the image file into an EPSF file with the `-epsf' option. Here is a possible invocation:
imageto -epsf ggmr.img
Here we read an input file `ggmr.img'; the output is `ggmr.eps'. You can then view the EPS file with
gs ggmr.eps
(presuming that `gs' invokes your PostScript interpreter).
If you don't have both a suitable PostScript interpreter and enough disk space to store the EPS file (it uses approximately twice as much disk space as the original image), the above won't work. Instead, to view the image you must make a font with the `-strips' option:
imageto -strips ggmr.img
The output of this will be `ggmrsp.1200gf' (our image having a resolution of 1200 dpi). Although the GF font cannot be conveniently viewed directly, you can use TeX and your favorite DVI processor to look at it, as follows:
fontconvert -tfm ggmrsp.1200
echo ggmrsp | tex strips
This writes its output to `strips.dvi', which you can view with your favorite DVI driver. (See section Archives, for how to obtain the DVI drivers for PostScript and X we recommend.)
`strips.tex' is distributed in the `imageto' directory.
Once you can see what is in the image, the next step is to prepare the IFI file (see section IFI files) corresponding to its characters. Imageto relies completely on the IFI files to describe the image; it makes no attempt at optical character recognition, i.e., guessing what the characters are from their shapes.
You must also decide on a few more aspects of the output font, which you specify with options:
75 (K) 5/315
This means that character code 75, whose name in the encoding file is `K', has its bottom row at row 5, and its top row at row 315; i.e., the character has five blank rows above the origin. This is almost certainly wrong (the letter `K' should sit on the typesetting baseline), so we would want to adjust it downwards to 0 via the individual character adjustment (see section IFI files).
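To make the reading of such a line concrete, here is a minimal Python sketch that splits it into its parts. The line format is inferred from the single sample above, and the names `GUIDELINE' and `parse_guideline' are hypothetical; nothing here is part of Imageto.

```python
import re

# Assumed layout: character code, name in parentheses, bottom row "/" top row.
GUIDELINE = re.compile(r"^\s*(\d+)\s+\((\S+)\)\s+(-?\d+)/(-?\d+)\s*$")


def parse_guideline(line):
    """Split a line such as '75 (K) 5/315' into its four parts."""
    match = GUIDELINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    code, name, bottom, top = match.groups()
    return {"code": int(code), "name": name,
            "bottom": int(bottom), "top": int(top)}


info = parse_guideline("75 (K) 5/315")
print(info["code"], info["name"])  # 75 K
# A nonzero bottom row means the character does not sit on the baseline.
print("rows between the baseline and the bottom of the character:",
      info["bottom"])
```

For a letter that should rest on the baseline, such as `K', the bottom row reported this way is the amount that still has to be corrected.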
The final invocation to produce the font might look something like this:
imageto -baselines=121,130,120 -designsize=26 ggmr
The output from this would be `ggmr26.1200gf'.
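The three output names mentioned so far (`ggmr.eps', `ggmrsp.1200gf', and `ggmr26.1200gf') suggest a naming pattern, sketched below in Python. The pattern is inferred from those three examples alone and may not cover other cases; the function `output_name' and its parameters are hypothetical.

```python
import os


def output_name(image_name, dpi, mode="font", design_size=None):
    """Guess the output filename, following the examples in the text.

    Assumption: `mode' is "epsf", "strips", or "font"; this is a label
    used only in this sketch, not an Imageto option.
    """
    base = os.path.splitext(os.path.basename(image_name))[0]
    if mode == "epsf":
        return base + ".eps"
    if mode == "strips":
        return f"{base}sp.{dpi}gf"
    if design_size is None:
        raise ValueError("a design size is needed to name a font here")
    return f"{base}{design_size}.{dpi}gf"


print(output_name("ggmr.img", 1200, mode="epsf"))    # ggmr.eps
print(output_name("ggmr.img", 1200, mode="strips"))  # ggmrsp.1200gf
print(output_name("ggmr", 1200, design_size=26))     # ggmr26.1200gf
```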
Your image may not be completely "clean", i.e., the scanning process may have introduced artifacts: black lines at the edge of the paper; blotches where the original had a speck of dirt or ink; broken lines where the image had a continuous line. To get a correct output font, you must correct these problems.
To remove blotches, you can simply put `.notdef' in the appropriate place in the IFI file. You can find the "appropriate place" when you look at the output font; some character will be nothing but a (possibly tiny) speck, and all the characters following will be in the wrong position.
The `-print-clean-info' option might also help you to diagnose which bounding boxes are being assigned to which characters, when you are in doubt. Here is an example of its output:
[Cleaning 149x383 bitmap:
  checking (0,99)-(10,152) ... clearing.
  checking (0,203)-(35,263) ... clearing.
  checking (0,99)-(130,382) ... keeping.
  checking (113,0)-(149,37) ... keeping.
106]
The final `106' is the character code output (ASCII `j'). The size of the overall bitmap which contains the `j' is 149 pixels wide and 383 pixels high. The bitmap contained four bounding boxes, the last two of which belonged to the `j' and were kept, and the first two from the adjacent character (`i') and were erased. (As shown in the example image above, the tail of the `j' often overlaps the `i' in type specimens.)
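If you have many such reports to read, a small script can summarize them. The following Python sketch parses the sample shown above; the regular expressions are derived from that one sample only, so treat the format as an assumption, and note that `parse_clean_info' is a hypothetical helper rather than part of Imageto.

```python
import re

HEADER = re.compile(r"\[Cleaning (\d+)x(\d+) bitmap:")
CHECK = re.compile(
    r"checking \((\d+),(\d+)\)-\((\d+),(\d+)\) \.\.\. (clearing|keeping)\.")
CODE = re.compile(r"(\d+)\]\s*$")

SAMPLE = """[Cleaning 149x383 bitmap:
  checking (0,99)-(10,152) ... clearing.
  checking (0,203)-(35,263) ... clearing.
  checking (0,99)-(130,382) ... keeping.
  checking (113,0)-(149,37) ... keeping.
106]"""


def parse_clean_info(text):
    """Return the bitmap size, the checked boxes, and the character code."""
    header = HEADER.search(text)
    code = CODE.search(text)
    if header is None or code is None:
        raise ValueError("not a clean-info report")
    boxes = []
    for match in CHECK.finditer(text):
        a, b, c, d = (int(group) for group in match.groups()[:4])
        # Keep the coordinate pairs exactly as printed, plus the verdict.
        boxes.append(((a, b), (c, d), match.group(5) == "keeping"))
    return {"width": int(header.group(1)), "height": int(header.group(2)),
            "boxes": boxes, "code": int(code.group(1))}


report = parse_clean_info(SAMPLE)
kept = [box for box in report["boxes"] if box[2]]
print(report["width"], report["height"])    # 149 383
print(len(kept), "kept of", len(report["boxes"]))  # 2 kept of 4
print(chr(report["code"]))                  # j
```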
If the image has blobs you have not removed with `.notdef', you will see a small bounding box in this output. The numbers shown are in "bitmap coordinates": (0,0) is the upper left-hand pixel of the bitmap.
If a blotch appears outside of the row of characters, Imageto will consider it to be its own (very small) image row. If you are using `-baselines', you must specify an arbitrary value corresponding to the blotch, even though the bounding box in the image will be ignored. See the section above for an example.
An image font information (IFI) file is a text file which describes the contents of an image file. You yourself must create it; as we will see, the information it contains usually cannot be determined automatically.
If your image file is named `foo.img' (or `foo.pbm'), it is customary to name the corresponding IFI file `foo.ifi'. That is what Imageto looks for by default. If you name it something else, you must specify the name with the `-ifi-file' option.
Imageto does not look for an IFI file if either the `-strips' or the `-epsf' option is specified.
Each nonblank non-comment line in the IFI file represents a sequence of bounding boxes in the image, and a corresponding character in the output font. See section Common file syntax, for a description of syntax elements common to all data files processed by these programs, including comments.
Each line has one to five entries, separated by spaces and/or tabs. If a line contains fewer than five entries, suitable defaults (as described below) are taken for the missing trailing entries. (It is impossible to supply a value for entry #3, say, without also supplying values for entries #1 and #2.)
Here is the meaning of each entry, in order:
If the character name is `.notdef', or if the character name is not specified in the encoding, Imageto just throws away the bounding boxes. See section Encoding files, for general information on encoding files.
-2
.
Here is a possible IFI file for the image in section Imageto usage. We throw away the black line that is the second image row. (Imagine that it is a scanner artifact.)
% IFI file for example image.
i 0 2
j 0 2
l
m 1
.notdef % Ignore the black line at the bottom.
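As a rough illustration of how such a file breaks down, the Python sketch below reads the example line by line. It is not Imageto's parser. Taking the second entry as the baseline adjustment and the third as the number of bounding boxes, with defaults of 0 and 1, is an assumption inferred from the example; any fourth and fifth entries are kept uninterpreted. The name `parse_ifi' is hypothetical.

```python
EXAMPLE_IFI = """\
% IFI file for example image.
i 0 2
j 0 2
l
m 1
.notdef % Ignore the black line at the bottom.
"""


def parse_ifi(text):
    """Return one dictionary per nonblank, non-comment line."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()   # `%' starts a comment
        if not line:
            continue
        fields = line.split()                 # spaces and/or tabs
        if len(fields) > 5:
            raise ValueError(f"too many entries: {raw!r}")
        entries.append({
            "name": fields[0],
            # Assumed defaults: no adjustment, one bounding box.
            "adjustment": int(fields[1]) if len(fields) > 1 else 0,
            "bb_count": int(fields[2]) if len(fields) > 2 else 1,
            "rest": fields[3:],               # entries 4 and 5, uninterpreted
        })
    return entries


entries = parse_ifi(EXAMPLE_IFI)
print([entry["name"] for entry in entries])
# ['i', 'j', 'l', 'm', '.notdef']
print(sum(entry["bb_count"] for entry in entries))  # 7
```

Under these assumptions the counts add up to seven bounding boxes: six for the four letters in the first image row and one for the discarded black line.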
This section describes the options that Imageto accepts. See section Command-line options, for general option syntax.
The main input filename (see section The main input file) is called image-name below.