How to use this program

The general format of the program call is one of:

recode [option]... [charset]
recode [option]... [before]:[after] [file]...

The second form is the common case. Each file is read assuming it is coded with charset before, and is recoded over itself so as to use charset after. If no file is given, the program instead acts as a filter and recodes standard input to standard output.

The available options are:

-C

--copyright

Given this option, all other parameters and options are ignored. The program briefly prints the copyright and copying conditions. See the file `COPYING' in the distribution for the full statement of the copyright and copying conditions.

-a

--auto-check

In this special mode, recode ignores arguments and most options. It diagnoses itself by analysing the connectivity of the various charsets, reports on standard output, then exits without recoding any file. For each possible pair of different charsets, it prints how many single steps are needed to achieve the recoding and how many can be saved by step merging. If a recoding cannot be done, the word `UNACHIEVABLE' is printed instead. However, this special line is completely suppressed if option -x specified some charset to ignore. The option -hname affects the resulting output, because there are more merging rules when this option is in effect. Other options also affect the result: -d, -g and, notably, -s.

There was a time, in GNU recode development, when this option was reasonably interesting. With the greater number of handled charsets, it has become very slow and generates a great deal of output. It can be made slightly more practical with -x. (that is, -x with the value `.'), which effectively removes most RFC 1345 charsets from the report.

-c

--colons

With texte Easy French conventions, use the colon : instead of the double quote " for marking diaeresis. See section ASCII with easy French conventions.

-d

--diacritics

While converting to or from the latex charset, limit conversion to diacritics only. This is particularly useful when people write what would be valid TeX or LaTeX files, if only they were using TeX macros for applying diacritics instead of using the diacriticized characters directly from the underlying character set. While converting to the latex charset, this option assumes that all characters special to TeX or LaTeX are properly escaped already; backslashes are also transmitted literally. While converting the other way, this option prevents all attempts at recognizing the TeX or LaTeX escaped representation of single characters of the other charset. See section ASCII with LaTeX codes.

-f

--force

This option is necessary for a file to be transformed irreversibly, regardless of whether the file is recoded over itself or produced on standard output. Beware that in this recode version, this option is only recognized, but otherwise ignored: if it is found that the recoding is not fully reversible, the file replacement is still unconditionally done. Even if GNU recode tries hard to keep recodings reversible, it cannot make any promise! In particular, consider:

Some transformations are known to be fully reversible for all inputs: recode looks for them (also see option -s). This is not true for all transformations, however.
Usually, reversibility depends on file contents and cannot be told beforehand. Further, reversibility is never absolute across successive versions of the program. Even correcting a small bug in a mapping could induce slight discrepancies later: please keep only reasonable expectations about reverse recodings.
Reversibility is easily lost by merging. This is best explained through an example. If you reversibly recode a file from charset `A' to charset `B', then you reversibly recode the result from charset `B' to charset `C', you cannot expect to recover the original file by merely recoding from charset `C' directly to charset `A'. You will instead have to recode from charset `C' back to charset `B', and only then from charset `B' to charset `A'.
Faulty files create a particular problem. Consider an example, recoding from ibmpc to latin1. End of lines are represented as `\r\n' in ibmpc and as `\n' in latin1. There is no way for a faulty ibmpc file containing a `\n' not preceded by `\r' to be translated into a latin1 file, and then back.
There is another difficulty arising from code equivalences. For example, in a latex charset file, the string `\^\i{}' could be recoded back and forth through another charset and become `\^{\i}'. Even if the resulting file is equivalent to the original one, it is not identical.
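
Two of the points above can be illustrated with a small sketch. It is not taken from recode: the end-of-line functions only model the `\r\n' versus `\n' convention described above, and the three tables A_TO_B, B_TO_C and C_TO_A are hypothetical, chosen only to show that a direct table need not undo two chained steps.

```python
def ibmpc_to_latin1_eol(data: bytes) -> bytes:
    """Model only the end-of-line change: every CR LF pair becomes a lone LF."""
    return data.replace(b"\r\n", b"\n")


def latin1_to_ibmpc_eol(data: bytes) -> bytes:
    """Model only the end-of-line change: every LF becomes a CR LF pair."""
    return data.replace(b"\n", b"\r\n")


def round_trip(data: bytes) -> bytes:
    return latin1_to_ibmpc_eol(ibmpc_to_latin1_eol(data))


well_formed = b"one\r\ntwo\r\n"
faulty = b"one\ntwo\r\n"  # the first line ends with a bare LF

print(round_trip(well_formed) == well_formed)  # True
print(round_trip(faulty) == faulty)            # False: the bare LF came back as CR LF

# Hypothetical reversible tables over a three-symbol alphabet.
A_TO_B = {0: 1, 1: 0, 2: 2}
B_TO_C = {0: 0, 1: 2, 2: 1}
C_TO_A = {0: 0, 1: 1, 2: 2}  # assumed: a direct table designed independently


def apply(table, symbols):
    return [table[s] for s in symbols]


def invert(table):
    return {target: source for source, target in table.items()}


original = [0, 1, 2]
in_c = apply(B_TO_C, apply(A_TO_B, original))

direct = apply(C_TO_A, in_c)
stepwise = apply(invert(A_TO_B), apply(invert(B_TO_C), in_c))

print(direct == original)    # False: the direct table does not undo both steps
print(stepwise == original)  # True: going back through B recovers the input
```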

-g

--graphics

This option is only meaningful while getting out of the ibmpc charset. In this charset, characters 176 to 223 are used for constructing rulers and boxes, using single or double horizontal or vertical lines. This option forces the automatic selection of ASCII characters for approximating these rulers and boxes, at the cost of making the transformation irreversible.
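
The following sketch shows why such an approximation cannot be reversed. It assumes that Python's cp437 codec is a fair stand-in for the ibmpc charset, and the replacement characters (`-', `=', `|', `+', `#') are hypothetical choices, not necessarily the ones recode selects.

```python
import unicodedata

# Hypothetical ASCII stand-ins for straight line segments.
PURE_LINES = {
    "BOX DRAWINGS LIGHT HORIZONTAL": "-",
    "BOX DRAWINGS DOUBLE HORIZONTAL": "=",
    "BOX DRAWINGS LIGHT VERTICAL": "|",
    "BOX DRAWINGS DOUBLE VERTICAL": "|",
}


def approximate_graphics(data: bytes) -> str:
    """Decode cp437 bytes, replacing codes 176 to 223 by ASCII look-alikes."""
    out = []
    for byte in data:
        char = bytes([byte]).decode("cp437")
        if 176 <= byte <= 223:
            name = unicodedata.name(char)
            if name in PURE_LINES:
                char = PURE_LINES[name]
            elif name.startswith("BOX DRAWINGS"):
                char = "+"  # corners and junctions all collapse to one character
            else:
                char = "#"  # shades and solid blocks
        out.append(char)
    return "".join(out)


top = bytes([218, 196, 196, 191])     # upper-left corner, two dashes, upper-right corner
bottom = bytes([192, 196, 196, 217])  # lower-left corner, two dashes, lower-right corner

print(approximate_graphics(top))     # +--+
print(approximate_graphics(bottom))  # +--+
```

Both rows come out identical, so the original corner characters cannot be recovered from the ASCII text.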

-h[name]

--header[=name]

Instead of recoding files, recode writes a C source file on standard output and exits. This source is meant to be included in a regular C program: its purpose is to declare and initialize an array, named name, which represents the requested recoding. If name is not specified, then it defaults to before_to_after, where before is the starting charset and after is the goal charset. Even if recode tries its best, this option does not always succeed in producing the requested C table. It does succeed, however, provided the recoding can be internally represented by only one step after the optimization phase, and provided this merged step conveys a one-to-one or a one-to-many explicit table. But this is all fairly technical. Better try and see! Beware that other options might affect the produced C tables, namely -d, -g and, particularly, -s.
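
To give an idea of what such a table might look like, here is a sketch that prints a 256-entry C array for a one-to-one recoding. The function names emit_c_table and default_name, the array type and the layout are assumptions made for illustration; the source recode actually writes may differ.

```python
def default_name(before: str, after: str) -> str:
    """Default array name, following the before_to_after rule described above."""
    return f"{before}_to_{after}"


def emit_c_table(name: str, table) -> str:
    """Return C source declaring a 256-entry one-to-one table (assumed layout)."""
    if len(table) != 256:
        raise ValueError("a one-to-one byte table needs exactly 256 entries")
    lines = [f"unsigned char const {name}[256] =", "  {"]
    for start in range(0, 256, 8):
        row = ", ".join(f"{value:3d}" for value in table[start:start + 8])
        lines.append(f"    {row},\t/* {start:3d} - {start + 7:3d} */")
    lines.append("  };")
    return "\n".join(lines) + "\n"


# The identity table stands in for a real recoding here.
source = emit_c_table(default_name("latin1", "ibmpc"), list(range(256)))
print(source.splitlines()[0])  # unsigned char const latin1_to_ibmpc[256] =
```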

-i

--sequence=files

When the recoding requires a combination of two or more elementary recoding steps, this option forces many passes over the data, using intermediate files between passes. This is the default behaviour when files are recoded over themselves. If this option is selected in filter mode, that is, when the program reads standard input and writes standard output, it might take longer for programs further down the pipe chain to start receiving some recoded data.

-l[format]

--list[=format]

This option asks for information about all charsets, or about one particular charset. No file will be recoded. If there are no non-option arguments, recode ignores the format value of the option and writes a sorted list of charset names on standard output, one per line. When a charset name has aliases or synonyms, they follow the true charset name on its line, presented in lexicographical order from left to right. This list is over one hundred lines long. It is best used with grep, as in:

recode -l | grep greek

There might be one non-option argument, in which case it is interpreted as a charset name, possibly abbreviated to any unambiguous prefix. This particular usage of the -l option is obeyed only for charsets having an RFC 1345 style internal description. Most charsets have this property, but some do not, and option -l cannot be used to detail those. To find out whether a particular charset can be listed this way, merely try it and see if it works. The format value of the option can be any of:

decimal: This format asks for the production on standard output of a concise tabular display of the charset, in which character code values are expressed in decimal.
octal: This format uses octal instead of decimal in the concise tabular display of the charset.
hexadecimal: This format uses hexadecimal instead of decimal in the concise tabular display of the charset.
full: This format requests an extensive display of the charset on standard output, using one line per character showing its decimal, hexadecimal and octal code values, and also a descriptive comment which is the ISO 10646 character name.

When option -l is used together with a charset argument, the format defaults to decimal.
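
As a rough picture of the full format, the sketch below prints one line per character with its decimal, hexadecimal and octal code values followed by a character name. The column layout is an assumption, the helper full_listing is hypothetical, and the names come from Python's unicodedata module rather than from recode's own tables.

```python
import unicodedata


def full_listing(codec: str, codes=range(256)):
    """Yield one line per code: decimal, hexadecimal, octal, character name."""
    for code in codes:
        try:
            char = bytes([code]).decode(codec)
        except UnicodeDecodeError:
            continue  # this code is not defined in the charset
        name = unicodedata.name(char, "(no name)")
        yield f"{code:3d}  {code:02X}  {code:03o}  {name}"


for line in full_listing("latin-1", [65, 233]):
    print(line)
# prints:
#  65  41  101  LATIN CAPITAL LETTER A
# 233  E9  351  LATIN SMALL LETTER E WITH ACUTE
```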

-o

--sequence=popen

When the recoding requires a combination of two or more elementary recoding steps, this option forces the creation of a chain of program instances initiated through the popen(3) library call, all operating in parallel. In filter mode, at the cost of some overhead, recoded data will be available soon after the program starts, even if many elementary recoding steps are required. If, at installation time, the popen(3) call is said to be unavailable, selecting option -o is equivalent to selecting option -i.

-p

--sequence=pipe

When the recoding requires a combination of two or more elementary recoding steps, this option forces the program to fork itself into a few copies interconnected with pipes, using the pipe(2) system call. All copies of the program operate in parallel. This method is similar to the method used through option -o, but is slightly more efficient. This is the default behaviour in filter mode. If this option is used when files are recoded over themselves, this should save some disk space, at the cost of more system overhead. If, at installation time, the pipe(2) call is said to be unavailable, selecting option -p is equivalent to selecting option -o. If both pipe(2) and popen(3) are unavailable, selecting option -p is equivalent to selecting option -i.
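
The difference between the sequencing strategies can be pictured with an in-process analogy. This sketch does not fork, and it calls neither popen(3) nor pipe(2); it only contrasts running each step over the whole data (as with intermediate files) with streaming chunks through all steps (as with pipes). The two steps are hypothetical byte-for-byte translations.

```python
def run_in_passes(steps, data: bytes) -> bytes:
    """Each step consumes the complete output of the previous one."""
    for step in steps:
        data = step(data)  # plays the role of an intermediate file
    return data


def run_as_pipeline(steps, chunks):
    """Chain the steps lazily, so each chunk flows through all of them at once."""
    stream = iter(chunks)
    for step in steps:
        stream = map(step, stream)  # nothing runs until a chunk is requested
    return stream


# Two hypothetical elementary steps, each a stateless byte translation.
LOWER_TO_XYZ = bytes.maketrans(b"abc", b"xyz")
XYZ_TO_UPPER = bytes.maketrans(b"xyz", b"XYZ")
steps = [
    lambda chunk: chunk.translate(LOWER_TO_XYZ),
    lambda chunk: chunk.translate(XYZ_TO_UPPER),
]

print(run_in_passes(steps, b"abcabc"))  # b'XYZXYZ'

pipeline = run_as_pipeline(steps, [b"abc", b"abc"])
print(next(pipeline))  # b'XYZ', available before the second chunk is read
```

The streaming form only gives the same result as the multi-pass form because these steps keep no state across chunk boundaries.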

-s

--strict

By using this option, the user requests that recode be very strict while recoding a file, merely losing in the transformation any character which is not explicitly mapped from one charset to another. This option makes the recoding less likely to be reversible, so it also implies option -f. When this option is not used, recode automatically tries to fill mappings with invented correspondences, making them fully reversible in many instances. This filling is not made at random: the algorithm tries to stick to the identity mapping and, when that is not possible, prefers small permutation cycles. This means that, by default, recode may sometimes produce funny characters; however, these are quite helpful when users change their mind and want to revert to the prior recoding.
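
One plausible way to fill a partial mapping along these lines is sketched below. It is not recode's actual algorithm: the function complete_mapping is a hypothetical illustration which maps an unmapped code to itself when that code is free as a target, and otherwise closes the chain the code belongs to into the shortest possible cycle. It assumes all sources and targets lie in range(size).

```python
def complete_mapping(partial, size):
    """Extend a partial one-to-one mapping on range(size) to a full permutation."""
    if len(set(partial.values())) != len(partial):
        raise ValueError("the explicit mapping must be one-to-one")
    inverse = {target: source for source, target in partial.items()}
    full = dict(partial)
    for code in range(size):
        if code in partial:
            continue
        # Walk the chain backwards to the code that nothing maps onto.
        start = code
        while start in inverse:
            start = inverse[start]
        full[code] = start  # identity when code == start, else closes a cycle
    return full


def strict_recode(codes, partial):
    """Strict behaviour: codes without an explicit mapping are simply lost."""
    return [partial[code] for code in codes if code in partial]


explicit = {0: 1}  # only one correspondence is explicitly known
filled = complete_mapping(explicit, 4)

print(filled)                               # {0: 1, 1: 0, 2: 2, 3: 3}
print(strict_recode([0, 1, 2], explicit))   # [1]
print([filled[code] for code in [0, 1, 2]]) # [1, 0, 2]
```

In the filled table, code 1 is sent to the otherwise unused code 0: a "funny character", but one that lets the recoding be undone.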

-t

--touch

The touch option is meaningful only when files are recoded over themselves. Without it, the timestamps associated with files are preserved, to reflect the fact that changing the code of a file does not really alter its informational contents. When the user wants the recoded files to be timestamped at the recoding time, this option inhibits the automatic protection of the timestamps.
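
The timestamp handling can be sketched as follows, assuming a file is rewritten in place and its access and modification times are restored afterwards unless touching is requested. The helper rewrite_in_place and its transform parameter are hypothetical; recode's own file replacement may proceed differently.

```python
import os


def rewrite_in_place(path, transform, touch=False):
    """Replace a file's contents by transform(contents).

    Unless touch is true, the original access and modification times are
    restored, mirroring the default behaviour described above.
    """
    before = os.stat(path)
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(transform(data))
    if not touch:
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

Calling rewrite_in_place(path, transform) keeps the old timestamps, while passing touch=True leaves the file stamped with the time of the rewrite.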

-v

--verbose

Before doing any recoding, the program will first print on `stderr' the list of all intermediate charsets planned for recoding, starting with the before charset and ending with the after charset. It also prints an indication of the recoding quality, as one of the words `reversible', `one to one', `one to many', `many to one' or `many to many'. This information will appear once or twice. It is shown a second time only when the optimization and step merging phase succeeds in creating a new single step. This option also has a second effect. The program will print on `stderr' one message per file recoded, to keep the user informed of the progress of the command. An easy way to know beforehand the sequence or quality of a recoding is to use a command such as:

recode -v before:after < /dev/null

This works because, so far in recode, an empty input file produces an empty output file.

-x=charset

--ignore=charset

This option tells the program to ignore any recoding path through the specified charset, thus disabling any single step using this charset as a start or end point. This may be used when the user wants to force recode into using an alternate recoding path. charset may be abbreviated to any unambiguous prefix. For convenience, the value `.' is an alias for `RFC 1345', so the option -x. effectively disables all RFC 1345 tables at once.
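
The effect of ignoring a charset can be pictured as a path search over a graph of single steps. In this sketch the STEPS graph is entirely hypothetical, and a breadth-first search for a shortest path is an assumption; the document does not say how recode chooses among candidate paths.

```python
from collections import deque

# Hypothetical single steps: each charset lists the charsets it can reach directly.
STEPS = {
    "ibmpc": ["latin1", "texte"],
    "latin1": ["ibmpc", "latex", "texte"],
    "texte": ["ibmpc", "latin1", "latex"],
    "latex": ["latin1", "texte"],
}


def find_path(before, after, ignore=frozenset()):
    """Return a shortest list of charsets from before to after, or None."""
    if before in ignore or after in ignore:
        return None  # every step touching an ignored charset is disabled
    previous = {before: None}
    queue = deque([before])
    while queue:
        current = queue.popleft()
        if current == after:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for reachable in STEPS.get(current, ()):
            if reachable not in previous and reachable not in ignore:
                previous[reachable] = current
                queue.append(reachable)
    return None


print(find_path("ibmpc", "latex"))                             # ['ibmpc', 'latin1', 'latex']
print(find_path("ibmpc", "latex", ignore={"latin1"}))          # ['ibmpc', 'texte', 'latex']
print(find_path("ibmpc", "latex", ignore={"latin1", "texte"})) # None
```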

--help

The program merely prints a page of help on standard output, and exits without doing any recoding.

--version

The program merely prints its version numbers on standard output, and exits without doing anything else.

The before:after argument specifies the start charset and the goal charset. The allowable values for before or after are described in the remainder of this document. Charsets may have predefined alternate names, or aliases, which are equally acceptable.

In the before:after argument only, a backslash may be used to quote the next character of a charset name. This might be useful for preventing a colon from being mistakenly interpreted as the separator between before and after. Alternatively, such a colon could simply be omitted from the name because, while recognizing a charset name or alias, GNU recode ignores all characters besides letters and digits. There is also no distinction between upper and lower case. Charset names or aliases may always be abbreviated to any unambiguous prefix.
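
These matching rules can be sketched as follows. The list KNOWN holds only a few charset names mentioned in this section, and the functions normalize and resolve are hypothetical; in particular, preferring an exact match over a prefix match is an assumption.

```python
KNOWN = ["latin1", "ibmpc", "latex", "texte"]  # a few names from this section


def normalize(name: str) -> str:
    """Keep only ASCII letters and digits, and fold to lower case."""
    return "".join(ch for ch in name if ch.isascii() and ch.isalnum()).lower()


def resolve(name: str, known=KNOWN) -> str:
    """Return the charset that name designates, allowing unambiguous prefixes."""
    key = normalize(name)
    table = {normalize(charset): charset for charset in known}
    if key in table:
        return table[key]  # assumed: an exact match wins over prefix matches
    matches = sorted(c for n, c in table.items() if n.startswith(key))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"unknown charset: {name!r}")
    raise LookupError(f"ambiguous charset {name!r}: {', '.join(matches)}")


print(resolve("Latin-1"))  # latin1
print(resolve("IBM"))      # ibmpc
try:
    resolve("la")
except LookupError as error:
    print(error)           # ambiguous charset 'la': latex, latin1
```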

One or both of the before or after keywords may be omitted, but the colon which separates them cannot. An omitted keyword implies the usual or default code in use on the system where this program is installed. Usually, this default code is latin1 for UNIX systems or ibmpc for MS-DOS machines.
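
A minimal sketch of how such an argument might be split is given below. The function split_request is hypothetical; it assumes that the first unquoted colon is the separator, that any later colon belongs to the after name, and that the default charset is passed in by the caller.

```python
def split_request(request: str, default: str = "latin1"):
    """Split a before:after argument, honouring backslash quoting and defaults."""
    before, after = [], []
    current = before
    seen_colon = False
    chars = iter(request)
    for ch in chars:
        if ch == "\\":
            current.append(next(chars, ""))  # take the next character literally
        elif ch == ":" and not seen_colon:
            seen_colon = True
            current = after
        else:
            current.append(ch)
    if not seen_colon:
        raise ValueError("the colon between before and after cannot be omitted")
    return "".join(before) or default, "".join(after) or default


print(split_request("latin1:ibmpc"))             # ('latin1', 'ibmpc')
print(split_request(":ibmpc"))                   # ('latin1', 'ibmpc')
print(split_request("latin1:", default="ibmpc")) # ('latin1', 'ibmpc')
print(split_request("a\\:b:c"))                  # ('a:b', 'c')
```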
