The general format of the program call is one of:
recode [option]... [charset]
recode [option]... [before]:[after] [file]...
The second form is the common case. Each file will be read assuming it is coded with charset before, and it will be recoded over itself so as to use charset after. If there is no such file, the program acts as a filter instead and recodes standard input to standard output. The first form, with a single charset argument, is used together with option -l, described below.
The available options are:
-C
--copyright
Display the copyright and copying conditions of recode.
-a
--auto-check
With this option, recode ignores arguments and most options. It diagnoses itself by analysing the connectivity of the various charsets, reports on standard output, then exits without recoding any file. For each possible pair of different charsets, it prints on standard output how many single steps are needed to achieve the recoding and how many can be saved by step merging. If a recoding cannot be done, the word `UNACHIEVABLE' is printed instead. However, this special line is completely suppressed if option -x specified some charset to ignore.
The option -hname affects the resulting output, because there are more merging rules when this option is in effect. Other options affect the result as well: -d, -g and, notably, -s.
There was a time, in GNU recode development, when this option was reasonably interesting. With the greater number of handled charsets, it became very slow, while generating a great deal of output. It can be made slightly more practical with -x., which effectively removes most RFC 1345 charsets from the report.
-c
--colons
When using the texte charset (Easy French conventions), use the colon :
instead of the double-quote " for marking diaeresis.
See section ASCII with easy French conventions.
-d
--diacritics
When the before or after charset is latex, limit conversion to diacritics only. This is particularly useful when people write what would be valid TeX or LaTeX files, if only they were using TeX macros for applying diacritics instead of using the diacriticized characters directly from the underlying character set.
While converting to the latex charset, this option assumes that all characters special to TeX or LaTeX are already properly escaped; backslashes are also transmitted literally. While converting the other way, this option prevents all attempts at recognizing TeX or LaTeX escaped representations of single characters of the other charset.
See section ASCII with LaTeX codes.
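The C sketch below shows the diacritics-only behaviour when converting Latin-1 text towards latex. A handful of accented Latin-1 letters become TeX diacritic macros. Every other byte is copied unchanged, including backslashes and braces, which matches the assumption that TeX specials are already escaped. The six-entry table and the function name `latin1_to_latex_diacritics` are hypothetical. recode's real latex table covers many more characters and may spell the macros differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of Latin-1 letters and their TeX diacritic macros. */
struct diacritic { unsigned char code; const char *tex; };
static const struct diacritic diacritics[] = {
    { 0xE0, "\\`{a}" },   /* a grave      */
    { 0xE7, "\\c{c}" },   /* c cedilla    */
    { 0xE8, "\\`{e}" },   /* e grave      */
    { 0xE9, "\\'{e}" },   /* e acute      */
    { 0xEA, "\\^{e}" },   /* e circumflex */
    { 0xEB, "\\\"{e}" },  /* e diaeresis  */
};

/* Recode Latin-1 `in` into `out` (capacity `cap`) in the spirit of -d:
   only diacriticized letters become TeX macros; every other byte,
   backslashes and braces included, is copied literally.
   Returns the output length, or (size_t)-1 if `out` is too small. */
size_t latin1_to_latex_diacritics(const char *in, char *out, size_t cap)
{
    size_t len = 0;

    for (; *in; in++) {
        const char *rep = NULL;
        char single[2] = { *in, '\0' };

        for (size_t i = 0; i < sizeof diacritics / sizeof diacritics[0]; i++)
            if ((unsigned char)*in == diacritics[i].code)
                rep = diacritics[i].tex;
        if (rep == NULL)
            rep = single;               /* not a diacritic: copy as is */

        size_t n = strlen(rep);
        if (len + n + 1 > cap)
            return (size_t)-1;
        memcpy(out + len, rep, n);
        len += n;
    }
    out[len] = '\0';
    return len;
}
```

For example, the Latin-1 bytes of `caf\xe9 \emph{x}` become `caf\'{e} \emph{x}`. The existing `\emph` macro passes through untouched.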
-f
--force
This option is meant to let recodings proceed even when they are not fully reversible. In this version of recode, this option is only recognized, but otherwise ignored: if it is found that the recoding is not fully reversible, the file replacement is still unconditionally done.
Even if GNU recode tries hard to keep recodings reversible, it cannot make any promise! In particular, consider:
Most recodings are reversible only because, by default, recode seeks reversible mappings (also see option -s). This is not true for all transformations, however.
Consider recoding from ibmpc to latin1. End of lines are represented as `\r\n' in ibmpc and as `\n' in latin1. There is no way by which a faulty ibmpc file containing a `\n' not preceded by `\r' can be translated into a latin1 file, and then back.
In a latex charset file, the string `\^\i{}' could be recoded back and forth through another charset and become `\^{\i}'. Even if the resulting file is equivalent to the original one, it is not identical.
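The end-of-line case can be reproduced with two small C functions. One turns each `\r\n` pair into `\n`, and the other turns each `\n` back into `\r\n`. The function names are illustrative and are not taken from recode. A stray `\n` in the ibmpc input survives the first direction unchanged, but it comes back as `\r\n`. The round trip therefore does not restore the original bytes.

```c
#include <string.h>

/* ibmpc -> latin1 end of lines: every "\r\n" pair becomes "\n"; any other
   byte, including a stray "\n" or "\r", is copied unchanged.
   `out` must hold at least strlen(in) + 1 bytes. */
void crlf_to_lf(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '\r' && in[1] == '\n')
            in++;                       /* drop the '\r' of a "\r\n" pair */
        *out++ = *in++;
    }
    *out = '\0';
}

/* latin1 -> ibmpc end of lines: every "\n" becomes "\r\n".
   `out` must hold at least 2 * strlen(in) + 1 bytes. */
void lf_to_crlf(const char *in, char *out)
{
    for (; *in; in++) {
        if (*in == '\n')
            *out++ = '\r';
        *out++ = *in;
    }
    *out = '\0';
}
```

Feeding `one\r\ntwo\nthree` through `crlf_to_lf` and then `lf_to_crlf` yields `one\r\ntwo\r\nthree`. This differs from the input at the faulty line end.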
-g
--graphics
This option is only meaningful when recoding from the ibmpc charset. In this charset, characters 176 to 223 are used for constructing rulers and boxes, using single or double horizontal or vertical lines. This option forces the automatic selection of ASCII characters for approximating these rulers and boxes, at the cost of making the transformation irreversible.
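Here is one plausible ASCII approximation for ibmpc (code page 437) codes 176 to 223, written in C. Vertical lines become `|`, horizontal lines become `-` or `=`, corners and junctions become `+`, and shades and blocks become `#`. These character choices and the name `ibmpc_graphic_to_ascii` are assumptions for illustration, not the characters recode actually selects. The sketch shows why -g is irreversible: many distinct box pieces collapse onto the same ASCII character.

```c
/* One possible ASCII approximation of ibmpc (code page 437) codes 176-223.
   The choice of characters is an assumption, not recode's actual table. */
char ibmpc_graphic_to_ascii(unsigned char c)
{
    switch (c) {
    case 179: case 186:                 /* single and double vertical lines */
        return '|';
    case 196:                           /* single horizontal line */
        return '-';
    case 205:                           /* double horizontal line */
        return '=';
    default:
        break;
    }
    if ((c >= 176 && c <= 178) || (c >= 219 && c <= 223))
        return '#';                     /* light/medium/dark shades, blocks */
    if (c >= 180 && c <= 218)
        return '+';                     /* all remaining corners and junctions */
    return (char)c;                     /* outside the box-drawing range */
}
```

Codes 191 and 218 (the upper-right and upper-left corners) both map to `+`, so the original corner cannot be recovered.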
-h[name]
--header[=name]
Instead of recoding files, recode writes a C source file on standard output and exits. This source is meant to be included in a regular C program: its purpose is to declare and initialize an array, named name, which represents the requested recoding. If name is not specified, it defaults to before_to_after, where before is the starting charset and after is the goal charset.
Even if recode tries its best, this option does not always succeed in producing the requested C table. It will succeed, however, provided the recoding can be internally represented by only one step after the optimization phase, and this merged step conveys a one-to-one or a one-to-many explicit table. But this is all fairly technical. Better try and see!
Beware that other options might affect the produced C tables: -d, -g and, particularly, -s.
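The sketch below shows how a regular C program might consume a one-to-one table like the one described above. It assumes the table is a 256-entry `unsigned char` array indexed by the input byte. The actual type and layout of the source that `recode -h` writes are not specified here and may differ. The table name `demo_to_upper` is hypothetical and stands in for a generated `before_to_after` array. To keep the example short, the table is filled at run time instead of being statically initialized.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a generated one-to-one table. */
static unsigned char demo_to_upper[256];

/* Fill the stand-in table: identity everywhere except ASCII a-z -> A-Z. */
void init_demo_table(void)
{
    for (int i = 0; i < 256; i++)
        demo_to_upper[i] = (unsigned char)i;
    for (int c = 'a'; c <= 'z'; c++)
        demo_to_upper[c] = (unsigned char)(c - 'a' + 'A');
}

/* Recode `len` bytes of `buf` in place through a one-to-one table. */
void apply_table(const unsigned char table[256], unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

A one-to-many table would instead need an array of strings, with one replacement string per input byte.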
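-i
--sequence=files
When the recoding requires several elementary steps, chain them through intermediate files rather than running them in parallel.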
-l[format]
--list[=format]
With no non-option argument, recode ignores the format value of the option and writes a sorted list of charset names on standard output, one per line. When a charset name has aliases or synonyms, they follow the true charset name on its line, presented in lexicographical order from left to right. This list is over one hundred lines long. It is best used with grep, as in:
recode -l | grep greek
There might be one non-option argument, in which case it is interpreted as a charset name, possibly abbreviated to any non-ambiguous prefix. This particular usage of the -l option is obeyed only for charsets having an RFC 1345 style internal description. Even if most charsets have this property, some do not, and option -l then cannot be used to detail these particular charsets. To know whether a particular charset can be listed this way, merely try and see if it works. The format value of the option can be any of:
decimal
octal
hexadecimal
full
If option -l is used together with a charset argument, the format defaults to decimal.
-o
--sequence=popen
When the recoding requires several elementary steps, run them as copies of the program chained together through the popen(3) library call, all operating in parallel. In filter mode, at the cost of some overhead, recoded data will be available soon after the program starts, even if many elementary recoding steps are required.
If, at installation time, the popen(3) call is said to be unavailable, selecting option -o is equivalent to selecting option -i.
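As a rough illustration of chaining through popen(3), the C function below starts a pipeline of filters and collects the output of the last one. In this sketch the shell connects the stages and popen(3) connects the calling program to the end of the chain. The real program chains copies of itself, so this is an analogy rather than recode's code. The function name `run_pipeline` and the `printf | tr` pipeline in the usage note are assumptions. Any filters available on the system would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>

/* Run a chain of filter commands, written as a shell pipeline, through
   popen(3) and collect its standard output into `out` (capacity `cap`,
   NUL-terminated). All stages run in parallel. Returns the number of
   bytes read, or -1 on failure. */
long run_pipeline(const char *pipeline, char *out, size_t cap)
{
    if (cap == 0)
        return -1;
    FILE *p = popen(pipeline, "r");
    if (p == NULL)
        return -1;
    size_t n = fread(out, 1, cap - 1, p);
    out[n] = '\0';
    if (pclose(p) == -1)                /* waits for the pipeline to finish */
        return -1;
    return (long)n;
}
```

For example, `run_pipeline("printf abc | tr a-z A-Z", buf, sizeof buf)` stores `ABC` in `buf`.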
-p
--sequence=pipe
Chain the elementary recoding steps through the pipe(2) system call. All copies of the program operate in parallel. This method is similar to the method used through option -o, but is slightly more efficient. This is the default behaviour in filter mode. If this option is used when files are recoded over themselves, it should save some disk space, at the cost of more system overhead.
If, at installation time, the pipe(2) call is said to be unavailable, selecting option -p is equivalent to selecting option -o. If both pipe(2) and popen(3) are unavailable, selecting option -p is equivalent to selecting option -i.
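The C sketch below shows the pipe(2) approach with two elementary steps running in parallel. A forked child applies the first step and writes into the pipe, while the parent reads from the other end and applies the second step. The two toy steps (upper-casing, and turning spaces into dashes) and the name `two_step_pipe` are hypothetical stand-ins for real recoding steps. Data moves one byte at a time only to keep the example short.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two toy elementary steps (hypothetical stand-ins for recoding steps). */
static char step_upper(char c) { return (char)toupper((unsigned char)c); }
static char step_dash(char c)  { return c == ' ' ? '-' : c; }

/* Chain the two steps through pipe(2): the child applies step_upper and
   writes into the pipe while the parent concurrently reads and applies
   step_dash. Returns the output length, or -1 on failure. */
ssize_t two_step_pipe(const char *in, char *out, size_t cap)
{
    int fd[2];

    if (cap == 0 || pipe(fd) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: first step, writer */
        close(fd[0]);
        for (const char *s = in; *s; s++) {
            char c = step_upper(*s);
            if (write(fd[1], &c, 1) != 1)
                _exit(1);
        }
        close(fd[1]);                   /* EOF for the reader */
        _exit(0);
    }

    close(fd[1]);                       /* parent: second step, reader */
    size_t len = 0;
    char c;
    while (len + 1 < cap && read(fd[0], &c, 1) == 1)
        out[len++] = step_dash(c);
    out[len] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)len;
}
```

Because both processes run at once, the second step starts producing output before the first step has consumed all of its input.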
-s
--strict
This option asks recode to be very strict while recoding a file, merely losing in the transformation any character which is not explicitly mapped from one charset to another. This option makes the recoding less likely to be reversible, so it also implies option -f.
When this option is not used, recode automatically tries to fill mappings with invented correspondences, making them fully reversible in many instances. This filling is not made at random: the algorithm tries to stick to the identity mapping and, when that is not possible, prefers small permutation cycles. This means that, by default, recode may sometimes produce funny characters; however, these are quite helpful when one changes one's mind and wants to revert to the prior recoding.
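One way to fill a partial one-to-one mapping as described, favouring the identity and then small cycles, is sketched below in C. This is a plausible strategy chosen for illustration, not recode's actual algorithm, and the name `complete_mapping` is hypothetical. The explicit correspondences form chains. Each chain starts at a code that nothing maps to and ends at a code with no mapping of its own. The sketch maps each end back to its start, which turns every chain into the smallest cycle it allows. A code that is neither mapped nor a target becomes a fixed point. Under -s the gaps would simply stay unmapped and those characters would be lost.

```c
#include <stdbool.h>

#define MAX_CODES 256

/* Complete a partial one-to-one mapping over codes 0..n-1 (n <= MAX_CODES)
   into a permutation. map[i] == -1 means code i has no explicit
   correspondence; the explicit targets must be distinct. */
void complete_mapping(int map[], int n)
{
    bool is_target[MAX_CODES] = { false };

    for (int i = 0; i < n; i++)
        if (map[i] != -1)
            is_target[map[i]] = true;

    /* Every code that nothing maps to starts a chain: follow the chain to
       its unmapped end and map that end back to the start. When the start
       is itself unmapped, this yields the identity start -> start. */
    for (int start = 0; start < n; start++) {
        if (is_target[start])
            continue;
        int end = start;
        while (map[end] != -1)
            end = map[end];
        map[end] = start;
    }
}
```

With the single explicit pair 0 -> 1 over four codes, the result is the swap 1 -> 0 plus the identity on codes 2 and 3. Recoding back with the inverse permutation restores the original text.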
-t
--touch
-v
--verbose
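Make recode report on the recoding it undertakes. This report can be obtained without recoding any data, as in:
recode -v before:after < /dev/null
using the fact that, so far in recode, an empty input file produces an empty output file.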
-x=charset
--ignore=charset
Do not use charset when searching for a recoding path, so that recode has to succeed in using an alternate recoding path. charset may be abbreviated to any unambiguous prefix. For convenience, the value `.' is an alias for `RFC 1345', so the option -x. effectively disables all RFC 1345 tables at once.
--help
Print an explanation of the options, then exit.
--version
Print the version of recode, then exit.
The before:after argument specifies the start charset and the goal charset. The allowable values for before or after are described in the remainder of this document. Charsets may have predefined alternate names, or aliases, which are equally acceptable.
In the before:after argument only, a backslash may be used to quote the next character of a charset name. This might be useful for preventing a colon from being mistakenly interpreted as the separator between before and after. Alternatively, the colon could simply be omitted from the name, because while recognizing a charset name or alias, GNU recode ignores all characters besides letters and digits. There is also no distinction between upper and lower case. Charset names or aliases may always be abbreviated to any unambiguous prefix.
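The matching rules above (keep only letters and digits, ignore case, accept any unambiguous prefix) can be sketched in C as follows. The function names, the fixed 64-byte name limit, and the rule that an exact match beats longer names sharing the same prefix are assumptions made for this illustration.

```c
#include <ctype.h>
#include <string.h>

#define NAME_MAX_LEN 64

/* Copy the letters and digits of `name`, folded to lower case, into `out`.
   Returns 0, or -1 if the result would not fit in NAME_MAX_LEN bytes. */
static int normalize_name(const char *name, char out[NAME_MAX_LEN])
{
    size_t len = 0;

    for (; *name; name++) {
        if (!isalnum((unsigned char)*name))
            continue;                   /* punctuation and spaces ignored */
        if (len + 1 >= NAME_MAX_LEN)
            return -1;
        out[len++] = (char)tolower((unsigned char)*name);
    }
    out[len] = '\0';
    return 0;
}

/* Return the index in `known` that `query` names exactly or abbreviates
   unambiguously, or -1 if it matches nothing or is ambiguous. */
int find_charset(const char *query, const char *const known[], int count)
{
    char q[NAME_MAX_LEN], k[NAME_MAX_LEN];
    int match = -1, matches = 0;

    if (normalize_name(query, q) != 0 || q[0] == '\0')
        return -1;
    for (int i = 0; i < count; i++) {
        if (normalize_name(known[i], k) != 0)
            continue;
        if (strcmp(q, k) == 0)
            return i;                   /* exact match wins */
        if (strncmp(q, k, strlen(q)) == 0) {
            match = i;
            matches++;
        }
    }
    return matches == 1 ? match : -1;
}
```

With the names latin1, latex, ibmpc and ISO-8859-1, the query `lati` selects latin1 and `lat` is rejected as ambiguous. Both `iso 8859-1` and `IBM-PC` match exactly once punctuation and case are ignored.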
One or both of the before or after keywords may be omitted, but the colon which separates them cannot. An omitted keyword implies the usual or default code in use on the system where this program is installed. Usually, this default code is latin1 for UNIX systems or ibmpc for MS-DOS machines.
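The before:after syntax can be split as sketched in C below. The split happens at the first colon not quoted by a backslash, a backslash quotes the next character, and an empty side falls back to a default charset. The name `split_request` is hypothetical. `DEFAULT_CHARSET` is set to latin1 only as a stand-in for whatever default the installation actually uses.

```c
#include <string.h>

/* Stand-in for the system's usual charset (latin1 on UNIX, per the text). */
#define DEFAULT_CHARSET "latin1"

/* Split a before:after argument at the first colon not quoted by a
   backslash. A backslash quotes the next character, which is copied
   without the backslash. An empty side becomes DEFAULT_CHARSET.
   `before` and `after` must each hold at least strlen(arg) + 1 bytes and
   at least sizeof DEFAULT_CHARSET bytes.
   Returns 0, or -1 if there is no separating colon. */
int split_request(const char *arg, char *before, char *after)
{
    char *dst = before;
    int seen_colon = 0;

    for (const char *s = arg; *s; s++) {
        if (*s == '\\' && s[1] != '\0') {
            *dst++ = *++s;              /* quoted character, kept literally */
        } else if (*s == ':' && !seen_colon) {
            *dst = '\0';                /* end of before, start of after */
            dst = after;
            seen_colon = 1;
        } else {
            *dst++ = *s;
        }
    }
    *dst = '\0';
    if (!seen_colon)
        return -1;                      /* the colon may not be omitted */
    if (before[0] == '\0')
        strcpy(before, DEFAULT_CHARSET);
    if (after[0] == '\0')
        strcpy(after, DEFAULT_CHARSET);
    return 0;
}
```

For example, `ibmpc:` yields ibmpc and latin1, and `my\:set:latex` yields the before name `my:set` with after set to latex.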