This charset is available in recode under the name `ascii'.
In fact, its true name is ANSI_X3.4-1968 as per RFC 1345, accepted aliases being ANSI_X3.4-1986, ASCII, IBM367, ISO646-US, ISO_646.irv:1991, US-ASCII, cp367, iso-ir-6 and us. The shortest way of specifying it in recode is `us'.
This documentation used to include ASCII tables. They have been removed since recode can now recreate these (and a lot of others) easily:
     recode -lf ascii     for commented ASCII
     recode -ld ascii     for concise decimal table
     recode -lo ascii     for concise octal table
     recode -lh ascii     for concise hexadecimal table
This charset is available in recode under the name `latin1'.
In fact, its true name is ISO_8859-1:1987 as per RFC 1345, accepted aliases being CP819, IBM819, ISO-8859-1, ISO_8859-1, iso-ir-100, l1 and latin1. The shortest way of specifying it in recode is `l1'.
This charset corresponds to the ISO Latin Alphabet 1. It is an eight-bit code which coincides with ASCII for the lower half.
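The statement that Latin-1 coincides with ASCII for the lower half can be checked with a short Python sketch. It uses only Python's built-in `ascii` and `latin-1` codecs rather than recode itself, and the helper names `lower_half_matches` and `upper_half_is_latin1_only` are hypothetical, chosen for illustration.

```python
def lower_half_matches() -> bool:
    """Return True if bytes 0..127 decode identically as ASCII and as Latin-1."""
    lower = bytes(range(128))
    return lower.decode("ascii") == lower.decode("latin-1")


def upper_half_is_latin1_only() -> bool:
    """Return True if every byte 128..255 is rejected by ASCII but accepted by Latin-1."""
    for value in range(128, 256):
        byte = bytes([value])
        byte.decode("latin-1")  # Latin-1 assigns a character to all 256 codes
        try:
            byte.decode("ascii")
        except UnicodeDecodeError:
            continue  # expected: ASCII is a seven-bit code
        return False
    return True


if __name__ == "__main__":
    print("lower half identical:", lower_half_matches())
    print("upper half is Latin-1 only:", upper_half_is_latin1_only())
```

Octal 351, the byte that appears in the `od' output of the ascii-bs examples, falls in the upper half and is the Latin-1 code for e with an acute accent.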
This documentation used to include Latin-1 tables. They have been removed since recode can now recreate these (and a lot of others) easily:
     recode -lf latin1    for commented ISO Latin-1
     recode -ld latin1    for concise decimal table
     recode -lo latin1    for concise octal table
     recode -lh latin1    for concise hexadecimal table
The following is from `lasko@video.dec.com' (Tim Lasko), with no date.
ISO Latin-1, or more completely ISO Latin Alphabet No 1, is now an international standard as of February 1987 (IS 8859, Part 1). For those American USEnet'rs that care, the 8-bit ASCII standard, which is essentially the same code, is going through the final administrative processes prior to publication.
ISO Latin-1 (IS 8859/1) is actually one of an entire family of eight-bit one-byte character sets, all having ASCII on the left hand side, and with varying repertoires on the right hand side:
     Pt 1. Latin Alphabet No 1 (caters to Western Europe - now approved)
     Pt 2. Latin Alphabet No 2 (caters to Eastern Europe - now approved)
     Pt 3. Latin Alphabet No 3 (caters to SE Europe + others - in draft ballot)
     Pt 4. Latin Alphabet No 4 (caters to Northern Europe - in draft ballot)
     Pt 5. Latin-Cyrillic alphabet (right half all Cyrillic - processing currently suspended pending USSR input)
     Pt 6. Latin-Arabic alphabet (right half all Arabic - now approved)
     Pt 7. Latin-Greek alphabet (right half Greek + symbols - in draft ballot)
     Pt 8. Latin-Hebrew alphabet (right half Hebrew + symbols - proposed)
This charset is available in recode under the name `ascii-bs'.
The file is straight ASCII, seven bits only. According to the definition of ASCII, diacritics are applied by a sequence of three characters: the letter, one BS, the diacritic mark. We deviate slightly from this by exchanging the diacritic mark and the letter so that, on a screen device, the diacritic disappears and leaves the letter alone. At recognition time, both orders are acceptable.
The French quotes are coded by the sequences: < BS " or " BS < for the opening quote and > BS " or " BS > for the closing quote. This artificial convention was inherited in straight `ascii-bs' from habits around bangbang entry, and is not well known. But we decided to stick to it so that the `ascii-bs' charset will not lose French quotes.
The `ascii-bs' charset is independent of `ascii', and different. The following examples demonstrate this, knowing in advance that `!2' is the bangbang way of representing an e with an acute accent. Compare:
     % echo \!2 | recode -v bang:ascii | od -bc
     bangbang -> iso-8859-1-1987 -> rfc1345 -> ansi-x3.4-1968 (many to one)
     bangbang -> iso-8859-1-1987 -> ansi-x3.4-1968 (many to one)
     0000000 351 012
             351  \n
     0000002
with:
     % echo \!2 | recode -v bang:ascii-bs | od -bc
     bangbang -> iso-8859-1-1987 -> ascii-bs (many to many)
     0000000 047 010 145 012
               '  \b   e  \n
     0000004
In the first case, the e with an acute accent is merely transmitted unchanged by the `latin1:ascii' mapping, which has no special recoding rule for it. In the `latin1:ascii-bs' case, the acute accent is applied over the e with a backspace: diacriticized characters have special rules. For the `ascii-bs' charset, reversibility is still possible, but there might be difficult cases.
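The overstrike convention described above (diacritic mark, BS, letter on output; either order accepted on input) can be sketched in Python as follows. This is an illustration under stated assumptions, not recode's implementation: the functions `to_ascii_bs` and `from_ascii_bs` and the `MARKS` table are hypothetical, only the acute accent mapping to an apostrophe is confirmed by the `od' output above, and the French quote sequences are left out.

```python
import unicodedata

BS = "\x08"  # the ASCII backspace character

# Unicode combining mark -> ASCII stand-in. Only the acute accent is taken
# from the example output above; the other entries are assumptions.
MARKS = {
    "\u0301": "'",   # acute (confirmed by the od output: 047 010 145)
    "\u0300": "`",   # grave (assumed)
    "\u0302": "^",   # circumflex (assumed)
    "\u0308": '"',   # diaeresis (assumed)
    "\u0303": "~",   # tilde (assumed)
}
COMBINING = {ascii_mark: mark for mark, ascii_mark in MARKS.items()}


def to_ascii_bs(text: str) -> str:
    """Write each known accented letter as: mark, BS, letter."""
    out = []
    for ch in text:
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[0].isascii() and parts[1] in MARKS:
            out.append(MARKS[parts[1]] + BS + parts[0])
        else:
            out.append(ch)  # anything else passes through untouched
    return "".join(out)


def from_ascii_bs(text: str) -> str:
    """Recognize both 'mark BS letter' and 'letter BS mark'."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS:
            first, second = text[i], text[i + 2]
            if first in COMBINING and second.isalpha():
                pair = (second, first)
            elif second in COMBINING and first.isalpha():
                pair = (first, second)
            else:
                pair = None
            if pair is not None:
                letter, mark = pair
                out.append(unicodedata.normalize("NFC", letter + COMBINING[mark]))
                i += 3
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    encoded = to_ascii_bs("\u00e9\n")
    print([oct(b) for b in encoded.encode("ascii")])
```

Sequences such as an apostrophe that is itself followed by a backspace and a letter for unrelated reasons would be misread by this recognizer, which is one way the difficult cases mentioned above can arise.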
This charset is available in recode under the name `flat'.
This code is ASCII expunged of all diacritics and underlines, as long as they are applied using three character sequences, with BS in the middle. Also, although this is only loosely related, each control character is represented by a sequence of two or three graphic characters. The newline character, however, keeps its functionality and is not represented.
Note that charset `flat' is a terminal charset. We can convert to `flat', but not from it.
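The removal of three-character overstrike sequences can be sketched in Python as below. This is an assumption-laden illustration rather than recode's algorithm: the function `strip_overstrikes` is hypothetical, the rule for choosing which character survives (keep the alphanumeric one, otherwise the second) is a guess, and the rendering of control characters as graphic characters is not attempted because the text above does not specify it.

```python
import re

# Any character, a backspace, any character: one overstrike sequence.
OVERSTRIKE = re.compile("(.)\x08(.)", re.DOTALL)


def strip_overstrikes(text: str) -> str:
    """Collapse each 'X BS Y' sequence to a single surviving character."""
    def keep_letter(match: re.Match) -> str:
        first, second = match.group(1), match.group(2)
        # Assumed rule: the letter or digit survives, the mark or underline goes.
        return first if first.isalnum() else second

    return OVERSTRIKE.sub(keep_letter, text)


if __name__ == "__main__":
    print(strip_overstrikes("'\x08el\x08`eve _\x08o_\x08k\n"), end="")
```

Because the diacritic or underline is discarded, the original text cannot be recovered from the result, which is consistent with the charset being usable only as a conversion target.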