BACKGROUND
==========
Before describing the proposed primitives, some
background information is useful. FORTRAN has never offered
satisfactory support of character data. Indeed, some
compilers extant until the mid-1960's did not even have
Hollerith data items or A FORMAT descriptors, or LOGICAL
variables, for that matter. When limited character support
became widely available in FORTRAN, it was restricted to
Hollerith string constants of the form 9HCHEMISTRY, together
with the A FORMAT item. Hollerith constants were permitted
by the 1966 ANSI FORTRAN Standard to occur only in DATA and
FORMAT statements, and as subroutine arguments in CALL
statements (but not in FUNCTION references, although no
compiler that I am aware of enforces this restriction). No
CHARACTER data type was introduced, and characters were
forced to masquerade in the guise of other data types.

Coding Hollerith strings is somewhat tedious and
error-prone, because of the necessity of counting
characters. Consequently, many manufacturers permitted
character constants to be surrounded by delimiter
characters, for example, "CHEMISTRY", but again, no general
agreement was reached about what the delimiter characters
ought to be. Single and double quotes are most common, but
asterisks and not-equal signs have also been used. When
string delimiters are used, the question arises as to how
the delimiter character itself is to be represented in a
string constant. Usually, the doubled-delimiter approach,
"O""MALLEY" for the string O"MALLEY, has been adhered to,
although CDC's use of the asterisk as a string delimiter
simply prohibited its appearance as a string character. As
a result of these variations, only the Hollerith string can
be relied upon for portability, and automated means of
converting between the different string conventions in
FORTRAN source programs are available at some installations.
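As an illustration of the notations just described, the following sketch writes the same constant in Hollerith form and in delimited form. It assumes a compiler that accepts both the H edit descriptor and the apostrophe delimiter, as FORTRAN 77 does; the other delimiters mentioned above would need a compiler that recognizes them. The program name QUOTES and the statement labels are arbitrary choices made for this example.

```fortran
      PROGRAM QUOTES
C     Hollerith form: the count 9 must match the text exactly.
      WRITE (*, 10)
   10 FORMAT (1X, 9HCHEMISTRY)
C     Delimited form: no character count is needed.
      WRITE (*, 20)
   20 FORMAT (1X, 'CHEMISTRY')
C     Doubled delimiter: two apostrophes stand for one.
      WRITE (*, 30)
   30 FORMAT (1X, 'O''MALLEY')
      END
```

Had the count in the first FORMAT been miswritten as 8, the Hollerith field would end one character early, which is exactly the kind of slip that the delimited form avoids.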

The 1966 implementation of support for character
data is just about the worst possible. The Hollerith form
is certainly undesirable. Even worse is the convention for
internal storage of character strings. These must always be
stored left-justified in a computer word, and right-padded
with blanks if the number of characters specified does not
fill an integral number of machine words. The number of
characters which fit in a word ranges from 1 to 10 on
existing computers [BEEB79], and the left-justification
means that even if one arranges to store only one character
per word for word-length independence, the character will be
occupying the most-significant bit positions and probably
the sign bit as well. This means that even comparison of
characters for equality can result in an arithmetic overflow
condition on those machines where comparisons are
implemented by subtraction. It also means that accessing
the numerical value of a character cannot be done portably,
for division by a power of two to effect a right shift of
the bit pattern will fail if the sign position is occupied
by a 1-bit.
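The failure of division as a substitute for a right shift can be shown with ordinary integer arithmetic. The sketch below assumes a 32-bit two's-complement INTEGER holding four 8-bit characters, and uses the bit patterns that the letter A followed by three blanks would produce in ASCII and in EBCDIC. The decimal values of the two words were worked out by hand for this example, and the names SHIFT, WA, and WE are arbitrary.

```fortran
      PROGRAM SHIFT
      INTEGER WA, WE
C     'A' and three blanks, left-justified in one 32-bit word.
C     ASCII:  hex 41202020 =  1092624416
C     EBCDIC: hex C1404040 = -1052753856 (sign bit set)
      WA = 1092624416
      WE = -1052753856
C     Division by 2**24 stands in for a 24-bit right shift.
      WRITE (*, 10) WA / 2**24, WE / 2**24
   10 FORMAT (1X, 2I6)
      END
```

Under these assumptions the ASCII word yields 65, the correct code for A, while the EBCDIC word yields -62 rather than the correct 193, because the leading 1-bit makes the word negative and integer division truncates toward zero.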

Another problem is that depending upon the FORTRAN
type of the variable in which characters are stored,
different results may be obtained on different machines. For
example, character storage in LOGICAL variables is
impossible on those machines which implement LOGICAL scalars
and arrays as bit strings, and on most others, the 1966
Standard's prohibition of the use of the relational
operators .EQ., .NE., .LT., etc. between LOGICAL variables
would prevent character comparisons. Floating-point types
are also unsuitable, because mantissa normalization which
may occur in assignments or in expression evaluation usually
will scramble the bits, destroying the characters stored in
the word. This leaves INTEGER variables and arrays as the
only possible repository of character data, and even this
may fail. On the IBM 7030 Stretch computer, for example,
integers are represented internally as floating-point
numbers, and unless assembly-language coding is resorted to,
it is very inconvenient just to get character data correctly
in and out of variables on that machine.

The 1977 FORTRAN Standard has made an attempt to
remedy these difficulties by the introduction of a CHARACTER
data type, but it is still not going to offer a complete
solution.

First of all, the Hollerith data type is dropped
from the 1977 Standard. This means that a very large body of
existing FORTRAN software which uses character data, even in
an at-present widely portable fashion, may require extensive
changes to run with a FORTRAN 77 compiler, unless
manufacturers can be pressed to continue support of
character data stored in Hollerith constants and variables.
The 1977 Standard prohibits all storage equivalencing,
either via COMMON and EQUIVALENCE statements, or by FUNCTION
or SUBROUTINE argument associations, between CHARACTER data
and all other FORTRAN data types. This was necessary to
enable FORTRAN 77 to support variable-length character
strings, so that declarations of the form
      SUBROUTINE A (B,C)
      CHARACTER B*(*),C(*)*(*)
could be permitted, allowing CHARACTER variables to inherit
both a size and an array length from a calling program. This
forces a compiler to generate code to pass to a called
routine the address of a string descriptor containing size
and dimension information, as well as the actual address of the
character data.
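A minimal sketch of this inheritance, assuming a FORTRAN 77 compiler: the hypothetical subroutine SHOW declares its argument with length (*) and uses the LEN intrinsic to recover the length of whatever string the caller passed. The names CALLER, SHOW, S5, and S12 are invented for this example.

```fortran
      PROGRAM CALLER
      CHARACTER S5*5, S12*12
      S5 = 'HELLO'
      S12 = 'CHEMISTRY'
C     The same subroutine accepts strings of different lengths.
      CALL SHOW (S5)
      CALL SHOW (S12)
      END

      SUBROUTINE SHOW (B)
C     B takes its length from the actual argument.
      CHARACTER B*(*)
      WRITE (*, 10) LEN(B), B
   10 FORMAT (1X, I3, 1X, A)
      END
```

The two calls report lengths of 5 and 12; the second string is blank-padded on the right to its declared length.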

Second, standardized library support of character
data in the form of useful utility routines is non-existent
in the 1977 Standard, apart from the ICHAR and CHAR
functions for converting between INTEGER and CHARACTER form.
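The two conversion functions can be exercised as follows. This is a sketch only: the integer value printed depends on the character set of the processor, and the names CODES, C, and I are arbitrary.

```fortran
      PROGRAM CODES
      CHARACTER C*1
      INTEGER I
      C = 'A'
C     ICHAR gives the position of C in the collating sequence.
      I = ICHAR(C)
C     CHAR is the inverse mapping, so CHAR(ICHAR(C)) equals C.
      WRITE (*, 10) C, I, CHAR(I)
   10 FORMAT (1X, A1, I5, 1X, A1)
      END
```

On an ASCII system the value printed for I would be expected to be 65, and on an EBCDIC system 193, so any program that depends on the numerical value is tied to one character set.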

Third, null character strings, that is, strings of
zero length, are not permitted. Null strings are in fact
quite useful, and indeed, even necessary in some
applications. In particular, a null string cannot be
simulated by any string of non-zero length.

Fourth, the 1977 Standard does not specify the
character set to be used. The fact that many manufacturers
employ their private versions of character sets, each with
its own special character repertoire and collating sequence,
only continues to perpetrate additional machine dependence
upon FORTRAN users.
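The dependence on collating sequence shows up even in a single comparison. The sketch below assumes only the FORTRAN 77 relational operators applied to CHARACTER operands; the program name ORDER and the message texts are arbitrary. The Standard requires the letters to be in order among themselves, and the digits likewise, but leaves open which group comes first.

```fortran
      PROGRAM ORDER
C     The result of this comparison depends on the character set.
      IF ('A' .LT. '1') THEN
         WRITE (*, 10)
      ELSE
         WRITE (*, 20)
      END IF
   10 FORMAT (1X, 'LETTERS COLLATE BEFORE DIGITS')
   20 FORMAT (1X, 'DIGITS COLLATE BEFORE LETTERS')
      END
```

An ASCII processor takes the second branch and an EBCDIC processor the first, so a program that sorts mixed alphanumeric keys with .LT. will order them differently on the two machines.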