THE CHARACTER PRIMITIVES
========================

The character primitives defined in the remainder
of this proposal can all be implemented entirely in FORTRAN
if a standard set of bit primitives is available. However,
because the storage order differs on some machines, such as
the PDP-11 and the DEC VAX 11/780, where characters are
stored in reverse order within a word, a FORTRAN
implementation will in general not be portable, even if such
parameters as the number of bits in a character and the
number of characters in an INTEGER storage unit are
available machine-independently through the PORT Library
Framework [FOX78a, FOX78b]. An initial FORTRAN
implementation in terms of bit primitives may nevertheless
be useful for bootstrapping when software is to be
installed on a new machine. All of the routines will be
straightforward to implement in assembly language, and
doing so may be an order of magnitude more efficient,
particularly on machines which support character addressing
in hardware.
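
The portability problem can be illustrated with a small sketch. It is
written in Python rather than FORTRAN purely for illustration, and the
word size (four 8-bit characters), the function names, and the use of
ASCII are assumptions, not part of this proposal. The same four
characters are packed into one integer word in two storage orders; an
extraction written with shifts and masks for one order returns the
characters reversed on the other.

```python
BITS_PER_CHAR = 8     # assumed character width
CHARS_PER_WORD = 4    # assumed characters per INTEGER storage unit
MASK = (1 << BITS_PER_CHAR) - 1


def pack_word(chars, byteorder):
    """Pack the characters into one integer word in the given storage order."""
    return int.from_bytes(chars.encode("ascii"), byteorder)


def extract_high_first(word):
    """Unpack with shifts and masks, assuming character 1 is in the high-order bits."""
    out = []
    for i in range(CHARS_PER_WORD):
        shift = BITS_PER_CHAR * (CHARS_PER_WORD - 1 - i)
        out.append(chr((word >> shift) & MASK))
    return "".join(out)


big = pack_word("ABCD", "big")        # character 1 in the high-order byte
little = pack_word("ABCD", "little")  # character 1 in the low-order byte

print(extract_high_first(big))     # ABCD
print(extract_high_first(little))  # DCBA
```

Knowing the number of bits per character and characters per word is
therefore not enough; the extraction also depends on where in the word
the first character lies.
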

Just as in the case of the proposed bit primitives,
it is anticipated that bodies such as the Quantum Chemistry
Program Exchange or the NRCC could act as a source of
implementations of these primitives for a wide variety of
host computers. Installations will also find that
programmers are more easily encouraged to use the standard
character primitives if they are conveniently available,
preferably as part of the local system FORTRAN library.

In the following descriptions, all arguments are
scalar INTEGER variables, except TEXT(*), which represents
either a Hollerith constant, or character data packed with
the maximum number of characters per word. Exceptions to
this will be noted when necessary. Readers familiar with
the programming languages PASCAL and PL/I will note their
influence on the design of these routines.

The character primitives will be divided into two
classes -- basic routines, and higher-level routines. The
latter can be implemented in FORTRAN in terms of the former,
although on some systems with advanced hardware facilities,
it may be desirable to define them directly in assembly
language.

In developing any software system, a decision must
always be made about how error conditions are to be handled.
In a set of routines which are proposed for adoption as a
Standard, it is clearly unacceptable to ignore errors. It is
equally unsatisfactory to leave behaviour under error
conditions "undefined", for this simply means that the
action taken is decided by each implementor and may
therefore differ from one machine to another.

Only two acceptable alternatives exist. Either an
error flag can be returned, or predefined reasonable action
can be taken when errors arise. The first of these places
the burden of error handling on the user of the software,
and frequently results in error conditions simply being
ignored, or perhaps handled incorrectly. The second
alternative simplifies programming on the part of the user
by moving the error processing to a lower level, and also
guarantees consistent error handling in all implementations.
For these reasons, the second alternative has been adopted
for the character primitives.
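
The difference between the two alternatives can be sketched as follows,
in Python for illustration only. The routine names, the flag values,
the upper-bound check, and the choice of a blank as the predefined
result are all hypothetical; the action taken by each actual primitive
is specified with that primitive.

```python
def char_at_with_flag(text, loc):
    """Alternative 1: return (character, flag); the caller must test the flag."""
    # The upper-bound test is an assumption of this sketch only.
    if loc < 1 or loc > len(text):
        return " ", 1      # flag set: the returned character is meaningless
    return text[loc - 1], 0


def char_at_with_default(text, loc):
    """Alternative 2: take a predefined action (here, return a blank) on error."""
    if loc < 1 or loc > len(text):
        return " "
    return text[loc - 1]


ch, flag = char_at_with_flag("ABC", 0)   # correct only if the caller checks flag
ch2 = char_at_with_default("ABC", 0)     # same result on every implementation
```

With the first form, a caller who never examines `flag` silently
continues with a meaningless character. With the second, the result
under error is fixed by the definition of the routine.
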

An axiom of good programming is that functions
should not have side effects. In practical terms, this
usually means that they should not modify their arguments,
or variables globally accessible through COMMON storage or
its equivalent. This convention has been adhered to in the
definition of the FUNCTION character primitives.

In those primitives which deal with character
strings, rather than single characters, the strings are
defined in terms of three variables. These are the name of
the INTEGER array containing the string, a starting position
(numbering 1,2,3,... from the left), and the number of
characters to be considered, counting from the starting
position. Thus, an argument sequence TEXT,LOC,LEN
represents characters LOC, LOC+1, LOC+2, ..., LOC+LEN-1
stored in the array TEXT(*). It is an error condition if
either LOC or LEN is less than 1, and the action to be taken
will be expressly defined for each primitive. In some cases,
two strings of the same length are present in the argument
list, and the length parameter for the first will then be
omitted. In most applications, the LOC parameter will point
to the first character in the array; its presence is,
however, necessary to allow access to strings which do not
begin at a word boundary.
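
A minimal sketch of this addressing convention follows, in Python for
illustration only. It assumes four 8-bit characters per word with
character 1 in the high-order bits, and the helper names `char_at` and
`substring` are hypothetical. Returning an empty string when LOC or LEN
is less than 1 is a placeholder; the actual action is defined
separately for each primitive.

```python
CHARS_PER_WORD = 4    # assumed characters per INTEGER storage unit
BITS_PER_CHAR = 8     # assumed character width
MASK = (1 << BITS_PER_CHAR) - 1


def char_at(text, pos):
    """Return character number pos (counting from 1) of the packed array text."""
    word = text[(pos - 1) // CHARS_PER_WORD]
    index_in_word = (pos - 1) % CHARS_PER_WORD
    shift = BITS_PER_CHAR * (CHARS_PER_WORD - 1 - index_in_word)
    return chr((word >> shift) & MASK)


def substring(text, loc, length):
    """Return characters LOC, LOC+1, ..., LOC+LEN-1 of text as a string."""
    if loc < 1 or length < 1:
        return ""    # placeholder action for this sketch only
    return "".join(char_at(text, p) for p in range(loc, loc + length))


# "HELLO WORLD " packed four characters per word
TEXT = [int.from_bytes(b"HELL", "big"),
        int.from_bytes(b"O WO", "big"),
        int.from_bytes(b"RLD ", "big")]

print(substring(TEXT, 1, 5))   # HELLO
print(substring(TEXT, 4, 5))   # LO WO
```

The second call starts at character 4, the last character of the first
word, and runs into the second word, which is the case the LOC
parameter exists to support.
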