Original version:
Tue Oct 15 01:27:09 2002
Last updates:
Fri Dec 06 01:09:08 2002
Wed May 19 15:59:53 2004
Fri Nov 12 15:50:09 2004
Fri May 20 08:32:25 2005
Mon Jul 9 14:26:05 2007
Comments, and reports of errata or bugs, extensions, and ports to new programming languages and machine platforms, are welcome via e-mail to the author,
Nelson H. F. Beebe <beebe@math.utah.edu>.
In your report, please supply the full document URL, and the title and Last update time stamp recorded near the top of the document.
Original version:
Wed Nov 28 18:06:48 2001
Last update:
Fri May 20 08:32:37 2005
This directory contains a small collection of test programs for examining the behavior of IEEE 754 floating-point arithmetic, together with test input and output files. There are also a few additional test programs for other data types.
Unless otherwise noted below, all of these programs were written by Nelson H. F. Beebe, at the Department of Mathematics, University of Utah, Salt Lake City, Utah, USA.
The programs were developed over the course of several years, for teaching floating-point arithmetic, for testing compilers and programming languages, and for surveying prior art, as part of my small contributions to the ongoing work (2000--) on the revision of the IEEE 754 Standard for Binary Floating-Point Arithmetic.
Most of these programs are quite simple, and took only a few minutes to write, usually in either Fortran or C, and were often then manually translated to the other language, and sometimes, to Java and other programming languages. Thanks to manual translation, none of these programs require any additional nonstandard header files or support libraries.
A complete alphabetical contents listing as a UNIX ls -lR recursive long directory listing summarizes what is here, and the companion reverse-time ordered contents listing from a UNIX ls -lRt command readily shows what is new.
See the archive section of the catalog below for information on retrieving the entire collection.
It has been a great tradition in the numerical linear algebra community for more than 30 years of freely sharing software without patents or copyright claims, for free and unrestricted use by anyone.
Sadly, commercialism and lawyers have gotten involved in software, and the view of many people in the open source community today is that even freely-distributable software must be protected by a license that guarantees you the right to use the software, and to further distribute it.
For the software and documentation in this directory that is written by me, Nelson H. F. Beebe <beebe@math.utah.edu> I therefore choose to place this material under the GNU General Public License.
Other code here that is credited to others may contain copyright claims which should be honored in further distributions. In the absence of explicit copyright statements in such code, you may assume that the original authors intended that their work could be freely distributed, in the historical tradition.
Probably over a billion (thousand million) hardware implementations of IEEE 754 arithmetic now exist in desktop and larger computers, cell phones, laser printers, and other embedded devices. However, for some purposes, such as simulating new processors, providing IEEE 754 access on older systems, and allowing bit-for-bit identical results across different systems, it is necessary to have a software implementation.
Because a software arithmetic system is a vital layer underneath other packages, it must be highly portable; otherwise, it would compromise the portability of those packages, and often, simply could not be used if it did pose a porting barrier.
Here is a list of the software arithmetic packages that I know of. Some are historical, some are commercial, and others are provided under generous free-software or open-source licenses.
SANE (Standard Apple Numerics Environment) was used to provide floating-point arithmetic on old Apple Macintosh systems using the Motorola 68K processor.
SANE is no longer available, and was likely specific to the 68K, perhaps even having been written in assembly language, as other key Macintosh O/S components were to maximize performance on systems that were 200 to 500 times slower than current top-end desktops.
As far as I can tell, SANE was never released as a separate product, or in source form, and mention of it has pretty much disappeared from the Apple Web site. The acronym got reused at Apple to mean Scanner Access Now Easy.
Donald Knuth's mmix-arith C-Web program, is available as part of the MMIXware release in the mmix-20050331.tar.gz archive.
MMIX 2009 is a virtual machine whose native assembly code is MMIX, and its (still virtual) operating system is called NNIX. MMIX and its ancestor MIX has been used since 1968 in the famous Art of Computer Programming books to allow detailed machine-level analysis of algorithms. A modified gcc supports compilation of C and possibly C++ into MMIX.
More information on MMIX can be found at these Web sites:
mmix-arith provides only the IEEE 754 32-bit and 64-bit formats. Neither the 80-bit nor the 128-bit extended formats are supported. It would take a fair amount of work to support them in mmix-arith, because the low-level primitives for bit shifting and the basic +, -, /, *, and % operators are specialized for the 32-bit and 64-bit cases.
mmix-arith implicitly assumes 32 bit integers, and that sizeof(int) is 4 bytes. Curiously, it introduces types octa (byte) and tetra (byte) to hide implementation-dependent types when it needs 4-byte and 8-byte values, but it didn't do so for int, even though it won't work correctly if int is 16-bit or 36-bit. While 32-bit integers are widely available today on common computers, the limitation is significant if someone comtemplates making this code work on historical pre-IEEE-754 computer systems, as I do.
mmix-arith requires 32-bit signed and unsigned integer types, but does not require 64-bit integer types.
There is no public test suite for mmix-arith, but its author is famous for the care he takes to ensure correct software. Perhaps someone else will contribute a driver program and test suite.
mmix-arith.w is a literate-program file consisting of 1843 lines of intermixed prose and code. Processing with cweave produces a 2606-line TeX file that when typeset is a 48-page comprehensively-indexed document describing the code. Processing with cweb produces a 1670-line C program that compiles cleanly into an object file with 34 external symbols. The machine-generated C code is run together and not very readable, but application of GNU indent with my favorite options produces a 2336-line file with 1963 nonblank lines of clean C code.
I give these measurements here to document the size of one programming project to implement IEEE 754 arithmetic in software. It is clearly modest enough that advanced programming classes could tackle this job with small student-programmer teams, and a single expert programmer might independently implement it in a week or two. Knuth is among the world's top programmers, and his algorithms are often clever, nonobvious, and fast. Module 28 on p. 10 of the typeset document poses a challenge to readers: are there faster ways to implement a certain three of the key low-level primitives?
The mmix-arith code is old-style (K&R) C, and I have successfully compiled it with c89, cc, and c99 compilers on several different Unix architectures.
I expect that if a Web change file were provided to produce 1989 Standard C function declarations, the mmix-arith code should then also compile with C++ compilers.
As a quick experiment to confirm that conjecture, I ran GNU protoize on the C file. It successfully converted most of the code, but not all of it. In a 10-minute editing job, I added the remaining prototypes, renamed one variable that is a reserved word in C++, added #define for true, false, and bool to avoid confusion with native C++ names, and rewrote a single Boolean assignment to make it legal C++ (and also C). The code has now successfully compiled with several C++ compilers on at least five CPU architectures. All of my changes could be done via a Web change file as well, allowing automated production of C++ code from the C-Web source.
John R. Hauser has written the SoftFloat package and a companion TestFloat package. His Web site reports:
SoftFloat fully implements the four most common floating-point formats: single precision (32 bits), double precision (64 bits), extended double precision (80 bits), and quadruple precision (128 bits). All required rounding modes, exception flags, and special values are supported. ... SoftFloat's code for the extended double-precision and quadruple-precision formats depends on the existence of a 64-bit integer type in C. If the C compiler used to compile SoftFloat does not support 64-bit integers, SoftFloat will be limited to single and double precisions only.
SoftFloat consists of 2807 lines of code in the bits32 tree, and 6100 lines of code in the bits64 tree, exclusive of the timing test code. This is almost four times the size of mmix-arith.c, but this package also provides twice as many formats.
The detailed distribution notes suggest that porting to new platforms might be difficult. At least for most current systems, this is not the case. The Makefiles hardcode the compiler to be GNU gcc, but a two-line edit allows selection of a different compiler. The entire package contains only two source files, one for SoftFloat itself, and one for timing tests. I have successfully built it and run the timing tests on these current and recent major platforms, covering all of the important desktop CPU architectures:
With some native C compilers, I had to change source file line terminators from CR LF to LF, a trivial job with my dos2ux tool. On the two Alpha systems, for the native compilers, I had to add two preprocessor statements to undefine TRUE and FALSE before their use in an enum. No other source code tweaks were necessary anywhere.
I conclude that SoftFloat is highly portable to common current systems, and with a bit more packaging work, could be made easily installable on millions of systems.
Mike Cowlishaw and his group at IBM Hursley Laboratory in the UK have developed an implementation of IEEE 754/854 decimal arithmetic that is being used as a testbed for both the revision of the IEEE 754 standard, as well as for possible future decimal floating-point arithmetic hardware. The Implementations section of their Web site contains pointers to several other packages designed to support decimal floating-point arithmetic in software.
Analog Device's IEEE 754 software library provides 32-bit and 64-bit arithmetic for the Blackfin fixed-point embedded processor.
The GNU compiler collection (gcc) distributions contain the 4011-line file gcc/real.c that implements a software floating-point emulation for processors that lack floating-point hardware. According to the preamble commments, it is not exactly IEEE 754 compliant. It has a 160-bit significand in [0.5,1), and a 27-bit exponent. The wider format is used to emulate the 32-bit, 64-bit, 80-bit, and 128-bit IEEE 754 formats. It can be selected at compile time on some of the platforms that gcc runs on by using the -msoft-float compiler option. Unfortunately, on several systems, I found that linking fails because the needed run-time library support is missing, and on others, nonsensical results are produced, possibly because the I/O library is being passed bit patterns that do not correspond to the expected IEEE 754 encodings.
Others??
Others??
Others??
Computer systems are now so ubiquitous that most users assume that computers are reliable (at least, once their desktop system has been rebooted for the seventh time today). After all, billions of dollars were spent in their development, and hundreds of millions of people use them, so computers must be trustworthy. Right? Wrong!
Schryer's report cited in the bibliography below documents case after case of egregious failures of floating-point arithmetic in computer systems. Similar models of the same architecture were found to behave differently, and sometimes, a program run on the same system behaved differently on successive runs. In several cases, the faults were traced to loose boards or wiring connections, and in others, to errors (`bugs') in microcode or low-level arithmetic circuitry.
Modern systems built from microprocessors have radically reduced the number of connections that are visible, and subject to looseness or dirt. Nevertheless, there are still many sources of failure, and not only from the tens to hundreds of millions of transistors in the microprocessor itself.
Perhaps the most spectacular, and expensive, was the 1995 Intel Pentium floating-point divide flaw. After initial corporate denials, and denigrations of the problem and its human reporters, Intel was ultimately forced to admit its error, and make a massive recall of the flawed chips. The total cost to the corporation was about $480M!
Modern computer arithmetic is based largely on the IEEE 754 Standard for Binary Floating-Point Arithmetic. This Standard defines required and implementation-dependent behavior, but does not specify how the Standard is to be implemented. A computer vendor is free to implement the Standard entirely in hardware, entirely in software, or in a combination of the two.
The AMD AMD-64 (Opteron), Intel IA-32 (formerly, x86), Hewlett-Packard/Intel IA-64, Motorola 68K, and IBM Power and PowerPC systems are examples of hardware-only implementations.
Apple's SANE (Standard Apple Numeric Environment) is an example of a software-only implementation.
Most other vendors of RISC systems, including HP/Compaq/DEC Alpha, Hewlett-Packard PA-RISC, SGI MIPS, and Sun SPARC use a combination of hardware and software. The `easy', and most common, cases are handled in hardware, but traps must be made to software to handle remaining cases. Typically, floating-point exception handling, and generation of special values (subnormal numbers, NaN, and Infinity) are done in software.
There have also been cases where lower-cost entry-level computer systems used more software components in the arithmetic implementation than in higher-cost advanced systems, so even within an architecture family, there can be differences in behavior of the same program from the same executable file.
Computer users everywhere are all-too-familiar with software bugs. Current operating system architectures increase the number of places where such errors can occur. In the early days of computing, and of microprocessors, programs ran almost on bare hardware. Today, a typical program uses several different run-time libraries that are not part of its executable image in the file system. Bugs can arise from code-generation errors in the compiler which compiled the program, or from logic errors in the user program itself, or from the run-time libraries, or from interrupt handlers in the operating system. Either the libraries or the operating system might have changed since the last time you ran your program. In the case of network-aware languages like Java, Limbo, and C#, failures can arise from the network, or on remote systems over which you have no control. Errors can even be circular: an otherwise correctly-working compiler might fail because of an O/S change, producing user code that fails.
The growing recognition of the importance of continual testing has led the Free Software Foundation to encourage the inclusion of extensive validation suites in their software distributions, so that the remote installer can verify during installation that the software is performing as intended. In principle, there is no reason why this testing could not continue after installation as well. Schryer documents several cases where off-peak hours in commercial systems were devoted to rerunning software test suites.
You might ask: why can one not simply do exhaustive testing of the arithmetic system? The answer is: it is generally impossible to do so. In 32-bit precision, there are about four billion different numbers (because 232 = 4,294,967,296). At a computation speed of one result per nanosecond (achievable on advanced single processors in 2001), exhaustive testing of the unary square root function would take only four seconds, and many vendors have done such testing. However, testing of a single binary operation, like add, subtract, multiply, or divide, would take four billion times longer, or more than 136 years. In 64-bit precision, the unary operation test would take 136 years, and the binary operation test, 584 billion years, which is many times the age of the universe. In 128-bit precision, testing would take...well, you get the idea!
In the case of IEEE 754 floating-point arithmetic, vendor implementations may be imperfect, or inexplicable, or incomplete, or inefficient, and programming languages support may be deficient:
The mixed-language programming issue is relevant to some of the software in this collection, and also to many programmers of engineering and scientific software. A separate lengthy document on the subject, entitled Using C and C++ with Fortran, provides much useful information.
Unless otherwise noted below, all of the Fortran programs adhere to a superset of the ANSI Fortran Standard, formally known as American National Standard Programming Language FORTRAN, ANSI X3.9-1978, CSA Standard Z243.18-1980 January, 1980, and ISO/IEC 1539-1980. Although the ANSI Fortran Standard, published on 3-Apr-1978, was late, the language is generally called Fortran 77, and most vendors name their compilers f77.
There are four widely-supported language extensions that are used in some or all of the test programs: (a) lowercase letters in keywords and variables, (b) quadruple precision (REAL*16), (c) Zw.d hexadecimal FORMAT items, and (d) Z'hhhhhhhh' hexadecimal constants in DATA statement initializer lists. In practice, none of these extensions proves a barrier to portability, and all compile without problems on all major UNIX platforms.
All of the Fortran programs should be compilable with compilers for Fortran 77, Fortran 90, Fortran 95, and High-Performance Fortran, subject to these special considerations:
The NAG compilers are unusual, in that they translate Fortran to C, and then invoke the native C compiler to compile to machine code. This means that NAG could potentially offer them on every platform for which a C compiler is available. The NAG Fortran 90 compiler was the first compiler available for that language, as far as I know.
The AT&T (now Lucent) Bell Laboratories Fortran-to-C translator, f2c, with its Fortran compiler wrapper, fc, also potentially provides Fortran support for any platform that has a working C compiler. Unfortunately, support for quadruple precision is not widely available in C compilers, and f2c at present rejects Fortran programs using that extended data type. I have recommended to its developers that this serious limitation be lifted.
On GNU/Linux on Intel IA-32, the only Fortran compilers available that provide support for 128-bit IEEE quadruple precision are Lahey/Fujitsu lf95 and Intel ifort (from version 8 released in late 2003).
Unfortunately, in Lahey/Fujistu Release L5.50a (1999), floating-point exceptions abruptly terminate execution with no possibility of nonstop behavior, making this otherwise excellent Fortran compiler of doubtful utility for serious scientific work. I have reported this problem to the vendor. It appears to be fixed in the L6.20a release (late 2003). The Intel compiler has no such problem.
All programs written in C adhere to 1989 ANSI/ISO Standard C, formally known as ISO/IEC 9899:1999(E) Programming languages -- C, or C89 for short. They can also be compiled and run with any C++ compiler. Sadly, the dozens of C++ implementations available to me are at varying stages of evolution toward the 1998 ISO C++ Standard., and it is not yet practical to write C++ code that works with all them, not even a simple Hello, world example. Thus, I generally write in C, rather than C++, because portability is supremely important to me.
To avoid catastrophic overwriting by Fortran-to-C translators, no C source file has the same base name as a corresponding Fortran source file: a disambiguating numeric suffix is added to the C file base name.
All C programs can also be compiled with old-style Kernighan and Ritchie C. However, inasmuch as I now have access to only one such old compiler, with more than 35 others supporting 1989 Standard C, I no longer attempt to retain K&R compatibility in new code.
As with Fortran compilers on some systems, C compilers on those systems may also require special options to get nonstop behavior. On HP/Compaq/DEC Alpha, both OSF/1 and GNU/Linux, GNU gcc and g++ need the -mieee option. On OSF/1, HP/Compaq/DEC cc, c89, and cxx need the -ieee option.
Since not all C and C++ compilers recognize the long double data type, most programs that use it bracket the code with a preprocessor conditional that tests whether the symbol HAVE_LONG_DOUBLE has been defined. Thus, the C programs here should be compiled with an option to define that symbol if long double is supported. Sensibly-designed C compilers that still lack full support for that data type accept it, but map it to type double with a warning, rather than rejecting the program as erroneous.
Similarly, code that uses the long long data type is bracketed by a conditional that tests whether HAVE_LONG_LONG has been defined.
Compilation of a .java source code file produces a binary .class file, the analog of a .o file from most other UNIX/POSIX/GNU compilers. Unfortunately, some Java implementations lack a program to do the next step: link the .class file into an executable program that can be invoked by name, just like programs written in any other programming language. When such a program is available, it is usually called javald, and the .class file must be preserved. Otherwise, Java programs must be run like this: java foo, where foo.class must already exist.
Java programs have a peculiarity that may be unique among programming languages: the class name in a Java source file must also be used for the file name, with extension .java.
A few programs here are written in the Maple programming language, a proprietary, but widely used, symbolic algebra system developed starting in the early 1980s at the University of Waterloo, Waterloo, Ontario, Canada, and later spun off into a company, Waterloo Maple Software, which continues active development, marketing, and support of Maple.
Translations to other symbolic algebra languages, and to open source code in C or C++ using, for example, the GNU Multiple Precision Arithmetic Library, gmp, will be most welcome.
WARNING: Programs that rely on vendor-provided run-time libraries for decimal<-->binary conversion are always suspect: those conversions may be inaccurate.
Fortran programs assume that unit 5 is open and connected to standard input, and unit 6 is open and connected to standard output. This practice is universal on UNIX systems, and follows a tradition established by IBM in the 1960s, and adopted by all sensible Fortran compiler vendors.
Some compilers need special compilation options to get IEEE 754 nonstop behavior, and most of the test programs here require that behavior. See the notes above for C/C++ and Fortran.
Some architectures have registers that are larger than memory words. These include the now-defunct Honeywell Series 6000, the almost-disappeared Motorola 68K, the widely-used Intel IA-32, and its planned replacement, the Hewlett-Packard/Intel IA-64. In most applications, the extra range and precision of intermediate computations in such registers is beneficial. However, in floating-point test programs, it can be harmful.
For example, a program that determines the machine epsilon by the simple pseudocode
x = 1.0
while ((1.0 + (x / 2.0)) > 1.0)
x = x / 2.0
print x;
will produce a smaller value than expected, because of the extra precision of register computations.
For that reason, the test programs here sometimes have to resort to subterfuges to force intermediate data out of long registers into memory. They do this by calling short functions that simply return their arguments. This achieves their intent with most compilers, but optimizing compilers that silently inline functions may defeat them.
In principle, the C89/C++98/Java type qualifier volatile could be used to avoid the function-call subterfuge, but that keyword isn't recognized by older compilers, so none of the C/C++/Java test programs use it. Fortran offers no analogue of volatile at all.
In large packages, such functions need to be moved to separate files to prevent inlining, but I have not done so for these simple test programs. [In principle, an extremely clever executable image loader could still inline such separately-compiled functions, but I am unaware in 2004 of any system that does so.]
I know of no C, C++, or Fortran compiler on these platforms that does this inlining when compiling at the lowest (usually, debug) optimization level. Thus, for UNIX compilers, use the -g debug option, and avoid -On options. Few of these programs take more than a few seconds to run, and most take much less, so code optimization is definitely not of concern.
If more than a few of these programs are of interest to
you, you can fetch the entire contents of this directory
tree in any of these
archive formats:
COPYING | GNU GENERAL PUBLIC LICENSE, Version 2, June 1991. |
adx2.c | Implementation of Fortran REAL function adx(x,n) in C, to return x * base**n. |
adx.f | Fortran REAL function adx(x,n), to return x * base**n; |
args.h | C header file to support K&R, Standard C, and Standard C++ coding. |
chkinexact.c | C program to test the setting of floating-point exception flags on Sun Solaris. Although all IEEE 754 implementations have these flags, the interface to them is unique to each vendor. Eventually, a common interface needs to be written to allow exception-flag access everywhere. |
configure | Dummy UNIX/POSIX/GNU shell script to allow a building, testing, and installing with the GNU standard procedure: ./configure && make all check install. |
copysign.c | C program to test the IEEE 754 copysign() function. That function is not part of 1989 Standard C, so this program will not work on some systems. |
dadx2.c | Implementation of Fortran DOUBLE PRECISION function dadx(x,n) in C, to return x * base**n. |
dadx.f | Fortran DOUBLE PRECISION function dadx(x,n), to return x * base**n; |
datasize.c | C program to report the implementation sizes of all C primitive data types. |
datasize.sh | UNIX/POSIX/GNU shell script to compile and run datasize.c with a C compiler defined by the CC environment variable. |
datasize-all.sh | UNIX/POSIX/GNU shell script to compile and run datasize.c with all known C compilers. |
datasize-all.txt | Test results from datasize-all.sh. |
deps2.c | Fortran DOUBLE PRECISION function implemented in C to compute the generalized machine epsilon. This implementation works by fast low-level bit manipulation. |
deps.f | Fortran DOUBLE PRECISION function deps(x) to compute the generalized machine epsilon. |
depsiter.c | Fortran DOUBLE PRECISION function implemented in C to compute the generalized machine epsilon. This implementation works by looping. |
dintxp.c | Fortran INTEGER function dintxp(x) implemented in C to return the exponent of the base in the representation of DOUBLE PRECISION x. |
disden.c | Fortran LOGICAL function disden(x) implemented in C to return .TRUE. if DOUBLE PRECISION x is subnormal (formerly, denormalized), and .FALSE. otherwise. |
disinf.c | Fortran LOGICAL function disinf(x) implemented in C to return .TRUE. if DOUBLE PRECISION x is infinite, and .FALSE. otherwise. |
disnan.c | Fortran LOGICAL function disnan(x) implemented in C to return .TRUE. if DOUBLE PRECISION x is a NaN, and .FALSE. otherwise. |
dmachar.f | Fortran subroutine to determine the characteristics of the host DOUBLE PRECISION floating-point system. [This program was written by W. J. Cody, Jr., as part of the ELEFUNT test package.] |
dnxtaft.f | Fortran DOUBLE PRECISION function dnxtaft(x,y) to return the next floating-point number after x in the direction of y. This version uses slow iteration instead of fast bit manipulation. |
dran.f | Fortran DOUBLE PRECISION function to return a pseudo-random number uniformly distributed on (0,1). [This program was written by W. J. Cody, Jr., as part of the ELEFUNT test package.] |
drandl.f | Fortran DOUBLE PRECISION function to return a pseudo-random number logarithmically distributed on (1,dexp(x)). [This program was written by W. J. Cody, Jr., as part of the ELEFUNT test package.] |
dsetxp.c | Fortran DOUBLE PRECISION function dsetxp(x,n) implemented in C to return (fraction of x) * base**n, that is, to replace the exponent field of a copy of x. |
dsqrt.f | Fortran DOUBLE PRECISION square root function. This was written for an introductory class on scientific programming to show how one Fortran elementary function can be implemented; see below for documentation. |
dssqrt.f | Fortran REAL square root function, with intermediate computation in DOUBLE PRECISION. This was written for an introductory class on scientific programming to show how one Fortran elementary function can be implemented; see below for documentation. |
dtanh.f | Fortran DOUBLE PRECISION hyperbolic tangent function. This was written for an introductory class on scientific programming to show how one Fortran elementary function can be implemented; see below for documentation. |
elefunt.tar.gz | ELEFUNT elementary function test package. The Fortran code was written by W. J. Cody, Jr. and W. Waite, and described in their excellent book Software Manual for the Elementary Functions, Prentice-Hall (1980), ISBN 0-13-822064-6, LCCN QA331 .C635 1980. The C translation was done by hand by Kenneth Stoner, then a high-school student, with assistance from Nelson Beebe, before Fortran-to-C translators were available. The updated release of this package (June 2002) includes new quadruple-precision ELEFUNT packages for Fortran and C/C++, and new single- and double-precision ELEFUNT packages for Java. Also available in these alternative archive formats: elefunt.jar, elefunt.zip, and elefunt.zoo. |
eps2.c | Fortran REAL function implemented in C to compute the generalized machine epsilon. This implementation works by fast low-level bit manipulation. |
eps.f | Fortran REAL function eps(x) to compute the generalized machine epsilon. |
epsiter.c | Fortran REAL function implemented in C to compute the generalized machine epsilon. This implementation works by looping. |
fpinfo.f | Fortran program to report the size of single-, double-, and quadruple-precision datatypes, the floating-point precision in bits, and the corresponding machine epsilon and smallest floating-point numbers for each. |
fpinfosd.f | Fortran program to report the size of single- and double-precision datatypes, the floating-point precision in bits, and the corresponding machine epsilon and smallest floating-point numbers for each. |
fpinfo2.c | C program to report the size of single-, double-, and quadruple-precision datatypes, the floating-point precision in bits, and the corresponding machine epsilon and smallest floating-point numbers for each. |
fpshow.c | C program to print a number of important constants in decimal and hexadecimal to check internal representations on a new machine. |
hoc | This is a significant extension of the Kernighan & Pike high-order calculator, hoc, with rich support for floating-point arithmetic. |
ieeeftn.h | C header file to support interfacing floating-point utility access from Fortran to C. |
infnani.f | Fortran program to compute and display hexadecimal values of signed NaN and Infinity, and to display hexadecimal and decimal results of their conversion to integer values. |
intxp.c | Fortran INTEGER function intxp(x) implemented in C to return the exponent of the base in the representation of REAL x. |
isden.c | Fortran LOGICAL function isden(x) implemented in C to return .TRUE. if REAL x is subnormal (formerly, denormalized), and .FALSE. otherwise. |
isinf.c | Fortran LOGICAL function isinf(x) implemented in C to return .TRUE. if REAL x is infinite, and .FALSE. otherwise. |
isnan.c | Fortran LOGICAL function isnan(x) implemented in C to return .TRUE. if REAL x is a NaN, and .FALSE. otherwise. |
lstohtml.awk | awk program to convert a UNIX directory listing to HTML (used by the Makefile below). |
machar.f | Fortran subroutine to determine the characteristics of the host REAL floating-point system. [This program was written by W. J. Cody, Jr., as part of the ELEFUNT test package.] |
Makefile | UNIX/POSIX/GNU Makefile to manage building, testing, and installing with the GNU standard procedure: ./configure && make all check install. All GNU standard make targets are supported, so to reduce the directory to its original distribution status, do make distclean. The install target is a dummy; there is nothing to install on the local system. |
minmax.f | Fortran program to test the handling of IEEE 754 NaN by the min() and max() intrinsic functions. |
minmax.sh | UNIX/POSIX/GNU shell script to compile and run minmax.f with all known Fortran compilers on the current system. |
minmax.txt | Email message to one of the IEEE 754 committee members with test results for the minmax.f program with 61 compilers on 15 different UNIX platforms. The results are consistently dismal: not one of them handles NaN arguments in the expected way. |
mmsqrt.f | Fortran REAL square root function implemented using the Moler/Morrison cubically-convergent iterative algorithm. This is not a good algorithm for small x; see the code's header comments. |
mpieee.maple | Program to convert floating-point numbers in decimal to IEEE-754-like binary formats, and display them in hexadecimal and decimal. |
mpieee-test.maple | Program to use mpieee.maple to compute particular limits in IEEE 754 floating-point arithmetic. |
mpieee-test.out | Test output from mpieee-test.maple. |
nxtaft.f | Fortran REAL function nxtaft(x,y) to return the next floating-point number after x in the direction of y. This version uses slow iteration instead of fast bit manipulation. |
ofl.c | C program to test which floating-point exceptions are raised during floating-point multiply and multiply-add for 0 and Infinity operands. |
ofl.sh | UNIX/POSIX/GNU shell script to compile and run ofl.c with a C compiler defined by the CC environment variable. |
ofl-all.sh | UNIX/POSIX/GNU shell script to compile and run ofl.c with all known C compilers. |
ofl-all.txt | Test results from ofl-all.sh. |
paranoia.c | C program to carry out a torture test on the host system's floating-point arithmetic. [This program is credited to W. M. Kahan, B. A. Wichmann, David M. Gay, Thos Sumner, and Richard Karpinski. It has been extended by NHFB to permit compilation under Standard C and Standard C++.] Versions of this program in other languages are available in the Netlib archive. [That archive contains a few bug fixes to this program which have not yet been incorporated in the version here.] |
paranoia.h | Header file for paranoia.c. |
qtest.f | Fortran floating-point accuracy test program. [This program was written by W. M. Kahan.] |
ran.f | Fortran REAL function to return a pseudo-random number uniformly distributed on (0,1). [This program was written by W. J. Cody, Jr., as part of the ELEFUNT test package.] |
randl.f | Fortran REAL function to return a pseudo-random number logarithmically distributed on (1,exp(x)). [This program was written by W. J. Cody, Jr., as part of the ELEFUNT test package.] |
README.NaN | Brief summary of findings about the representation of NaN on various platforms. |
round-trip-problems.c | C program to find single-precision test numbers that are hard to convert from binary to decimal and back accurately. [This program was written by David Goldberg <goldberg@parc.xerox.com>.] |
rwfp.f | Fortran program to read floating-point numbers, and display them as single-, double-, and quadruple-precision values in hexadecimal and decimal. |
rwfp2a.c | C program to read floating-point numbers, and display them as single-, double-, and quadruple-precision values in hexadecimal and decimal, assuming printf() supports a %Le format item. |
rwfp2b.c | C program to read floating-point numbers, and display them as single-, double-, and quadruple-precision values in hexadecimal and decimal, assuming printf() supports a %lle format item. |
rwinfnan1.f | Fortran program to test whether Infinity and NaN can be output to a text file, and then input correctly. |
rwinfnan1.txt | Test results for rwinfnan1.f. |
rwinfnan2.c | C program to test whether Infinity and NaN can be output to a text file, and then input correctly. |
rwinfnan2.txt | Test results for rwinfnan2.c. |
rwinfnan3.cc | C++ program to test whether Infinity and NaN can be output to a text file, and then input correctly. Because of variations in names and locations of C++ header files at different language development stages, it is unlikely that this program can be successfully compiled by any C++ compiler, other than GNU g++. Work remains to be done on developing suitable wrappers to hide this header file mess, and make it easier to write portable C++ programs. |
rwinfnan3.txt | Test results for rwinfnan3.cc. |
rwinfnanascii.java | Java program to investigate whether Infinity and NaN can be generated and successfully written to, and read from, a text file. |
rwinfnanbinary.java | Java program to investigate whether Infinity and NaN can be generated and successfully written to, and read from, a binary file. |
second.c | Fortran DOUBLE PRECISION function implemented in C to return the CPU time in seconds since job start. |
setxp.c | Fortran REAL function setxp(x,n) implemented in C to return (fraction of x) * base**n, that is, to replace the exponent field of a copy of x. |
signchar.c | C program to report on characteristics of the char data type. |
signchar.sh | UNIX/POSIX/GNU shell script to compile and run signchar.c with a C compiler defined by the CC environment variable. |
signchar-all.sh | UNIX/POSIX/GNU shell script to compile and run signchar.c with all known C compilers. |
signchar-all.txt | Test results from signchar-all.sh. |
signchar-lcc.sh | UNIX/POSIX/GNU shell script to compile and run signchar.c with the lcc C compiler. |
sqrt.f | Fortran REAL square root function. This was written for an introductory class on scientific programming to show how one Fortran elementary function can be implemented; see below for documentation. |
sqrt.pdf | Documentation for dsqrt.f, dssqrt.f, and sqrt.f (see above), in PDF format. |
sqrt.ps.gz | Documentation for dsqrt.f, dssqrt.f, and sqrt.f (see above), in PostScript format. |
subnorm.f | Fortran program to test whether the smallest representable single-, double-, and quadruple-precision floating-point numbers are normalized, or subnormal (formerly called denormalized). |
subnorm2.f | Fortran program to test whether the smallest representable single- and double-precision floating-point numbers are normalized, or subnormal (formerly called denormalized). |
tanh.pdf | Documentation for dtanh.f (see above), in PDF format. |
tanh.ps.gz | Documentation for dtanh.f (see above), in PostScript format. |
tdsqrt1.f | Fortran program to test the accuracy of the native dsqrt(x) function. [This program was written by W. J. Cody, Jr. and W. Waite as part of the ELEFUNT test package.] |
tdsqrt2.f | Fortran program to test the accuracy of a private dsqrt(x) function. [This program was written by W. J. Cody, Jr. and W. Waite as part of the ELEFUNT test package.] |
test-gradual-underflow.c | C program to test the speed of handling gradual underflow. [This program was written by Vaughan Pratt <pratt@cs.stanford.edu>.] |
testdp.c | C program to test the Fortran DOUBLE PRECISION support functions. This program needs several of the other files listed in this table, and takes a long time to run (from several minutes to a few hours on early 1990s-vintage workstations). |
testsp.c | C program to test the Fortran REAL support functions. This program needs several of the other files listed in this table, and takes a long time to run (from several minutes to a few hours on early 1990s-vintage workstations). |
timdep.c | C program to run timing tests for the Fortran deps(x) generalized machine epsilon function. This program needs several of the other files listed in this table. |
timops | Benchmark report on the cost of IEEE 754 exceptional values for a wide range of architectures. All of the code, and the raw and summary results, are available from that document. |
timeps.c | C program to run timing tests for the Fortran eps(x) generalized machine epsilon function. This program needs several of the other files listed in this table. |
timtst.c | C program to run timing tests of the Fortran floating-point primitives. |
tsqrt1.f | Fortran program to test the accuracy of the native sqrt(x) function. [This program was written by W. J. Cody, Jr. and W. Waite as part of the ELEFUNT test package.] |
tsqrt2.f | Fortran program to test the accuracy of a private sqrt(x) function. [This program was written by W. J. Cody, Jr. and W. Waite as part of the ELEFUNT test package.] |
tstchr.c | C program to print the return values of the Standard C isxxx() character classifier functions for every character in the 256-character set. |
tstdep.c | Another C program to test the Fortran DOUBLE PRECISION support functions. This program needs several of the other files listed in this table, and takes a long time to run (from several minutes to a few hours on early 1990s-vintage workstations). |
tsteps.c | Another C program to test the Fortran REAL support functions. This program needs several of the other files listed in this table, and takes a long time to run (from several minutes to a few hours on early 1990s-vintage workstations). |
wchar.awk | awk program to summarize the output of wchar.sh. |
wchar.c | C program to report characteristics of wide character (wchar_t) support. |
wchar.sh | UNIX/POSIX/GNU shell script to compile and run wchar.c with a C compiler defined by the CC environment variable. |
wchar-all.sh | UNIX/POSIX/GNU shell script to compile and run wchar.c with all known C compilers. |
wchar-all.txt | Test results from wchar-all.sh. |
wchar-lcc.sh | UNIX/POSIX/GNU shell script to compile and run wchar.c with the lcc C compiler. |
zerocomp.c | C program to test whether oppositedly-signed zeros are correctly compared. |
zerosdqb.f | Fortran program to test whether IEEE 754 signed zero is supported in single, double and quadruple precision. This version needs a nonstandard BYTE data type. |
zerosdqi.f | Fortran program to test whether IEEE 754 signed zero is supported in single, double and quadruple precision. This version needs a nonstandard INTEGER*1 data type. |
zerosdb.f | Fortran program to test whether IEEE 754 signed zero is supported in single and double precision. This version needs a nonstandard BYTE data type. |
zerosdi.f | Fortran program to test whether IEEE 754 signed zero is supported in single and double precision. This version needs a nonstandard INTEGER*1 data type. |
zerosdq.c | C program to test whether IEEE 754 signed zero is supported in single, double and quadruple precision. |
This collection certainly needs to be augmented with additional software to test other features of the implementation of floating-point tests.
While many of the programs here are simply test programs, there are several that are intended to provide platform-independent access to features of the underlying floating-point system. Most of these have names that were recommended either by the 1985 IEEE 754 Standard, or by Cody and Waite in their book.
I would ultimately like to extend this collection to include well-documented, well-tested, and highly-portable implementations of all of the floating-point primitives recommended by the IEEE 754 standard, both the original one, and its eventual revision, together with equally portable test packages to verify their correct functioning on any platform.
These primitives ultimately need to be available for all major programming languages, which certainly includes Fortran, C, C++, Java, and Common Lisp, but also symbolic algebra languages like Axiom, Maple, Mathematica, and Reduce, linear algebra systems like Matlab and Octave, statistical languages like S-Plus and R, and scripting languages like awk, icon, guile, perl, python, rexx, ruby, tcl, ... The key to this is certainly to first get them available for C, because most of the other current programming languages can interface to C, or can be extended to do so.
The University of California at Berkeley floating-point test suite, ucbtest (later extended at Sun Microsystems) is available in the Netlib archives at http://www.netlib.org/fp/ucbtest.tgz
The IEEE 754 Compliance Checker, IeeeCC754, is available at http://win-www.uia.ac.be/u/cant/ieeecc754.html From its Web site: ``... is a precision and range independent tool to test whether an implementation of floating-point arithmetic (in hardware or software) is compliant with the principles of the IEEE floating-point standard.''
The C/C++ compiler feature test package can be used to report what features of the 1989 and 1999 ISO C Standard, and the 1998 ISO C++ Standard, are supported by your compiler. From version 1.1, it includes some tests of basic IEEE 754 arithmetic, but does not exercise any of the floating-point elementary functions in the runtime library.
The Cody/Waite ELEFUNT elementary function test package in Fortran, C/C++, and Java.
I maintain an extensive bibliography of publications about floating-point arithmetic, carrying on work started by Norbert Juffa. The material is also available at ftp://ftp.math.utah.edu/pub/tex/bib/fparith.*.
Notable recent books listed there with an emphasis on computer arithmetic and computation of elementary functions include those with citation labels Higham:1996:ASN (Accuracy and Stability of Numerical Algorithms), Markstein:2000:IEF (IA-64 and elementary functions: speed and precision), Moshier:1989:MPM (Methods and Programs for Mathematical Functions), Mueller:2000:CAC (Computer Architecture: Complexity and Correctness), Omondi:1994:CAS (Computer Arithmetic Systems: Algorithms, Architecture, and Implementation), Overton:2001:NCI (Numerical Computing with IEEE Floating Point Arithmetic, Including One Theorem, One Rule of Thumb, and One Hundred and One Exercises), and Parhami:2000:CAA (Computer Arithmetic: Algorithms and Hardware Designs). These are all well-written books that are worth reading. Markstein's book is is the first to address the question of accurate floating-point computation on the new Hewlett-Packard/Intel IA-64 architecture, which is covered in a separate bibliography, also available at ftp://ftp.math.utah.edu/pub/tex/bib/intel-ia-64.*.
Norman Schryer's report [Schryer:1981:TCF] describes a very large (ca. 30K lines of portable Fortran 77 code) test package, FPTEST, for floating-point arithmetic that was developed prior to IEEE 754 arithmetic, but has been used subsequently to test systems with that arithmetic. The package is not freely distributable, but is available under license from Bell Laboratories. Unfortunately, the FPTEST distribution is missing a major part of the test suite; I'm investigating this further.
Schryer's report makes very interesting reading, and details quite a number of unexpected and bizarre failures of arithmetic, including several where floating-point results were time dependent on the same machine!
The father of IEEE 754 arithmetic, W. Kahan, with co-author J. D. Darcy, is severely critical of Java's floating-point arithmetic in Kahan:1998:HJFa and Kahan:1998:HJFb, in the article How Java's Floating-Point Hurts Everyone Everywhere.
Charles Severance and W. Kahan have written an interesting document on the history of the development of IEEE 754 arithmetic, available at http://www.cs.berkeley.edu/~wkahan/ieee754status/754story.html.