Last update: Thu Dec 5 17:59:33 2002
Since IEEE 754 arithmetic is often implemented by a combination of hardware and software, operands that are exceptional values (subnormal, Infinity, or NaN), or that result in exceptional values, can be expensive at run time, compared to normal operands, because they must be handled in software.
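The three kinds of exceptional values named above can be produced and recognized in C99 roughly as follows. This is only a sketch: the helper names (fp_class_name, make_subnormal, make_infinity, make_nan) are illustrative assumptions and are not taken from timops.c, and the code assumes a C99 compiler with IEEE 754 double arithmetic.

```c
#include <float.h>
#include <math.h>

/* Classify x with the C99 fpclassify() macro and return a printable name. */
const char *fp_class_name(double x)
{
    switch (fpclassify(x)) {
    case FP_NAN:       return "NaN";
    case FP_INFINITE:  return "Infinity";
    case FP_SUBNORMAL: return "subnormal";
    case FP_ZERO:      return "zero";
    default:           return "normal";
    }
}

/* Produce each kind of exceptional value at run time.  The volatile
   qualifiers discourage the compiler from folding the arithmetic away. */

/* Underflow: half of the smallest normal double is subnormal
   (or zero, if the system flushes underflows to zero). */
double make_subnormal(void)
{
    volatile double x = DBL_MIN;
    return x / 2.0;
}

/* Overflow: twice the largest finite double is Infinity. */
double make_infinity(void)
{
    volatile double x = DBL_MAX;
    return x * 2.0;
}

/* Invalid operation: 0/0 is NaN. */
double make_nan(void)
{
    volatile double z = 0.0;
    return z / z;
}
```

Calling fp_class_name(make_subnormal()) on a system with gradual underflow should report a subnormal; on a system that flushes to zero it would report zero instead.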
To investigate this further, the benchmark program timops.c and the shell script timops.sh to run it for each supported precision, together with the associated files Makefile, ieeeftn.h, second.c, and store.c, were used to measure the performance hit from exceptional values in a wide range of architectures.
The benchmark program contains a loop whose trip count for normal operands is adjusted so that the loop runs for at least one second, and then the same trip count is used to run that loop again with up to six types of operands whose
The particular operand values depend on the floating-point precision and on an IEEE 754 floating-point system, but are otherwise independent of the host CPU architecture.
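The calibrate-then-repeat scheme described above might be sketched as follows. This is not the code of timops.c: the function names (kernel, time_loop, calibrate, slowdown), the use of a single multiplication as the timed operation, and the use of clock() for timing are all assumptions made for illustration.

```c
#include <limits.h>
#include <time.h>

/* Hypothetical kernel: perform one multiplication per trip.  The volatile
   qualifiers force a real load, multiply, and store on every trip. */
static void kernel(double x, double y, long n)
{
    volatile double a = x, b = y, r = 0.0;
    long i;

    for (i = 0; i < n; ++i)
        r = a * b;
    (void)r;
}

/* CPU seconds consumed by n trips of the kernel. */
double time_loop(double x, double y, long n)
{
    clock_t start = clock();

    kernel(x, y, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Double the trip count until the loop with normal operands
   runs for at least min_seconds. */
long calibrate(double x, double y, double min_seconds)
{
    long n = 1000;

    while (time_loop(x, y, n) < min_seconds && n < LONG_MAX / 2)
        n *= 2;
    return n;
}

/* Ratio of the time with an exceptional operand to the time with a normal
   operand, using the same trip count n for both runs.  n should come from
   calibrate() so that the normal time is comfortably above zero. */
double slowdown(double x_exceptional, double x_normal, double y, long n)
{
    double t_normal = time_loop(x_normal, y, n);
    double t_exceptional = time_loop(x_exceptional, y, n);

    return t_exceptional / t_normal;
}
```

For example, with n = calibrate(1.5, 0.5, 1.0), the call slowdown(DBL_MIN, 1.5, 0.5, n) (with DBL_MIN from float.h) compares a multiplication that underflows to a subnormal against one with a normal result. Holding the trip count fixed means the ratio of the two times reflects only the cost of the operands.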
On most current RISC systems, exceptional values are handled by software, but the trap to that software is transparent to the user, apart from taking longer than a hardware implementation would require.
One notable exception to user transparency is the Compaq (formerly DEC) Alpha architecture. Its designers chose to implement a heavily pipelined CPU that (except for the most recent Alpha 21264 and 21364 CPUs) cannot handle exceptional values. The default for C, C++, and Fortran compilers under both Compaq/DEC OSF/1 and GNU/Linux operating systems is to flush underflows abruptly to zero, and to immediately terminate execution on encountering an operand that is subnormal, Infinity, or NaN, or for which the instruction would generate a NaN or Infinity.
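Whether a given compiler and option set flushes underflows to zero can be checked at run time with a small probe. The sketch below is an assumption-laden illustration (the function name has_gradual_underflow is hypothetical, and it tests only double precision); it makes no claim about what any particular Alpha compiler will report.

```c
#include <float.h>

/* Return 1 if halving the smallest normal double yields a nonzero
   (subnormal) result, or 0 if the underflow was flushed to zero.
   The volatile qualifiers keep the division from being done at
   compile time, where the host's rules might differ. */
int has_gradual_underflow(void)
{
    volatile double smallest_normal = DBL_MIN;
    volatile double half = smallest_normal / 2.0;

    return half != 0.0;
}
```

A program that depends on subnormals could call this once at startup and warn the user when it returns 0.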
In order to produce IEEE 754 nonstop behavior on Compaq/DEC Alpha systems, special compilation options are required:
These options cause the compilers to generate different floating-point instructions that cause traps to software for exceptional operands or results, and in addition, cause the insertion of trap barrier instructions after floating-point operations. The purpose of the latter is to flush the instruction pipeline, allowing precise determination of the interrupt location, so that the software handler can find the instruction and its operands, and complete the job.
Because instruction pipelining is extremely critical for modern high-performance CPUs, it should be expected that the performance hit from IEEE 754 nonstop behavior on Alpha processors may be severe, and that expectation is clearly demonstrated in the tables below.
The complete output data from which the tables below are derived are recorded in timops.raw, which should be consulted for details of operating systems and absolute times. The timops.awk program filters that file to produce the table entries. Numerical entries in the last 5 columns are slowdown factors relative to the loop with normal values: an entry greater than 1 means that the exceptional case ran that many times slower.
There are several observations to make about the data in the tables below:
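    #if defined(__sgi)
    #include <sys/fpu.h>    /* union fpc_csr, get_fpc_csr(), set_fpc_csr() */

    static void
    flush_to_zero(int on_off)
    {   /* see "man sigfpe" on SGI IRIX 6.x for documentation */
        union fpc_csr n;

        n.fc_word = get_fpc_csr();              /* read the FP control/status register */
        n.fc_struct.flush = (on_off ? 1 : 0);   /* 1: flush underflows to zero, 0: keep subnormals */
        set_fpc_csr(n.fc_word);                 /* write the modified register back */
    }
    #endif
    ...
    #if defined(__sgi)
        flush_to_zero(0);   /* to get support for subnormals! */
    #endif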
-----------------------------------------------------------------------------------------------------
CPU                               MHz  Compiler  fp_size    ufl->    ufl->    ofl->      NaN      Inf
                                                          subnorm     zero      Inf
-----------------------------------------------------------------------------------------------------
AMD Athlon                       1400  gcc             4    4.974    3.376    1.000    1.000    0.991
AMD Athlon                       1400  gcc             8    4.802    3.198    1.009    1.000    1.009
AMD Athlon                       1400  gcc            12    1.007    1.000    1.000    1.007    1.013
DEC Alpha 21064 EV4               100  gcc             4    1.001    1.007    -n/a-    -n/a-    -n/a-
DEC Alpha 21064 EV4               100  gcc             4    8.252    8.139    8.261    7.532    7.517
DEC Alpha 21064 EV4               100  gcc             8    1.006    0.999    -n/a-    -n/a-    -n/a-
DEC Alpha 21064 EV4               100  gcc             8    8.835    8.697    9.572    7.886    7.815
DEC Alpha 21164 EV5               466  c89             4    1.000    1.000    -n/a-    -n/a-    -n/a-
DEC Alpha 21164 EV5               466  c89             4   43.636   34.727   21.879   21.121   21.439
DEC Alpha 21164 EV5               466  c89             8    1.000    1.000    -n/a-    -n/a-    -n/a-
DEC Alpha 21164 EV5               466  c89             8   66.043   49.217   27.913   27.130   27.333
DEC Alpha 21264                   667  c89             4    1.000    0.989    -n/a-    -n/a-    -n/a-
DEC Alpha 21264                   667  c89             4   53.359   42.239    0.989    1.000    0.989
DEC Alpha 21264                   667  c89             8    1.000    1.010    -n/a-    -n/a-    -n/a-
DEC Alpha 21264                   667  c89             8   78.552   57.885    1.000    1.021    1.000
DEC Alpha 21264                   667  c89            16    0.986    1.014    -n/a-    -n/a-    -n/a-
DEC Alpha 21264                   667  c89            16    1.057    1.000    0.986    0.957    0.986
HP PA-RISC 1.1 7100LC              80  cc              4   12.058   12.178    1.000   92.251    1.000
HP PA-RISC 1.1 7100LC              80  cc              8   16.955   16.841    1.000   11.278    1.000
IBM PowerPC                       133  cc              4    0.981    1.000    0.981    0.991    0.981
IBM PowerPC                       133  cc              8    1.007    1.007    1.014    1.000    1.007
IBM PowerPC                       133  cc              8    1.014    1.014    1.014    1.007    1.000
IBM PowerPC                       166  cc              4    0.991    1.000    0.991    0.991    0.991
IBM PowerPC                       166  cc              8    1.014    1.020    1.020    1.014    1.000
IBM PowerPC                       166  cc             16    1.009    1.009    0.991    0.991    0.991
IBM PowerPC                       233  gcc             4    1.006    1.013    1.026    1.000    1.000
IBM PowerPC                       233  gcc             8    1.006    1.013    1.019    1.000    1.000
IBM PowerPC                       533  cc              4    1.009    1.009    1.018    1.000    1.009
IBM PowerPC                       533  cc              8    0.991    0.991    0.991    0.983    0.991
IBM PowerPC                       533  cc              8    0.991    1.000    1.000    0.991    1.000
Intel IA-64 (emulated on IA-32)   600  gcc             4    1.012    1.018    1.009    0.941    0.953
Intel IA-64 (emulated on IA-32)   600  gcc             8    1.015    1.009    0.994    0.915    0.921
Intel IA-64 (emulated on IA-32)   600  gcc             8    1.015    1.011    0.998    0.917    0.923
Intel Pentium II                  450  cc              4    5.967    3.383    3.367    3.333    3.083
Intel Pentium II                  450  cc              8    3.655    2.236    2.227    2.291    2.145
Intel Pentium II                  450  cc             12    0.984    1.000    1.000    2.129    2.000
Intel Pentium II (Klamath)        300  cc              4    5.982    3.390    3.373    3.302    3.035
Intel Pentium II (Klamath)        300  cc              8    5.824    3.249    3.213    3.301    3.036
Intel Pentium II (Klamath)        300  cc             12    1.000    0.999    2.468    2.659    2.467
Intel Pentium III                1266  gcc             4    6.014    3.408    3.394    3.317    3.056
Intel Pentium III                1266  gcc             8    5.852    3.268    3.232    3.317    3.056
Intel Pentium III                1266  gcc            12    1.010    1.010    1.000    2.588    2.402
Intel Pentium III (Katmai)        600  gcc             4    6.266    3.538    3.545    3.490    3.224
Intel Pentium III (Katmai)        600  gcc             8    6.176    3.437    3.423    3.514    3.246
Intel Pentium III (Katmai)        600  gcc            12    1.036    1.018    2.518    2.491    2.321
MIPS R10000                       180  c89             4    0.991    1.000    1.000   27.596    0.991
MIPS R10000                       180  c89             8    0.991    0.991    1.000   27.254    1.000
MIPS R10000                       180  c89            16    1.134    1.134    1.127    0.606    0.606
MIPS R10000                       195  c89             4    1.010    1.010    1.000   26.346    1.000
MIPS R10000                       195  c89             8    1.010    1.010    1.000   26.798    1.000
MIPS R10000                       195  c89            16    1.113    1.113    1.120    0.624    0.632
MIPS R4400                        150  c89             4   25.635   25.912    1.081   22.858    0.993
MIPS R4400                        150  c89             8   26.074   26.007    1.074   22.107    1.013
MIPS R4400                        150  c89            16   31.128   10.701    1.137    8.493    0.531
MIPS R4400                        175  c89             4   27.354   27.562    1.054   24.492    0.977
MIPS R4400                        175  c89             8   27.902   27.826    1.045   23.977    0.962
MIPS R4400                        175  c89            16   33.945   11.522    1.132    9.495    0.533
MIPS R5000                        180  c89             4    1.062    1.076    1.055   26.090    1.055
MIPS R5000                        180  c89             4    1.076    1.076    1.069   31.472    1.083
MIPS R5000                        180  c89             8    1.047    1.068    1.054   24.223    1.061
MIPS R5000                        180  c89             8    1.054    1.068    1.047   24.439    1.054
MIPS R5000                        180  c89            16    1.206    1.198    1.222    0.532    0.540
MIPS R5000                        180  c89            16    1.222    1.198    1.230    0.532    0.540
Sun UltraSPARC                    400  c89             4   16.675    1.031    0.995    1.015    1.015
Sun UltraSPARC                    400  c89             8   14.527    1.015    1.053    1.008    1.015
Sun UltraSPARC                    400  c89            16    1.015    1.026    1.031    0.701    0.716
Sun UltraSPARC II                 167  c89             4   16.761    1.017    1.009    1.009    1.017
Sun UltraSPARC II                 167  c89             8   12.586    1.006    0.994    1.000    1.006
Sun UltraSPARC II                 167  c89            16    1.002    1.000    1.002    0.705    0.701
Sun UltraSPARC II                 270  c89             4   18.618    0.993    0.993    1.000    0.986
Sun UltraSPARC II                 270  c89             8   14.640    0.995    1.000    0.995    1.000
Sun UltraSPARC II                 270  c89            16    1.000    1.000    1.000    0.697    0.701
Sun UltraSPARC II                 300  c89             4   16.961    1.008    1.008    1.008    1.008
Sun UltraSPARC II                 300  c89             8   13.068    1.000    1.000    1.011    1.000
Sun UltraSPARC II                 300  c89            16    1.000    1.004    0.989    0.706    0.709
Sun UltraSPARC II                 400  c89             4   16.777    1.005    1.000    1.000    1.000
Sun UltraSPARC II                 400  c89             8   12.818    1.008    1.000    1.000    1.000
Sun UltraSPARC II                 400  c89            16    1.010    1.010    1.010    0.694    0.694
Sun UltraSPARC II                 440  c89             4   16.824    0.995    0.995    1.000    1.000
Sun UltraSPARC II                 440  c89             8   13.198    1.008    1.016    1.016    1.000
Sun UltraSPARC II                 440  c89            16    1.021    1.000    1.021    0.688    0.704
Sun UltraSPARC IIe                500  c89             4   16.981    1.000    1.013    1.000    1.006
Sun UltraSPARC IIe                500  c89             8   13.179    1.000    1.000    1.009    1.000
Sun UltraSPARC IIe                500  c89            16    0.994    1.000    0.994    0.697    0.690
Sun UltraSPARC III                750  c89             4   13.417    0.942    0.897    0.942    0.910
Sun UltraSPARC III                750  c89             8   11.223    1.000    0.995    1.000    1.000
Sun UltraSPARC III                750  c89            16    0.992    1.000    0.992    0.659    0.675
TI SuperSPARC Viking               40  gcc             4    1.000    1.009    0.991    0.991    0.991
TI SuperSPARC Viking               40  gcc             8    0.996    1.000    0.984    0.988    0.984
TI SuperSPARC Viking               40  gcc             8    1.016    1.012    1.000    1.000    1.000
TI SuperSPARC Viking/MXCC          50  gcc             4    1.005    1.000    0.995    0.995    0.989
TI SuperSPARC Viking/MXCC          50  gcc             8    1.005    1.000    0.990    0.990    0.995
TI SuperSPARC Viking/MXCC          50  gcc             8    1.010    1.010    1.000    1.000    0.995
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------
CPU                               MHz  Compiler  fp_size    ufl->    ufl->    ofl->      NaN      Inf
                                                          subnorm     zero      Inf
-----------------------------------------------------------------------------------------------------
AMD Athlon                       1400  gcc             4    4.974    3.376    1.000    1.000    0.991
DEC Alpha 21064 EV4               100  gcc             4    1.001    1.007    -n/a-    -n/a-    -n/a-
DEC Alpha 21064 EV4               100  gcc             4    8.252    8.139    8.261    7.532    7.517
DEC Alpha 21164 EV5               466  c89             4    1.000    1.000    -n/a-    -n/a-    -n/a-
DEC Alpha 21164 EV5               466  c89             4   43.636   34.727   21.879   21.121   21.439
DEC Alpha 21264                   667  c89             4    1.000    0.989    -n/a-    -n/a-    -n/a-
DEC Alpha 21264                   667  c89             4   53.359   42.239    0.989    1.000    0.989
HP PA-RISC 1.1 7100LC              80  cc              4   12.058   12.178    1.000   92.251    1.000
IBM PowerPC                       133  cc              4    0.981    1.000    0.981    0.991    0.981
IBM PowerPC                       166  cc              4    0.991    1.000    0.991    0.991    0.991
IBM PowerPC                       233  gcc             4    1.006    1.013    1.026    1.000    1.000
IBM PowerPC                       533  cc              4    1.009    1.009    1.018    1.000    1.009
Intel IA-64 (emulated on IA-32)   600  gcc             4    1.012    1.018    1.009    0.941    0.953
Intel Pentium II                  450  cc              4    5.967    3.383    3.367    3.333    3.083
Intel Pentium II (Klamath)        300  cc              4    5.982    3.390    3.373    3.302    3.035
Intel Pentium III                1266  gcc             4    6.014    3.408    3.394    3.317    3.056
Intel Pentium III (Katmai)        600  gcc             4    6.266    3.538    3.545    3.490    3.224
MIPS R10000                       180  c89             4    0.991    1.000    1.000   27.596    0.991
MIPS R10000                       195  c89             4    1.010    1.010    1.000   26.346    1.000
MIPS R4400                        150  c89             4   25.635   25.912    1.081   22.858    0.993
MIPS R4400                        175  c89             4   27.354   27.562    1.054   24.492    0.977
MIPS R5000                        180  c89             4    1.062    1.076    1.055   26.090    1.055
MIPS R5000                        180  c89             4    1.076    1.076    1.069   31.472    1.083
Sun UltraSPARC                    400  c89             4   16.675    1.031    0.995    1.015    1.015
Sun UltraSPARC II                 167  c89             4   16.761    1.017    1.009    1.009    1.017
Sun UltraSPARC II                 270  c89             4   18.618    0.993    0.993    1.000    0.986
Sun UltraSPARC II                 300  c89             4   16.961    1.008    1.008    1.008    1.008
Sun UltraSPARC II                 400  c89             4   16.777    1.005    1.000    1.000    1.000
Sun UltraSPARC II                 440  c89             4   16.824    0.995    0.995    1.000    1.000
Sun UltraSPARC IIe                500  c89             4   16.981    1.000    1.013    1.000    1.006
Sun UltraSPARC III                750  c89             4   13.417    0.942    0.897    0.942    0.910
TI SuperSPARC Viking               40  gcc             4    1.000    1.009    0.991    0.991    0.991
TI SuperSPARC Viking/MXCC          50  gcc             4    1.005    1.000    0.995    0.995    0.989
AMD Athlon                       1400  gcc             8    4.802    3.198    1.009    1.000    1.009
DEC Alpha 21064 EV4               100  gcc             8    1.006    0.999    -n/a-    -n/a-    -n/a-
DEC Alpha 21064 EV4               100  gcc             8    8.835    8.697    9.572    7.886    7.815
DEC Alpha 21164 EV5               466  c89             8    1.000    1.000    -n/a-    -n/a-    -n/a-
DEC Alpha 21164 EV5               466  c89             8   66.043   49.217   27.913   27.130   27.333
DEC Alpha 21264                   667  c89             8    1.000    1.010    -n/a-    -n/a-    -n/a-
DEC Alpha 21264                   667  c89             8   78.552   57.885    1.000    1.021    1.000
HP PA-RISC 1.1 7100LC              80  cc              8   16.955   16.841    1.000   11.278    1.000
IBM PowerPC                       133  cc              8    1.007    1.007    1.014    1.000    1.007
IBM PowerPC                       133  cc              8    1.014    1.014    1.014    1.007    1.000
IBM PowerPC                       166  cc              8    1.014    1.020    1.020    1.014    1.000
IBM PowerPC                       233  gcc             8    1.006    1.013    1.019    1.000    1.000
IBM PowerPC                       533  cc              8    0.991    0.991    0.991    0.983    0.991
IBM PowerPC                       533  cc              8    0.991    1.000    1.000    0.991    1.000
Intel IA-64 (emulated on IA-32)   600  gcc             8    1.015    1.009    0.994    0.915    0.921
Intel IA-64 (emulated on IA-32)   600  gcc             8    1.015    1.011    0.998    0.917    0.923
Intel Pentium II                  450  cc              8    3.655    2.236    2.227    2.291    2.145
Intel Pentium II (Klamath)        300  cc              8    5.824    3.249    3.213    3.301    3.036
Intel Pentium III                1266  gcc             8    5.852    3.268    3.232    3.317    3.056
Intel Pentium III (Katmai)        600  gcc             8    6.176    3.437    3.423    3.514    3.246
MIPS R10000                       180  c89             8    0.991    0.991    1.000   27.254    1.000
MIPS R10000                       195  c89             8    1.010    1.010    1.000   26.798    1.000
MIPS R4400                        150  c89             8   26.074   26.007    1.074   22.107    1.013
MIPS R4400                        175  c89             8   27.902   27.826    1.045   23.977    0.962
MIPS R5000                        180  c89             8    1.047    1.068    1.054   24.223    1.061
MIPS R5000                        180  c89             8    1.054    1.068    1.047   24.439    1.054
Sun UltraSPARC                    400  c89             8   14.527    1.015    1.053    1.008    1.015
Sun UltraSPARC II                 167  c89             8   12.586    1.006    0.994    1.000    1.006
Sun UltraSPARC II                 270  c89             8   14.640    0.995    1.000    0.995    1.000
Sun UltraSPARC II                 300  c89             8   13.068    1.000    1.000    1.011    1.000
Sun UltraSPARC II                 400  c89             8   12.818    1.008    1.000    1.000    1.000
Sun UltraSPARC II                 440  c89             8   13.198    1.008    1.016    1.016    1.000
Sun UltraSPARC IIe                500  c89             8   13.179    1.000    1.000    1.009    1.000
Sun UltraSPARC III                750  c89             8   11.223    1.000    0.995    1.000    1.000
TI SuperSPARC Viking               40  gcc             8    0.996    1.000    0.984    0.988    0.984
TI SuperSPARC Viking               40  gcc             8    1.016    1.012    1.000    1.000    1.000
TI SuperSPARC Viking/MXCC          50  gcc             8    1.005    1.000    0.990    0.990    0.995
TI SuperSPARC Viking/MXCC          50  gcc             8    1.010    1.010    1.000    1.000    0.995
AMD Athlon                       1400  gcc            12    1.007    1.000    1.000    1.007    1.013
Intel Pentium II                  450  cc             12    0.984    1.000    1.000    2.129    2.000
Intel Pentium II (Klamath)        300  cc             12    1.000    0.999    2.468    2.659    2.467
Intel Pentium III                1266  gcc            12    1.010    1.010    1.000    2.588    2.402
Intel Pentium III (Katmai)        600  gcc            12    1.036    1.018    2.518    2.491    2.321
DEC Alpha 21264                   667  c89            16    0.986    1.014    -n/a-    -n/a-    -n/a-
DEC Alpha 21264                   667  c89            16    1.057    1.000    0.986    0.957    0.986
IBM PowerPC                       166  cc             16    1.009    1.009    0.991    0.991    0.991
MIPS R10000                       180  c89            16    1.134    1.134    1.127    0.606    0.606
MIPS R10000                       195  c89            16    1.113    1.113    1.120    0.624    0.632
MIPS R4400                        150  c89            16   31.128   10.701    1.137    8.493    0.531
MIPS R4400                        175  c89            16   33.945   11.522    1.132    9.495    0.533
MIPS R5000                        180  c89            16    1.206    1.198    1.222    0.532    0.540
MIPS R5000                        180  c89            16    1.222    1.198    1.230    0.532    0.540
Sun UltraSPARC                    400  c89            16    1.015    1.026    1.031    0.701    0.716
Sun UltraSPARC II                 167  c89            16    1.002    1.000    1.002    0.705    0.701
Sun UltraSPARC II                 270  c89            16    1.000    1.000    1.000    0.697    0.701
Sun UltraSPARC II                 300  c89            16    1.000    1.004    0.989    0.706    0.709
Sun UltraSPARC II                 400  c89            16    1.010    1.010    1.010    0.694    0.694
Sun UltraSPARC II                 440  c89            16    1.021    1.000    1.021    0.688    0.704
Sun UltraSPARC IIe                500  c89            16    0.994    1.000    0.994    0.697    0.690
Sun UltraSPARC III                750  c89            16    0.992    1.000    0.992    0.659    0.675
-----------------------------------------------------------------------------------------------------