1. Department of Mathematics and Computer Science (IMADA), Faculty of Science, SDU
2. Computer Science, Department of Mathematics and Computer Science (IMADA), Faculty of Science, SDU
3. ENS-Lyon
4. Department of Mathematics and Computer Science (IMADA), Faculty of Science, SDU

Abstract:

This paper presents a study of some basic blocks needed in the design of floating-point summation algorithms. In particular, in radix-2 floating-point arithmetic, we show that among the set of comparison-free algorithms performing only floating-point additions/subtractions, the 2Sum algorithm introduced by Knuth is minimal, both in terms of number of operations and depth of the dependency graph. We investigate the possible use of another algorithm, Dekker's Fast2Sum algorithm, in radix-10 arithmetic. We give methods for computing, in radix 10, the floating-point number nearest the average value of two floating-point numbers. We also prove that, under reasonable conditions, an algorithm performing only round-to-nearest additions/subtractions cannot compute the round-to-nearest sum of three or more floating-point numbers. Starting from an algorithm due to Boldo and Melquiond, we also present new results about the computation of the correctly rounded sum of three floating-point numbers. For a few of our algorithms, we assume that the new operations defined by the recent IEEE 754-2008 Standard are available.
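For readers unfamiliar with the two building blocks named in the abstract, here is a minimal sketch of the classical 2Sum and Fast2Sum algorithms as they are usually stated in the literature. It is an illustration, not code taken from the paper. It assumes that Python's `float` is IEEE 754 binary64 (radix 2) with round-to-nearest and that no overflow occurs; the function names `two_sum` and `fast_two_sum` are chosen here for illustration only.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's 2Sum: six additions/subtractions, no comparison.

    Returns (s, t) where s is the rounded sum of a and b and t is the
    rounding error, so that s + t equals a + b exactly (radix 2,
    round-to-nearest, no overflow assumed).
    """
    s = a + b
    a_prime = s - b
    b_prime = s - a_prime
    delta_a = a - a_prime
    delta_b = b - b_prime
    t = delta_a + delta_b
    return s, t


def fast_two_sum(a: float, b: float) -> tuple[float, float]:
    """Dekker's Fast2Sum: three additions/subtractions.

    Assumes |a| >= |b| (the caller must ensure this ordering) and
    radix-2 arithmetic; under those assumptions s + t equals a + b exactly.
    """
    s = a + b
    z = s - a
    t = b - z
    return s, t


def exact_sum(x: float, y: float) -> Fraction:
    """Exact rational sum of two floats, used only to check the results."""
    return Fraction(x) + Fraction(y)


if __name__ == "__main__":
    a, b = 0.1, 0.2
    s, t = two_sum(a, b)
    # The pair (s, t) represents a + b without any rounding error.
    print(exact_sum(s, t) == exact_sum(a, b))

    # Fast2Sum needs the larger-magnitude operand first.
    s2, t2 = fast_two_sum(1.0, 1e-16)
    print(exact_sum(s2, t2) == exact_sum(1.0, 1e-16))
```

The exactness check uses `fractions.Fraction`, which converts each binary float to its exact rational value, so the comparison itself involves no rounding. Note that Fast2Sum trades the three extra operations of 2Sum for a precondition on operand magnitudes, which in general requires a comparison.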

Type:

Journal article

Language:

English

Published in:

IEEE Transactions on Computers, 2012, Vol 61, Issue 3, p. 289-298