diff options
Diffstat (limited to 'doc/realinferno/real.ms')
| -rw-r--r-- | doc/realinferno/real.ms | 611 |
1 files changed, 611 insertions, 0 deletions
diff --git a/doc/realinferno/real.ms b/doc/realinferno/real.ms new file mode 100644 index 00000000..e2174dba --- /dev/null +++ b/doc/realinferno/real.ms @@ -0,0 +1,611 @@ +.FP palatino +.de ]C +\&[\\$1]\\$2 +.. +.if \nZ=0 .so real.ref +.EQ +delim $$ +.EN +.TL +Real Inferno +.AU +.I "Eric Grosse" +.AI +.I "Lucent Technologies, Bell Labs" +.I "Murray Hill NJ 07974 USA" +.I "ehg@bell-labs.com" +.\"date{19 Aug 1996, minor revisions 7 Jan 1998} +.FS +Previously appeared in R.F. Boisvert (editor), +.ft I +The Quality of Numerical Software: Assessment and +Enhancement: Proceedings of the IFIP TC2/WG2.5 +Working Conference on the Quality of Numerical +Software, Oxford, +United Kingdom, 8-12 July 1996, +.ft R +Chapman Hall, +London, +1997 (pp. 270-279). +.FE +.AB +Inferno is an operating system well suited to applications that need to be +portable, graphical, and networked. This paper describes the fundamental +floating point facilities of the system, including: tight rules on +expression evaluation, binary/decimal conversion, exceptions and rounding, +and the elementary function library. +.AE +.PP +Although the focus of Inferno is interactive media, its portability across +hardware and operating platforms, its relative simplicity, and its strength +in distributed computing make it attractive for advanced scientific +computing as well. Since the appearance of a new operating system is a +relatively uncommon event, this is a special opportunity for numerical +analysts to voice their opinion about what fundamental facilities they need. +The purpose of this short paper is to describe numerical aspects of the +initial release of Inferno, and to invite comment before the tyranny of +backward compatibility makes changes impossible. +.PP +An overview of Inferno is given by Dorward et. al. +.]C "Inferno" , +but for our immediate purposes it may suffice to say that Inferno plays the +role of a traditional operating system (with compilers, process control, +networking, graphics, and so on) but can run either on bare hardware or on +top of another operating system like Windows95 or Unix. Programs for +.I "Inferno" +are written in the language +.I "Limbo" +and compiled to +machine-independent object files for the +.I "Dis" +virtual +machine, which is then implemented with runtime compilation for best +performance. Files are accessible over networks using the +.I "Styx" +protocol; together with the presentation of most system resources as files +and the manipulation of file namespaces, this permits integration of a +collection of machines into a team. Limbo looks somewhat like a mixture of C +and Pascal, augmented by modules (to cope with the namespace and dynamic +loading needs of large programs) and by a channel facility for convenient +(coarse-grain) parallel programing. Array references are bounds-checked and +memory is garbage collected. +.PP +The rest of this paper covers the fundamental floating point environment +provided by the Limbo compiler and +.I "math" +module, the ``elementary +functions,'' and finally some comments on why particular definitions were +chosen or why certain facilities were included or excluded. This discussion +assumes the reader is familiar with scientific computing in general and the +IEEE floating point standard in particular. +.NH 1 +Floating point +.PP +In Limbo, arithmetic on literal and named constants is evaluated at compile +time with all exceptions ignored. Arithmetic on variables is left by the +compiler to runtime, even if data path analysis shows the value to be a +compile time constant. This implies that tools generating Limbo source must +do their own simplification, and not expect the compiler to change $x/x$ +into $1$, or $-(y-x)$ into $x-y$, or even $x-0$ into $x$. Negation $-x$ +changes the sign of $x$; note that this not the same as $0-x$ if $x=0$. +.PP +The compiler may perform subexpression elimination and other forms of code +motion, but not across calls to the mode and status functions. It respects +parentheses. The evaluation order of $a+b+c$ follows the parse tree and is +therefore the same as for $(a+b)+c$. These are the same rules as for Fortran +and C. +.PP +Contracted multiply-add instructions (with a single rounding) are not +generated by the compiler, though they may be used in the native +.SM BLAS +libraries. All arithmetic follows the IEEE floating point standard +.]C "IEEEfp" , +except that denormalized numbers may not be supported; see the +discussion in section 3. +.PP +The most important numerical development at the language level recently has +been accurate binary/decimal conversion +.]C "Clinger" +.]C "Gay" +.]C "SteeleWhite" . +Thus printing a real using +.CW "\%g" +and reading +it on a different machine guarantees recovering identical bits. (Limbo uses +the familiar +.I "printf" +syntax of C, but checks argument types against +the format string at compile time, in keeping with its attempt to help the +programmer by stringent type checking.) A good +.I "strtod/dtoa" +is, +unfortunately, 1700 lines of source (15kbytes compiled), though with modest +average runtime penalty. This code must be used in the compiler so that +coefficients are accurately transferred to bytecodes. Smaller, faster, but +sloppier, runtimes will also be provided when mandated by limited memory and +specialized use. However, programmers may assume the features described in +this paper are present in all Inferno systems intended for general computing. +.PP +Each thread has a floating point control word (governing rounding mode and +whether a particular floating point exception causes a trap) and a floating +point status word (storing accumulated exception flags). Functions +.I "FPcontrol" +and +.I "FPstatus" +copy bits to the control or status word, in +positions specified by a mask, returning previous values of the bits. +.I "getFPcontrol" +and +.I "getFPstatus" +return the words unchanged. +.PP +The constants +.I "INVAL, ZDIV, OVFL, UNFL, INEX" +are non-overlapping +single-bit masks used to compose arguments or return values. They stand for +the five IEEE exceptions: +.IP • +``invalid operation'' ($0/0$,$0 * infinity $,$ infinity - infinity $,$sqrt{-1}$) +.IP • +``division by zero'' ($1/0$), +.IP • +``overflow'' ($1.8e308$) +.IP • +``underflow'' ($1.1e-308$) +.IP • +``inexact'' ($.3*.3$). +.PP +The constants +.I "RND_NR, RND_NINF, RND_PINF, RND_Z" +are distinct +bit patterns for ``round to nearest even'', ``round toward $-{infinity} $'', +``round toward $+{infinity} $'', ``round toward $0$'', any of which can be set +or extracted from the floating point control word using +.I "RND_MASK" . +For example, +.IP • +to arrange for the program to tolerate underflow, +.I "FPcontrol(0,UNFL)." +.IP • +to check and clear the inexact flag, +.I "FPstatus(0,INEX)." +.IP • +to set directed rounding, +.I "FPcontrol(RND_PINF,RND_MASK)." +.PP +By default, +.I "INEX" +is quiet and +.I "OVFL, UNFL, ZDIV," +and +.I "INVAL" +are fatal. By default, rounding is to nearest even, and library +functions are entitled to assume this. Functions that wish to use quiet +overflow, underflow, or zero-divide should either set and restore the +control register themselves or clearly document that the caller must do so. +The ``default'' mentioned here is what a Limbo program gets if started in a +fresh environment. Threads inherit floating point control and status from +their parent at the time of spawning and therefore one can spawn a ``round +toward 0'' shell and re-run a program to effortlessly look for rounding +instabilities in a program. +.NH 1 +Elementary functions +.PP +The constants +.I "Infinity, NaN, MachEps, Pi, Degree" +are defined. Since +Inferno has thorough support of Unicode, it was tempting to name these $infinity $, $ε $, $π $, and °, but people (or rather, their +favorite text editing tools) may not be ready yet for non-\s-2ASCII\s0 +source text. +.I "Infinity" +and +.I "NaN" +are the positive infinity +and quiet not-a-number of the IEEE standard, double precision. +.I MachEps +is $2 sup {-52}$, the unit in the last place of the mantissa $1.0$. +The value of +.I "Pi" +is the nearest machine number to the +mathematical value $π $. +.I "Degree" +is +$"Pi" / 180$. +.PP +Three useful functions +.I "fdim, fmax, fmin" +are adopted from the +Numerical C extensions +.]C "NumerC" . +The unusual one of these, often +denoted $(x-y) sub {+}$, is defined by $roman "fdim" ( x , y )=x-y$ if $x > y$, else $0$. The compiler may turn these into efficient machine instruction sequences, +possibly even branch-free, rather than function calls. There are two almost +redundant mod functions: +.I "remainder(x,y)" +is as defined by the IEEE +standard (with result $roman "|" r roman "|" <= y/2$); +.I "fmod(x,y)" +is $x roman "mod" y$, +computed in exact arithmetic with $0<= r<y$. Limbo has a ``tuple'' type, +which is the natural return value in the call $(i,f)= roman "modf" ( x )$ to +break $x$ into integer and fractional parts. The function +.I "rint" +rounds to an integer, following the rounding mode specified in the floating +point control word. +.PP +For a good-quality, freely-available elementary function library, +.I "math" +uses the IEEE subset of +.I "fdlibm" +.]C "fdlibm" . +Of course, a +conforming implementation may use entirely different source, but must take +care with accuracy and with special arguments. There are the customary +power, trigonometric, Bessel, and erf functions, and specialized versions $roman "expm1"( x )=e sup x - 1$, $roman "log1p" ( x )=log ( 1 + x )$. An additional function +$roman "pow10"( n ) = 10 sup n$ is defined; in the default implementation this is +just fdlibm's $roman "pow" ( 10. , n )$ but it is provided so that separate +trade-offs of accuracy and simplicity can be made +.]C "MacIlroy" . +.I "fdlibm" +uses extra precise argument reduction, so the computed $sin (n*Pi)$ +is small but nonzero. If demands warrant, degree versions of the +trigonometric functions will be added, but for now the style $sin (45*Degree)$ is used. +The library also provides IEEE functions +.I "ilogb, scalbn, copysign, finite, isnan," +and +.I "nextafter" . +.PP +The functions +.I "dot, norm1, norm2, iamax, gemm" +are adopted from the +.SM BLAS +.]C "blas" +to get tuned linear algebra kernels for +each architecture, possibly using extra-precise accumulators. These are +defined by $sum {{x sub i}{y sub i}}$, $sum roman | {x sub i} roman | $, $ sqrt{sum { x sub {i sup 2}}} $, $i$ such +that $roman | {x sub i} roman | = roman max $, and $C= alpha AB + beta C$ with optional transposes on $A$ +and $B$. Since Limbo has only one floating-point type, there is no need here +for a precision prefix. Limbo array slices permit the calling sequences to +be more readable than in Fortran77 or C, though restricted to unit stride. +This encourages better cache performance anyway. The matrix multiply +function +.I "gemm" +remains general stride (and is the foundation for +other operations +.]C "Kagstrom" ). +.PP +Limbo is like C in providing singly-subscripted arrays with indexing +starting at 0. Although Limbo offers arrays of arrays, as in C, for +scientific work a better choice is to adopt the style of linearizing +subscripts using Fortran storage order. This promotes easier exchange of +data with other applications and reuses effort in organizing loops to +achieve good locality. In previous language work +.]C "pine" , +we implemented +a C preprocessor that allowed the programmer to choose a convenient origin +(such as 1) and have it compiled into 0 for the base language; because we +passed arrays as dope vectors, we were even able to allow different origins +for the same array in calling and called functions. The main lesson we +learned from that experience, however, was that permutations become a +nightmare when there is anything but dogmatic adherence to a single origin. +So for an $m$ by $n$ matrix $A$, the programmer should use loops with $0<= +i<m$ and $0<= j<n$ and access $A[i+m*j]$. +.PP +For interoperability with foreign file formats and for saving main memory in +selected applications, functions are provided for copying bits between and +reals and 32-bit or 64-bit IEEE-format values. +.PP +Finally, +.I "math" +provides a tuned quicksort function +.I "sort(x,p)" +where +.I "x" +is a real array and +.I "p" +is an int array representing +a 0-origin permutation. This function leaves the contents of +.I "x" +untouched and rearranges +.I "p" +so that $x[{p sub i}]<= x[p sub {i+1}]$. This is +usually what one wants to do: sort an array of abstract data types based on +some key, but without the need to actually swap large chunks of memory. +.NH 1 +Rationale +.PP +This section discusses why certain numerical features were included or not. +.NH 2 +Rounding modes and accumulated exceptions +.PP +Directed rounding is only needed in a very few places in scientific +computing, but in those places it is indispensable. Accumulated floating +point exceptions are even more useful. User trap handling is a harder +problem, and may be worth leaving for later, possibly with a default +``retrospective diagnostics'' log +.]C "Kahan" . +.PP +Note that the exception masks must be architecture independent, since they +reside in the Limbo bytecodes, and therefore the implementation involves a +small amount of bit fiddling. Still, it is efficient enough to encourage +use. It would be difficult to port to a processor that only had static +rounding modes in instruction opcodes rather than the dynamic model +specified in section 2 of the IEEE standard. Fortunately, the Alpha +does provide both models. +.NH 2 +Sudden underflow +.PP +Some processor vendors make supporting gradual underflow just too hard. (One +must struggle upon the system trap to reconstruct exactly which instruction +was executing and what the state of the registers was. On the MIPS, it is +said to be 30 pages of assembler.) So Inferno supports denormalized numbers +only if the hardware makes this easy. Providing underflow that is correct +but very slow, as some systems do, is not necessarily doing the user a favor. +.PP +To determine portably if a particular system offers gradual underflow, mask +off UNFL and do trial arithmetic. +.NH 2 +Speed +.PP +Computers with slow (software) gradual underflow usually provide a fast +flush-to-0 alternative. This often suffices, though there are important +examples where it forces an uglier and slower coding style. A worse +situation is if the hardware uses system traps for Infinity and NaN +arithmetic. The resulting slowdown will make otherwise excellent and natural +algorithms run slowly +.]C "Demmel" . +Sadly, even some x86 implementations +that do non-finite arithmetic in hardware, do it relatively slowly. +.PP +We considered providing syntax to declare a certain program scope within +which precise IEEE behavior was required, and relaxing the rules outside +such scopes. +(The numerical C extensions +.]C "NumerC" +use pragma +for this purpose.) +These scope declarations would need to be in the +bytecodes, since significant optimization may be attempted by the runtime +compiler. After some discussion, and with some trepidation, it was agreed +that instead all compilers would be required to preserve the same result and +status as for an unoptimized version. +.NH 2 +Comparison +.PP +The standard C operators +.CW == +.CW != +.CW "<" +.CW "<=" +.CW ">" +.CW ">=" +are the only comparisons provided, and they behave exactly +like the ``math'' part of Table 4 of the IEEE standard. Programs interested +in handling NaN data should test explicitly. This seems to be the way most +people program and leads to code more understandable to nonexperts. It is +true that with more operators one can correctly write code that propagates +NaNs to a successful conclusion\-but that support has been left for later. +NaN(''tag'') should be added at that same time. +.NH 2 +Precision +.PP +All implementations run exclusively in IEEE double precision. If the +hardware has extra-precise accumulators, the round-to-double mode is set +automatically and not changeable, in keeping with Limbo's design to have +only one floating point type. Extended precision hardware, if available, may +be used by the built-in elementary function and +.SM BLAS +libraries. +Also, we contemplate adding a dotsharp function that would use a very long +accumulator for very precise inner products, independent of the order of +vector elements +.]C "kulisch" . +But reference implementations that use only +double precision, avoid contracted multiply-add, and evaluate in the order 1 +up to n will always be available for strict portability. +.PP +At the time the decision was made to restrict the system to 64-bit floating +point, Limbo integers were almost exclusively 32-bit and the consistency +argument to have a single real type was compelling. Now that Limbo has more +integer types the decision might be reconsidered. But so many engineers +needlessly struggle with programs run in short precision, that offering it +may do as much harm as good. On most modern computers used for general +purpose scientific computing, 64-bit floating point arithmetic is as fast as +32-bit, except for the memory traffic. In cases where the shorter precision +would suffice and memory is a crucial concern, the programmer should +consider carefully scaled fixed point or specialized compression. To +efficiently interoperate with data files that use the short format, +programmers may use the provided realbits32 function. While there are surely +appropriate uses for a first-class 32-bit real type, for now we follow +Kahan's sarcastic motto ``why use lead when gold will do?'' +.NH 2 +BLAS +.PP +The few +.SM BLAS +in the core library were chosen for readability and, +in case of gemm, for optimization beyond what a reasonable compiler would +attempt. We expect that compilers will (soon) be good enough that the +difference between compiling $y+=a*x$ and calling daxpy is small. Also, as +mentioned above, dot and gemm might reasonably use combined multiply-add or +a long accumulator in some optional implementations. +.NH 2 +$GAMMA ( x )$ +.PP +To avoid confusion with the C math library, which defined +.I "gamma" +as $ln GAMMA $, we offer only +.I "lgamma" +for now. This function and +.I "modf" +return an (int,real) tuple rather than assigning through an +integer pointer, in keeping with Limbo's design. The opportunity has been +taken to drop some obsolete functions like +.I "frexp" . +Other functions +are unchanged from the C math library. +.NH 2 +Future +.PP +A prototype preprocessor has been written to allow the scientific programmer +to write $A[i,j]$ for an $A$ that was created as a $Matrix(m,n)$ and to have +the subscript linearization done automatically. Here $Matrix$ is an Limbo +abstract data type containing a real array and integers $m$, $n$, and column +stride $lda$ used as in typical Fortran calling sequences. +.PP +The Limbo compiler is soon expected to implement the type +.I "complex" . +.PP +Higher level numerical libraries will also be provided, and although that +topic is beyond the scope of this paper, opinions about what should come +first would be welcome. +.PP +Distributed computing has not been mentioned here because it involves +relatively few considerations specific to floating point computation. +However, it may be worth noting that in the default environment (with +underflow trapped, so that presence or absence of denormalized numbers is +not significant) programs run independently on heterogeneous machines +nevertheless get precisely identical results, even with respect to thread +scheduling. This implies that certain communication steps can be avoided, +and that regression testing is considerably simplified. +.PP +Please direct comments on these numerical aspects of Inferno to Eric Grosse. +More general technical comments can be directed to Vita Nuova +.CW comments@vitanuova.com ). ( +I am grateful to Joe Darcy, Berkeley, +to David Gay, Bell Labs, to David Hook, University of Melbourne, +and to participants of the IFIP WG2.5 Working +Conference on Quality of Numerical Software for insightful comments on a +first draft of this paper. +.\"the principal developers of Inferno: Sean Dorward, Rob Pike, Dave Presotto, Howard Trickey, and Phil Winterbottom. +.SH +Trademarks +.LP +Inferno, Limbo, and Dis are trademarks of Vita Nuova Holdings Limited. +Unix is a trademark of Unix Systems Laboratories. +Windows95 is a trademark of Microsoft. +.EQ +delim off +.EN +.SH +References +.nr PS -1 +.nr VS -1 +.LP +.nr [N 0 1 +.de ]N +.IP \\n+([N. +.if \nZ>0 .tm \\$1 \\n([N +.. +.... +.... +.]N "Inferno" +S. Dorward, +R. Pike, +D.\ L. Presotto, +D.\ M. Ritchie, +H. Trickey, +P. Winterbottom, +``The Inferno Operating System'', +.I "Bell Labs Technical Journal" , +Vol. 2, +No. 1, +Winter 1997, pp. 5-18. +Reprinted in this volume. +.]N "Clinger" +W.\ D. Clinger. +``How to read floating point numbers accurately. +In \fIProceedings of the ACM SIGPLAN'90 Conference on Programming +Language Design and Implementation\fP, pages 92-101, 1990. +.... +.... +.]N "Demmel" +James\ W. Demmel and Xiaoye Li. +Faster numerical algorithms via exception handling. +In Jr. Earl\ Swartzlander, Mary\ Jane Irwin, and Graham Jullien, +editors, \fIProceedings: 11th Symposium on Computer Arithmetic\fP. IEEE +Computer Society Press, 1993. +.... +.... +.]N "blas" +Jack\ J. Dongarra, Jeremy\ Du Croz, Sven Hammarling, and Richard\ J. Hanson. +Algorithm 656: An extended set of Basic Linear Algebra Subprograms. +\fIACM Trans. on Mathematical Software\fP, 14(1):18-32, March 1988. +.... +.... +.]N "Gay" +D.\ M. Gay. +Correctly rounded binary-decimal and decimal-binary conversions. +Numerical Analysis Manuscript No. 90-10, AT&T Bell Laboratories, +Murray Hill, NJ, 1990. +freely redistributable, available at +.CW http://netlib.bell-labs.com/netlib/fp/ . +.... +.... +.]N "pine" +E.\ H. Grosse and W.\ M. Coughran, Jr. +The pine programming language. +Numerical Analysis Manuscript 83-4, AT&T Bell Laboratories, 1983. +.br +.CW ftp://cm.bell-labs.com/cm/cs/doc/92/pine.ps.Z . +.... +.... +.]N "IEEEfp" +IEEE. +Standard for binary floating-point arithmetic. +Technical Report Std 754-1985, ANSI, 1985. +.... +.... +.]N "Kagstrom" +Bo\ Kagstrom, Per Ling, and Charles Van\ Loan. +Portable high performance GEMM-based Level 3 BLAS. +In R.\ F.\ Sincovec et\ al., editor, \fIParallel Processing for +Scientific Computing\fP, pages 339-346. SIAM Publications, 1993. +.CW /netlib/blas/ . +.... +.... +.]N "Kahan" +W.\ Kahan. +Lecture notes on the status of IEEE Standard 754 for binary +floating-point arithmetic. +Technical report, Univ. Calif. Berkeley, May 23 1995. +Work in Progress. +.... +.... +.]N "kulisch" +U.\ Kulisch and W.L. Miranker. +\fIComputer arithmetic in theory and practice.\fP +Academic Press, 1980. +.... +.... +.]N "MacIlroy" +M.\ D. McIlroy. +Mass produced software components. +In Peter Naur and Brian Randell, editors, \fISoftware Engineering\fP, +pages 138-155, 1969. +Garmisch, Germany, October 1968. +.... +.... +.]N "fdlibm" +Kwok\ C. Ng. +.CW fdlibm : +C math library for machines that support ieee 754 +floating-point. +freely redistributable; available at +.CW http://netlib.bell-labs.com/netlib/fdlibm/ , +March 1995. +.... +.... +.]N "SteeleWhite" +G.\ L. Steele and J.\ L. White. +How to print floating point numbers accurately. +In \fIProceedings of the ACM SIGPLAN'90 Conference on Programming +Language Design and Implementation\fP, pages 112-126, 1990. +.... +.... +.]N "NumerC" +X3J11.1. +Chapter 5, floating-point C extensions. +Technical report, ANSI, March 29 1995. +.nr PS +1 +.nr VS +1 |
