From 46439007cf417cbd9ac8049bb4122c890097a0fa Mon Sep 17 00:00:00 2001 From: "Charles.Forsyth" Date: Fri, 22 Dec 2006 20:52:35 +0000 Subject: 20060303-partial --- doc/realinferno/real.ms | 611 +++++++++++++++++++++++++++++++++++++++++++++++ doc/realinferno/real.pdf | Bin 0 -> 86736 bytes 2 files changed, 611 insertions(+) create mode 100644 doc/realinferno/real.ms create mode 100644 doc/realinferno/real.pdf (limited to 'doc/realinferno') diff --git a/doc/realinferno/real.ms b/doc/realinferno/real.ms new file mode 100644 index 00000000..e2174dba --- /dev/null +++ b/doc/realinferno/real.ms @@ -0,0 +1,611 @@ +.FP palatino +.de ]C +\&[\\$1]\\$2 +.. +.if \nZ=0 .so real.ref +.EQ +delim $$ +.EN +.TL +Real Inferno +.AU +.I "Eric Grosse" +.AI +.I "Lucent Technologies, Bell Labs" +.I "Murray Hill NJ 07974 USA" +.I "ehg@bell-labs.com" +.\"date{19 Aug 1996, minor revisions 7 Jan 1998} +.FS +Previously appeared in R.F. Boisvert (editor), +.ft I +The Quality of Numerical Software: Assessment and +Enhancement: Proceedings of the IFIP TC2/WG2.5 +Working Conference on the Quality of Numerical +Software, Oxford, +United Kingdom, 8-12 July 1996, +.ft R +Chapman Hall, +London, +1997 (pp. 270-279). +.FE +.AB +Inferno is an operating system well suited to applications that need to be +portable, graphical, and networked. This paper describes the fundamental +floating point facilities of the system, including: tight rules on +expression evaluation, binary/decimal conversion, exceptions and rounding, +and the elementary function library. +.AE +.PP +Although the focus of Inferno is interactive media, its portability across +hardware and operating platforms, its relative simplicity, and its strength +in distributed computing make it attractive for advanced scientific +computing as well. Since the appearance of a new operating system is a +relatively uncommon event, this is a special opportunity for numerical +analysts to voice their opinion about what fundamental facilities they need. +The purpose of this short paper is to describe numerical aspects of the +initial release of Inferno, and to invite comment before the tyranny of +backward compatibility makes changes impossible. +.PP +An overview of Inferno is given by Dorward et. al. +.]C "Inferno" , +but for our immediate purposes it may suffice to say that Inferno plays the +role of a traditional operating system (with compilers, process control, +networking, graphics, and so on) but can run either on bare hardware or on +top of another operating system like Windows95 or Unix. Programs for +.I "Inferno" +are written in the language +.I "Limbo" +and compiled to +machine-independent object files for the +.I "Dis" +virtual +machine, which is then implemented with runtime compilation for best +performance. Files are accessible over networks using the +.I "Styx" +protocol; together with the presentation of most system resources as files +and the manipulation of file namespaces, this permits integration of a +collection of machines into a team. Limbo looks somewhat like a mixture of C +and Pascal, augmented by modules (to cope with the namespace and dynamic +loading needs of large programs) and by a channel facility for convenient +(coarse-grain) parallel programing. Array references are bounds-checked and +memory is garbage collected. +.PP +The rest of this paper covers the fundamental floating point environment +provided by the Limbo compiler and +.I "math" +module, the ``elementary +functions,'' and finally some comments on why particular definitions were +chosen or why certain facilities were included or excluded. This discussion +assumes the reader is familiar with scientific computing in general and the +IEEE floating point standard in particular. +.NH 1 +Floating point +.PP +In Limbo, arithmetic on literal and named constants is evaluated at compile +time with all exceptions ignored. Arithmetic on variables is left by the +compiler to runtime, even if data path analysis shows the value to be a +compile time constant. This implies that tools generating Limbo source must +do their own simplification, and not expect the compiler to change $x/x$ +into $1$, or $-(y-x)$ into $x-y$, or even $x-0$ into $x$. Negation $-x$ +changes the sign of $x$; note that this not the same as $0-x$ if $x=0$. +.PP +The compiler may perform subexpression elimination and other forms of code +motion, but not across calls to the mode and status functions. It respects +parentheses. The evaluation order of $a+b+c$ follows the parse tree and is +therefore the same as for $(a+b)+c$. These are the same rules as for Fortran +and C. +.PP +Contracted multiply-add instructions (with a single rounding) are not +generated by the compiler, though they may be used in the native +.SM BLAS +libraries. All arithmetic follows the IEEE floating point standard +.]C "IEEEfp" , +except that denormalized numbers may not be supported; see the +discussion in section 3. +.PP +The most important numerical development at the language level recently has +been accurate binary/decimal conversion +.]C "Clinger" +.]C "Gay" +.]C "SteeleWhite" . +Thus printing a real using +.CW "\%g" +and reading +it on a different machine guarantees recovering identical bits. (Limbo uses +the familiar +.I "printf" +syntax of C, but checks argument types against +the format string at compile time, in keeping with its attempt to help the +programmer by stringent type checking.) A good +.I "strtod/dtoa" +is, +unfortunately, 1700 lines of source (15kbytes compiled), though with modest +average runtime penalty. This code must be used in the compiler so that +coefficients are accurately transferred to bytecodes. Smaller, faster, but +sloppier, runtimes will also be provided when mandated by limited memory and +specialized use. However, programmers may assume the features described in +this paper are present in all Inferno systems intended for general computing. +.PP +Each thread has a floating point control word (governing rounding mode and +whether a particular floating point exception causes a trap) and a floating +point status word (storing accumulated exception flags). Functions +.I "FPcontrol" +and +.I "FPstatus" +copy bits to the control or status word, in +positions specified by a mask, returning previous values of the bits. +.I "getFPcontrol" +and +.I "getFPstatus" +return the words unchanged. +.PP +The constants +.I "INVAL, ZDIV, OVFL, UNFL, INEX" +are non-overlapping +single-bit masks used to compose arguments or return values. They stand for +the five IEEE exceptions: +.IP • +``invalid operation'' ($0/0$,$0 * infinity $,$ infinity - infinity $,$sqrt{-1}$) +.IP • +``division by zero'' ($1/0$), +.IP • +``overflow'' ($1.8e308$) +.IP • +``underflow'' ($1.1e-308$) +.IP • +``inexact'' ($.3*.3$). +.PP +The constants +.I "RND_NR, RND_NINF, RND_PINF, RND_Z" +are distinct +bit patterns for ``round to nearest even'', ``round toward $-{infinity} $'', +``round toward $+{infinity} $'', ``round toward $0$'', any of which can be set +or extracted from the floating point control word using +.I "RND_MASK" . +For example, +.IP • +to arrange for the program to tolerate underflow, +.I "FPcontrol(0,UNFL)." +.IP • +to check and clear the inexact flag, +.I "FPstatus(0,INEX)." +.IP • +to set directed rounding, +.I "FPcontrol(RND_PINF,RND_MASK)." +.PP +By default, +.I "INEX" +is quiet and +.I "OVFL, UNFL, ZDIV," +and +.I "INVAL" +are fatal. By default, rounding is to nearest even, and library +functions are entitled to assume this. Functions that wish to use quiet +overflow, underflow, or zero-divide should either set and restore the +control register themselves or clearly document that the caller must do so. +The ``default'' mentioned here is what a Limbo program gets if started in a +fresh environment. Threads inherit floating point control and status from +their parent at the time of spawning and therefore one can spawn a ``round +toward 0'' shell and re-run a program to effortlessly look for rounding +instabilities in a program. +.NH 1 +Elementary functions +.PP +The constants +.I "Infinity, NaN, MachEps, Pi, Degree" +are defined. Since +Inferno has thorough support of Unicode, it was tempting to name these $infinity $, $ε $, $π $, and °, but people (or rather, their +favorite text editing tools) may not be ready yet for non-\s-2ASCII\s0 +source text. +.I "Infinity" +and +.I "NaN" +are the positive infinity +and quiet not-a-number of the IEEE standard, double precision. +.I MachEps +is $2 sup {-52}$, the unit in the last place of the mantissa $1.0$. +The value of +.I "Pi" +is the nearest machine number to the +mathematical value $π $. +.I "Degree" +is +$"Pi" / 180$. +.PP +Three useful functions +.I "fdim, fmax, fmin" +are adopted from the +Numerical C extensions +.]C "NumerC" . +The unusual one of these, often +denoted $(x-y) sub {+}$, is defined by $roman "fdim" ( x , y )=x-y$ if $x > y$, else $0$. The compiler may turn these into efficient machine instruction sequences, +possibly even branch-free, rather than function calls. There are two almost +redundant mod functions: +.I "remainder(x,y)" +is as defined by the IEEE +standard (with result $roman "|" r roman "|" <= y/2$); +.I "fmod(x,y)" +is $x roman "mod" y$, +computed in exact arithmetic with $0<= r" +.CW ">=" +are the only comparisons provided, and they behave exactly +like the ``math'' part of Table 4 of the IEEE standard. Programs interested +in handling NaN data should test explicitly. This seems to be the way most +people program and leads to code more understandable to nonexperts. It is +true that with more operators one can correctly write code that propagates +NaNs to a successful conclusion\-but that support has been left for later. +NaN(''tag'') should be added at that same time. +.NH 2 +Precision +.PP +All implementations run exclusively in IEEE double precision. If the +hardware has extra-precise accumulators, the round-to-double mode is set +automatically and not changeable, in keeping with Limbo's design to have +only one floating point type. Extended precision hardware, if available, may +be used by the built-in elementary function and +.SM BLAS +libraries. +Also, we contemplate adding a dotsharp function that would use a very long +accumulator for very precise inner products, independent of the order of +vector elements +.]C "kulisch" . +But reference implementations that use only +double precision, avoid contracted multiply-add, and evaluate in the order 1 +up to n will always be available for strict portability. +.PP +At the time the decision was made to restrict the system to 64-bit floating +point, Limbo integers were almost exclusively 32-bit and the consistency +argument to have a single real type was compelling. Now that Limbo has more +integer types the decision might be reconsidered. But so many engineers +needlessly struggle with programs run in short precision, that offering it +may do as much harm as good. On most modern computers used for general +purpose scientific computing, 64-bit floating point arithmetic is as fast as +32-bit, except for the memory traffic. In cases where the shorter precision +would suffice and memory is a crucial concern, the programmer should +consider carefully scaled fixed point or specialized compression. To +efficiently interoperate with data files that use the short format, +programmers may use the provided realbits32 function. While there are surely +appropriate uses for a first-class 32-bit real type, for now we follow +Kahan's sarcastic motto ``why use lead when gold will do?'' +.NH 2 +BLAS +.PP +The few +.SM BLAS +in the core library were chosen for readability and, +in case of gemm, for optimization beyond what a reasonable compiler would +attempt. We expect that compilers will (soon) be good enough that the +difference between compiling $y+=a*x$ and calling daxpy is small. Also, as +mentioned above, dot and gemm might reasonably use combined multiply-add or +a long accumulator in some optional implementations. +.NH 2 +$GAMMA ( x )$ +.PP +To avoid confusion with the C math library, which defined +.I "gamma" +as $ln GAMMA $, we offer only +.I "lgamma" +for now. This function and +.I "modf" +return an (int,real) tuple rather than assigning through an +integer pointer, in keeping with Limbo's design. The opportunity has been +taken to drop some obsolete functions like +.I "frexp" . +Other functions +are unchanged from the C math library. +.NH 2 +Future +.PP +A prototype preprocessor has been written to allow the scientific programmer +to write $A[i,j]$ for an $A$ that was created as a $Matrix(m,n)$ and to have +the subscript linearization done automatically. Here $Matrix$ is an Limbo +abstract data type containing a real array and integers $m$, $n$, and column +stride $lda$ used as in typical Fortran calling sequences. +.PP +The Limbo compiler is soon expected to implement the type +.I "complex" . +.PP +Higher level numerical libraries will also be provided, and although that +topic is beyond the scope of this paper, opinions about what should come +first would be welcome. +.PP +Distributed computing has not been mentioned here because it involves +relatively few considerations specific to floating point computation. +However, it may be worth noting that in the default environment (with +underflow trapped, so that presence or absence of denormalized numbers is +not significant) programs run independently on heterogeneous machines +nevertheless get precisely identical results, even with respect to thread +scheduling. This implies that certain communication steps can be avoided, +and that regression testing is considerably simplified. +.PP +Please direct comments on these numerical aspects of Inferno to Eric Grosse. +More general technical comments can be directed to Vita Nuova +.CW comments@vitanuova.com ). ( +I am grateful to Joe Darcy, Berkeley, +to David Gay, Bell Labs, to David Hook, University of Melbourne, +and to participants of the IFIP WG2.5 Working +Conference on Quality of Numerical Software for insightful comments on a +first draft of this paper. +.\"the principal developers of Inferno: Sean Dorward, Rob Pike, Dave Presotto, Howard Trickey, and Phil Winterbottom. +.SH +Trademarks +.LP +Inferno, Limbo, and Dis are trademarks of Vita Nuova Holdings Limited. +Unix is a trademark of Unix Systems Laboratories. +Windows95 is a trademark of Microsoft. +.EQ +delim off +.EN +.SH +References +.nr PS -1 +.nr VS -1 +.LP +.nr [N 0 1 +.de ]N +.IP \\n+([N. +.if \nZ>0 .tm \\$1 \\n([N +.. +.... +.... +.]N "Inferno" +S. Dorward, +R. Pike, +D.\ L. Presotto, +D.\ M. Ritchie, +H. Trickey, +P. Winterbottom, +``The Inferno Operating System'', +.I "Bell Labs Technical Journal" , +Vol. 2, +No. 1, +Winter 1997, pp. 5-18. +Reprinted in this volume. +.]N "Clinger" +W.\ D. Clinger. +``How to read floating point numbers accurately. +In \fIProceedings of the ACM SIGPLAN'90 Conference on Programming +Language Design and Implementation\fP, pages 92-101, 1990. +.... +.... +.]N "Demmel" +James\ W. Demmel and Xiaoye Li. +Faster numerical algorithms via exception handling. +In Jr. Earl\ Swartzlander, Mary\ Jane Irwin, and Graham Jullien, +editors, \fIProceedings: 11th Symposium on Computer Arithmetic\fP. IEEE +Computer Society Press, 1993. +.... +.... +.]N "blas" +Jack\ J. Dongarra, Jeremy\ Du Croz, Sven Hammarling, and Richard\ J. Hanson. +Algorithm 656: An extended set of Basic Linear Algebra Subprograms. +\fIACM Trans. on Mathematical Software\fP, 14(1):18-32, March 1988. +.... +.... +.]N "Gay" +D.\ M. Gay. +Correctly rounded binary-decimal and decimal-binary conversions. +Numerical Analysis Manuscript No. 90-10, AT&T Bell Laboratories, +Murray Hill, NJ, 1990. +freely redistributable, available at +.CW http://netlib.bell-labs.com/netlib/fp/ . +.... +.... +.]N "pine" +E.\ H. Grosse and W.\ M. Coughran, Jr. +The pine programming language. +Numerical Analysis Manuscript 83-4, AT&T Bell Laboratories, 1983. +.br +.CW ftp://cm.bell-labs.com/cm/cs/doc/92/pine.ps.Z . +.... +.... +.]N "IEEEfp" +IEEE. +Standard for binary floating-point arithmetic. +Technical Report Std 754-1985, ANSI, 1985. +.... +.... +.]N "Kagstrom" +Bo\ Kagstrom, Per Ling, and Charles Van\ Loan. +Portable high performance GEMM-based Level 3 BLAS. +In R.\ F.\ Sincovec et\ al., editor, \fIParallel Processing for +Scientific Computing\fP, pages 339-346. SIAM Publications, 1993. +.CW /netlib/blas/ . +.... +.... +.]N "Kahan" +W.\ Kahan. +Lecture notes on the status of IEEE Standard 754 for binary +floating-point arithmetic. +Technical report, Univ. Calif. Berkeley, May 23 1995. +Work in Progress. +.... +.... +.]N "kulisch" +U.\ Kulisch and W.L. Miranker. +\fIComputer arithmetic in theory and practice.\fP +Academic Press, 1980. +.... +.... +.]N "MacIlroy" +M.\ D. McIlroy. +Mass produced software components. +In Peter Naur and Brian Randell, editors, \fISoftware Engineering\fP, +pages 138-155, 1969. +Garmisch, Germany, October 1968. +.... +.... +.]N "fdlibm" +Kwok\ C. Ng. +.CW fdlibm : +C math library for machines that support ieee 754 +floating-point. +freely redistributable; available at +.CW http://netlib.bell-labs.com/netlib/fdlibm/ , +March 1995. +.... +.... +.]N "SteeleWhite" +G.\ L. Steele and J.\ L. White. +How to print floating point numbers accurately. +In \fIProceedings of the ACM SIGPLAN'90 Conference on Programming +Language Design and Implementation\fP, pages 112-126, 1990. +.... +.... +.]N "NumerC" +X3J11.1. +Chapter 5, floating-point C extensions. +Technical report, ANSI, March 29 1995. +.nr PS +1 +.nr VS +1 diff --git a/doc/realinferno/real.pdf b/doc/realinferno/real.pdf new file mode 100644 index 00000000..670abcc5 Binary files /dev/null and b/doc/realinferno/real.pdf differ -- cgit v1.2.3