summaryrefslogtreecommitdiff
path: root/doc/styx.ms
diff options
context:
space:
mode:
Diffstat (limited to 'doc/styx.ms')
-rw-r--r--doc/styx.ms737
1 files changed, 737 insertions, 0 deletions
diff --git a/doc/styx.ms b/doc/styx.ms
new file mode 100644
index 00000000..1fc7e560
--- /dev/null
+++ b/doc/styx.ms
@@ -0,0 +1,737 @@
+.ds TM \u\s-2TM\s+2\d
+.nr dT 6
+.nr XT 6
+.TL
+The Styx Architecture for Distributed Systems
+.AU
+Rob Pike
+Dennis M. Ritchie
+.AI
+Computing Science Research Center
+Lucent Technologies, Bell Labs
+Murray Hill, New Jersey
+USA
+.FS
+.FA
+Originally appeared in
+.I "Bell Labs Technical Journal" ,
+Vol. 4,
+No. 2,
+April-June 1999,
+pp. 146-152.
+.br
+Copyright © 1999 Lucent Technologies Inc. All rights reserved.
+.FE
+.AB
+A distributed system is constructed from a set of relatively
+independent components that form a unified, but geographically and
+functionally diverse entity. Examples include networked operating
+systems, Internet services, the national telephone
+switching system, and in general
+all the technology using today's diverse digital
+networks. Nevertheless, distributed systems remain difficult
+to design, build, and maintain, primarily because of the lack
+of a clean, perspicuous interconnection model for the
+components.
+.LP
+Our experience with two distributed operating systems,
+Plan 9 and Inferno, encourages us to propose such a model.
+These systems depend on, advocate, and generally push to the
+limit a fruitful idea: to present their
+resources as files in a hierarchical name space.
+The objects appearing as files may represent stored data, but may
+also be devices, dynamic information sources, interfaces to services,
+and control points. The approach unifies and provides basic naming,
+structuring, and access control mechanisms for all system resources.
+A simple underlying network protocol, Styx, forms
+the core of the architecture by presenting a common
+language for communication within the system.
+.LP
+Even within non-distributed systems, the presentation of services
+as files advantageously extends a familiar scheme for naming, classifying,
+and connecting to system resources.
+More important, the approach provides a natural way to build
+distributed systems, by using well-known technology for attaching
+remote file systems.
+If resources are represented as files,
+and there are remote file systems, one has
+a distributed system: resources available in one place
+are usable from another.
+.AE
+.SH
+Introduction
+.LP
+The Styx protocol is a variant of a protocol called
+.I 9P
+that
+was developed for the Plan 9 operating system[9man].
+For simplicity, we will use the name
+Styx throughout this paper; the difference concerns only the initialization of
+a connection.
+.LP
+The original idea behind Styx was to encode file operations between
+client programs and the file system,
+to be translated into messages for transmission on a computer network.
+Using this technology,
+Plan 9 separates the file server\(ema central repository for
+permanent file storage\(emboth from the CPU server\(ema large
+shared-memory multiprocessor\(emand from the user terminals.
+This physical separation of function was central to the original
+design of the system;
+what was unexpected was how well the model could be used to
+solve a wide variety of problems not usually thought of as
+file system issues.
+.LP
+The breakthrough was to realize that by representing
+a computing resource as a form of file system,
+many of the difficulties of making that resource available
+across the network would disappear naturally, because
+Styx could export the resource transparently.
+For example,
+the Plan 9 window system,
+.CW 8½
+[Pike91],
+is implemented as a dynamic file server that publishes
+files with names like
+.CW /dev/mouse
+and
+.CW /dev/screen
+to provide access to the local hardware.
+The
+.CW /dev/mouse
+file, for instance,
+may be opened and read like a regular file, in the manner of UNIX\*(TM device
+files, but under
+.CW 8½
+it is multiplexed: each client program has a private
+.CW /dev/mouse
+file that returns mouse events only when the client's window
+is the active one on the display.
+This design provides a clean, simple mechanism for controlling
+access to the mouse.
+Its real strength, though, is that the representation of the window system's
+resources as files allows Styx to make those resources available across the
+network.
+For example, an interactive graphics program may be run on a CPU server
+simply by having
+.CW 8½
+serve the appropriate files to that machine.
+.LP
+Note that although the resources published by Styx behave like files\(emthey
+have file names, file permissions, and file access methods\(emthey do not
+need to exist as standard files on disk.
+The
+.CW /dev/mouse
+file is accessed by standard file I/O mechanisms but is nonetheless a
+transient object fabricated dynamically by a running program;
+it has no permanent existence.
+.LP
+By following this approach throughout the system, Plan 9 achieves
+a remarkable degree of transparency in the distribution of resources[PPTTW93].
+Besides interactive graphics, services such as debugging, maintenance,
+file backup, and even access to the underlying network hardware
+can be made available across the network using Styx, permitting
+the construction of distributed applications and services
+using nothing more sophisticated than file I/O.
+.SH
+The Styx protocol
+.LP
+Styx's place in the world is analogous to
+Sun NFS[RFC][NFS] or Microsoft CIFS[CIFS], although it is simpler and easier to implement
+[Welc94].
+Furthermore, NFS and CIFS are designed for sharing regular disk files; NFS in particular
+is intimately tied to the implementation and caching strategy
+of the underlying UNIX file system.
+Unlike Styx, NFS and CIFS are clumsier at exporting dynamic device-like
+files such as
+.CW /dev/mouse .
+.LP
+Styx provides a view of a hierarchical, tree-shaped
+file system name space[Nee89], together with access information about
+the files (permissions, sizes, dates) and the means to read and write
+the files.
+Its users (that is, the people who write application programs),
+don't see the protocol itself; instead they see files that they
+read and write, and that provide information or change information.
+.LP
+In use, a Styx
+.I client
+is an entity on one machine that establishes communication with
+another entity, the
+.I server ,
+on the same or another machine.
+The client mechanisms may be built into the operating system, as they
+are in Plan 9 or Inferno[INF1][INF2], or into application libraries;
+a server may be part of the operating system, or just as often
+may be application code on a separate server machine. In any case, the
+client and server entities
+communicate by exchanging messages, and the effect is that the client
+sees a hierarchical file system that exists on the server.
+The Styx protocol is the specification of the messages that are exchanged.
+.LP
+At one level, Styx consists of messages of 13 types for
+.RS
+.IP \(bu
+Starting communication (attaching to a file system);
+.IP \(bu
+Navigating the file system (that is, specifying and
+gaining a handle for a named file);
+.IP \(bu
+Reading and writing a file; and
+.IP \(bu
+Performing file status inquiries and changes
+.RE
+.LP
+However, application writers simply code requests to open, read, or write
+files; a library or the operating system translates the requests
+into the necessary byte sequences transmitted over a communication
+channel. The Styx protocol proper specifies the interpretation of these
+byte sequences. It fits, approximately, at the OSI Session Layer level
+of the ISO standard classification.
+Its specification is independent of most details of machine architecture
+and it has been successfully used among machines of varying instruction
+sets and data layout.
+The protocol is summarized in Table 1.
+.KF
+.TS
+center box;
+l l
+--
+lfCW l.
+Name Description
+attach Authenticate user of connection; return FID
+clone Duplicate FID
+walk Advance FID one level of name hierarchy
+open Check permissions for file I/O
+create Create new file
+read Read contents of file
+write Write contents of file
+close Discard FID
+remove Remove file
+stat Report file state: permissions, etc.
+wstat Modify file state
+error Return error condition for failed operation
+flush Disregard outstanding I/O requests
+.TE
+.ce 100
+.ps -1
+Table 1. Summary of Styx messages.
+.ps
+.ce 0
+.KE
+.LP
+In use, an operation such as
+.P1
+open("/usr/rob/.profile", O_READ);
+.P2
+is translated by the underlying system into a sequence of Styx messages.
+After establishing the initial connection to the
+file server, an
+.CW attach
+message authenticates the user (the person or agent accessing the files) and
+returns an object called a
+.CW FID
+(file ID) that represents the root of the hierarchy on the server.
+When the
+.CW open()
+operation is executed, it proceeds as follows.
+.RS
+.IP \(bu
+A
+.CW clone
+message duplicates the root
+.CW FID ,
+returning a new
+.CW FID
+that can navigate the hierarchy without losing the connection to the root.
+.IP \(bu
+The new
+.CW FID
+is then moved to the file
+.CW /usr/rob/.profile
+by a sequence of
+.CW walk
+messages that step along, one path component at a time
+.CW usr , (
+.CW rob ,
+.CW .profile ).
+.IP \(bu
+Finally, an
+.CW open
+message checks that the user has permission to read the file,
+permitting subsequent
+.CW read
+and
+.CW write
+operations (messages) on the
+.CW FID .
+.IP \(bu
+Once I/O is completed, the
+.CW close
+message will release the
+.CW FID .
+.RE
+.LP
+At a lower level, implementations of Styx depend only on a reliable,
+byte-stream Transport communications layer. For example, it runs over either
+TCP/IP, the standard transmission control protocol
+and Internet protocol,
+or Internet link (IL), which is a sequenced, reliable datagram protocol
+using IP packets.
+It is worth emphasizing, though, that the model does not require the
+existence of a network to join the components; Styx runs fine
+over a Unix pipe or even using shared memory.
+The strength of the approach is not so much how it works over a network
+as that its behavior over a network is identical to its behavior locally.
+.SH
+Architectural approach
+.LP
+Styx, as a file system protocol, is merely a component in a
+more encompassing approach
+to system design: the presentation of resources as files.
+This approach will be discussed using a sequence of examples.
+.SH
+.I "Example: networking
+.LP
+As an example, access to a TCP/IP network in Inferno and Plan 9 systems
+appears as a piece of a file system, with (abbreviated) structure
+as follows[PrWi93]:
+.P1
+/net/
+ dns/
+ tcp/
+ clone
+ stats
+ 0/
+ ctl
+ status
+ data
+ listen
+ 1/
+ ...
+ ...
+ ether0/
+ 0/
+ ctl
+ status
+ ...
+ 1/
+ ...
+ ...
+.P2
+This represents a file system structure in which one can name, read, and write `files' with
+names like
+.CW /net/dns ,
+.CW /net/tcp/clone ,
+.CW /net/tcp/0/ctl
+and so on;
+there are directories of files
+.CW /net/tcp
+and
+.CW /net/ether0 .
+On the machine that actually has the network interface, all of these
+things that look like files are constructed by the kernel drivers that maintain
+the TCP/IP stack; they are not real files on a disk.
+Operations on the `files' turn into operations sent to the device drivers.
+.LP
+Suppose an application wishes to establish a connection over TCP/IP to
+.CW www.bell-labs.com .
+The first task is to translate the domain name
+.CW www.bell-labs.com
+to a numerical internet address; this is a complicated process, generally
+involving communicating with local and remote Domain Name Servers.
+In the Styx model, this is done by opening the file
+.CW /dev/dns
+and writing the literal string
+.CW www.bell-labs.com
+on the file; then the same file is read.
+It will return the string
+.CW 204.178.16.5
+as a sequence of 12 characters.
+.LP
+Once the numerical Internet address is acquired, the connection must be established;
+this is done by opening
+.CW /net/tcp/clone
+and reading from it a string that specifies a directory like
+.CW /net/tcp/43 ,
+which represents a new, unique TCP/IP channel.
+To establish the connection,
+write a message like
+.CW "connect 204.178.16.5
+on the control file for that connection,
+.CW /net/tcp/43/ctl .
+Subsequently, communication with
+.CW www.bell-labs.com
+is done by reading and
+writing on the file
+.CW /net/tcp/43/data .
+.LP
+There are several things to note about this approach.
+.RS
+.IP \(bu
+All the interface points look like files, and are
+accessed by the same I/O mechanisms already available in
+programming languages like C, C++, or Java. However, they do not
+correspond to ordinary data files on disk, but instead are creations
+of a middleware code layer.
+.IP \(bu
+Communication across the interface, by convention, uses printable character strings where
+feasible instead of binary information. This means that the syntax
+of communication does not depend on CPU architecture or language details.
+.IP \(bu
+Because the interface, as in this example with
+.CW /net
+as the interface with networking facilities, looks like a piece of a
+hierarchical file system, it can easily and nearly automatically
+be exported to a remote machine and used from afar.
+.RE
+.LP
+In particular, the Styx implementation encourages a natural way of providing
+controlled access to networks.
+Lucent, like many organizations, has an internal network not
+accessible to the international Internet, and has a few
+gateways between the inside and outside networks.
+Only the gateway machines are connected to both, and they implement
+the administrative controls for safety and security.
+The advantage of the Styx model is the ease with which
+the outside Internet can be used from inside.
+If the
+.CW /net
+file tree described above is provided on a gateway machine,
+it can be used as a remote file system from machines on the
+inside. This is safe, because this connection is one-way:
+inside machines can see the external network interfaces,
+but outside machines cannot see the inside.
+.SH
+.I "Example: debugging
+.LP
+A similar approach, borrowed and generalized from the UNIX
+system [Kill], is useful for controlling and discovering the status
+of the running processes in the operating system.
+Here a directory
+.CW /proc
+contains a subdirectory for each process running on the
+system; the names of the subdirectories correspond to
+process IDs:
+.P1
+/proc/
+ 1/
+ status
+ ctl
+ fd
+ text
+ mem
+ ...
+ 2/
+ status
+ ctl
+ ...
+ ...
+.P2
+The file names in the process directories refer to various aspects
+of the corresponding process:
+.CW status
+contains information about the state of the process;
+.CW ctl ,
+when written, performs operations like pausing, restarting,
+or killing the process;
+.CW fd
+names and describes the files open in the process;
+.CW text
+and
+.CW mem
+represent the program code and the data respectively.
+.LP
+Where possible, the information and control are again
+represented as text strings. For example, one line
+from the
+.CW status
+file of a typical process might be
+.DS
+.CW "samterm dmr Read 0 20 2478910 0 0 ...
+.DE
+which shows the name of the program, the owner, its state, and several numbers
+representing CPU time in various categories.
+.LP
+Once again, the approach provides several payoffs.
+Because process information is represented in file form,
+remote debugging (debugging programs on another machine)
+is possible immediately by remote-mounting the
+.CW /proc
+tree on another machine.
+The machine-independent representation of information means
+that most operations work properly even if the remote machine
+uses a different CPU architecture from the one doing the
+debugging.
+Most of the programs that deal
+with status and control contain no machine-dependent parts
+and are completely portable.
+(A few are not, however: no attempt is made to render the
+memory data or instructions in machine-independent form.)
+.SH
+.I "Example: PathStar\*(TM Access Server
+.LP
+The data shelf of Lucent's PathStar Access Server[PATH] uses Styx to connect
+the line cards and other devices on the shelf to the control computer.
+In fact, Styx is the protocol for high-level communication on the backplane.
+.LP
+The file system hierarchy served by the control computer includes a structure
+like this:
+.P1
+/trip/
+ config
+ admin/
+ ospfctl
+ ...
+ boot/
+ 0/
+ ctl
+ eeprom
+ memory
+ msg
+ pack
+ alarm
+ ...
+ 1/
+ ...
+/net/
+ ...
+.P2
+The directories under
+.CW /net
+are similar to those in Plan 9 or Inferno; they form the interface to the
+external IP network.
+The
+.CW /trip
+hierarchy represents the control structure of the shelf.
+.LP
+The subdirectories under
+.CW /trip/boot
+each provide access to one of the line cards or other devices in the shelf.
+For example, to initialize a card one writes the text string
+.CW reset
+to the
+.CW ctl
+file of the card, while bootstrapping is done by copying the control
+software for the card into the
+.CW memory
+file and writing a
+.CW reset
+message to
+.CW ctl .
+Once the line card is running,
+the other files present an interface to the higher-level structure of the device:
+.CW pack
+is the port through which IP packets are transferred to and from the card,
+.CW alarm
+may be read to discover outstanding conditions on the card, and so on.
+.LP
+All this structure is exported from the shelf using Styx.
+The external element management software (EMS) controls and monitors the
+shelf using Styx operations.
+For example, the EMS may read
+.CW /trip/boot/7/alarm
+and discover a diagnostic condition.
+By reading and writing the other files under
+.CW /trip/boot/7/ ,
+the card may be taken off line, diagnosed, and perhaps reset or substituted,
+all from the system running the EMS, which may be elsewhere in the network.
+.LP
+Another example is the implementation of SNMP in the PathStar Access Server.
+The functionality of SNMP is usually distributed through the various components
+of a network, but here it is a straightforward adaption process,
+running anywhere in the network, that translates SNMP requests to Styx
+operations in the network element.
+Besides dramatically simplifying the implementation, the natural
+ability for aggregation permits
+a single process to provide SNMP access to an arbitrarily complex network subsystem.
+Yet the structure is secure: the file-oriented nature of the operations make it
+easy to establish standard authentication and security controls to guarantee
+that only trusted parties have access to the SNMP operations.
+.LP
+There are local benefits to this architecture, as well.
+Styx provides a single point in the design where control can be separated
+from the details of the underlying fabric, isolating both from changes in the
+other. Components become more adaptable: software can be upgraded
+without worrying about hidden dependencies on the hardware,
+and new hardware may be installed without updating the control
+software above.
+.SH
+Security issues
+.LP
+Styx provides several security mechanisms for
+discouraging hostile or accidental actions that injure the integrity
+of a system.
+.LP
+The underlying file-communication protocol includes
+user and group identifiers that a server may check against
+other authentication.
+For example, a server may check, on a request to open a file,
+that the user ID associated with the request is permitted to
+perform the operation.
+This mechanism is familiar from general-purpose operating
+systems, and its use is well-known.
+It depends on passwords or stronger mechanisms for authenticating
+the identity of clients.
+.LP
+The Styx approach of providing remote resources
+as file systems over a network encourages genuinely secure access
+to the resources in a way transparent to applications, so that
+authentication transactions need not be provided as part of each.
+For example, in Inferno, the negotiation of an initial connection
+between client and server may include installation of any of
+several encrypting or message-digesting protocols that
+supervise the channel.
+All application use of the resources provided by the server
+is then protected against interference, and the server
+has strong assurance that its facilities are being used in
+an authorized way.
+This is relevant both for general-purpose file servers,
+and, in the telephony field, is especially useful for safe
+remote administration.
+.SH
+Summary
+.LP
+Presentation of resources as a piece of a possibly remote file system
+is an attractive way of creating distributed systems that treads a
+path between two extremes:
+.IP 1
+All communication with other parts of the system is by
+explicit messages sent between components.
+This communication differs in style from applications' use
+of local resources.
+.IP 2
+All communication is by means of
+closely shared resources: the CPU-addressable memory in
+various parts is made directly available across a big network;
+applications can read and write far-away objects exactly as
+they do those on the same motherboard as their own CPU.
+.LP
+Something like the first of these extremes is usually more evident
+in today's systems, although either the operating system or software
+layered upon it usually paper over some of the rough spots.
+The second remains more difficult to approach, because
+networks (especially big ones like the Internet) are not very
+reliable, and because
+the machines on them are diverse in processor architecture
+and in installed software.
+.LP
+The design plan described and advocated in this paper
+lies between the two extremes.
+It has these advantages:
+.IP \(bu
+.I "A simple, familiar programming model for reading and writing named files" .
+File systems have well-defined naming, access, and permissions structures.
+.IP \(bu
+.I "Platform and language independence" .
+Underlying access to resources is
+at the file level, which is provided nearly everywhere, instead
+of depending on facilities available only with particular languages
+or operating systems.
+C++ or Java classes, and C libraries can be constructed
+to access the facilities.
+.IP \(bu
+.I "A hierarchical naming and access control structure" .
+This encourages clean
+and well-structured design of resource naming and access.
+.IP \(bu
+.I "Easy testing and debugging" .
+By using well-specified, narrow interfaces
+at the file level, it is straightforward to observe the communication
+between distributed entities.
+.IP \(bu
+.I "Low cost" .
+Support software, at both client and server,
+can be written in a few thousand lines
+of code, and will occupy only small space in products.
+.LP
+This approach to building systems is successful in the general-purpose
+systems Plan 9 and Inferno;
+it has also been used to construct systems specialized for telephony, such
+as Mantra[MAN] and the PathStar Access Server.
+It supplies a coherent, extensible structure both to the internal communications
+within a single system and external communication between heterogeneous
+components of a large digital network.
+.LP
+.SH
+References
+.nr PS -1
+.nr VS -1
+.IP [NFS] 11
+R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and
+B. Lyon,
+``Design and Implementation of the Sun Network File System'',
+.I "Proc. Summer 1985 USENIX Conf." ,
+Portland, Oregon, June 1985,
+pp. 119-130.
+.IP [RFC] 11
+Internet RFC 1094.
+.IP [9man] 11
+.I "Plan 9 Programmer's Manual" ,
+Second Edition,
+Vol. 1 and 2,
+Bell Laboratories,
+Murray Hill, N.J.,
+1995.
+.IP [Kill84] 11
+T. J. Killian,
+``Processes as Files'',
+.I "Proc. Summer 1984 USENIX Conf." ,
+June 1984, Salt Lake City, Utah, June 1984, pp. 203-207.
+.IP [Pike91] 11
+R. Pike,
+``8½, the Plan 9 Window System'',
+.I "Proc. Summer 1991 USENIX Conf." ,
+Nashville TN, June 1991, pp. 257-265.
+.IP "[PPTTW93] " 11
+R. Pike, D.L. Presotto, K. Thompson, H. Trickey, and P. Winterbottom, ``The Use of Name Spaces in Plan 9'',
+.I "Op. Sys. Rev." ,
+Vol. 27, No. 2, April 1993, pp. 72-76.
+.IP [PrWi93] 11
+D. L. Presotto and P. Winterbottom,
+``The Organization of Networks in Plan 9'',
+.I "Proc. Winter 1993 USENIX Conf." ,
+San Diego, Calif., Jan. 1993, pp. 43-50.
+.IP [Nee89] 11
+R. Needham, ``Names'', in
+.I "Distributed systems" ,
+edited by S. Mullender,
+Addison-Wesley,
+Reading, Mass., 1989, pp. 89-101.
+.IP [CIFS]
+Paul Leach and Dan Perry, ``CIFS: A Common Internet File System'', Nov. 1996,
+.I "http://www.microsoft.com/mind/1196/cifs.htm" .
+.IP [INF1]
+.I "Inferno Programmer's Manual",
+Third Edition,
+Vol. 1 and 2, Vita Nuova Holdings Limited, York, England, 2000.
+.IP [INF2]
+S.M. Dorward, R. Pike, D. L. Presotto, D. M. Ritchie, H. Trickey,
+and P. Winterbottom, ``The Inferno Operating System'',
+.I "Bell Labs Technical Journal"
+Vol. 2,
+No. 1,
+Winter 1997.
+.IP [MAN]
+R. A. Lakshmi-Ratan,
+``The Lucent Technologies Softswitch\-Realizing the Promise of Convergence'',
+.I "Bell Labs Technical Journal" ,
+Vol. 4,
+No. 2,
+April-June 1999,
+pp. 174-196.
+.IP [PATH]
+J. M. Fossaceca, J. D. Sandoz, and P. Winterbottom,
+``The PathStar Access Server: Facilitating Carrier-Scale Packet Telephony'',
+.I "Bell Labs Technical Journal" ,
+Vol. 3,
+No. 4,
+October-December 1998,
+pp. 86-102.
+.IP [Welc94]
+B. Welch,
+``A Comparison of Three Distributed File System Architectures: Vnode, Sprite, and Plan 9'',
+.I "Computing Systems" ,
+Vol. 7, No. 2, pp. 175-199 (1994).
+.nr PS +1
+.nr VS +1