Diffstat (limited to 'man/6/sexprs')
| -rw-r--r-- | man/6/sexprs | 236 |
1 files changed, 236 insertions, 0 deletions
diff --git a/man/6/sexprs b/man/6/sexprs
new file mode 100644
index 00000000..d2a5a7cc
--- /dev/null
+++ b/man/6/sexprs
@@ -0,0 +1,236 @@
+.TH SEXPRS 6
+.SH NAME
+sexprs \- symbolic expressions
+.SH DESCRIPTION
+S-expressions (`symbolic expressions') provide a way for programs to store and
+exchange tree-structured text and binary data.
+The Limbo module
+.IR sexprs (2)
+provides the variant defined by
+Rivest in Internet Draft
+.L draft-rivest-sexp-00.txt
+(4 May 1997),
+as used for instance by the Simple Public Key Infrastructure (SPKI).
+It provides a `canonical' form of S-expression,
+and an `advanced' form for display.
+They can convey binary data directly and efficiently, unlike some
+other schemes such as XML.
+The two forms are closely related and all can be read or written by
+.IR sexprs (2),
+including a variant sometimes used for transport on links that are not 8-bit safe.
+.PP
+An S-expression is either a sequence of bytes (a byte
+.IR string ),
+or a parenthesised list of smaller S-expressions.
+All forms start with the fundamental rules below, in extended BNF:
+.IP
+.EX
+.ft R
+.ta \w'\f2simple-stringxxxxx\f1'u +\w'\ ::=\ 'u
+\f2sexpr\fP ::= \f2string\fP | \f2list\fP
+\f2list\fP ::= '(' \f2sexpr\fP* ')'
+.EE
+.DT
+.PD
+.PP
+They give the recursive structure.
+The various representations ultimately differ only in how the byte string is represented
+and whether white space such as blanks or newlines can appear.
+.PP
+Furthermore, the definition of
+.I string
+is also common to all forms:
+.IP
+.EX
+.ft R
+.ta \w'\f2simple-stringxxxxx\f1'u +\w'\ ::=\ 'u
+\f2string\fP ::= \f2display\fP? \f2simple-string\fP
+\f2display\fP ::= '[' \f2simple-string\fP ']'
+.EE
+.DT
+.PD
+.PP
+The optional bracketed
+.I display
+string provides information on how to present the associated byte string to a user.
+(``It has no other function. Many of the MIME types work here.'')
+Although supported by
+.IR sexprs (2),
+it is largely unused by Inferno applications and is usually left out.
+The canonical and advanced forms differ in their definitions of
+.IR simple-string .
+They always denote sequences of 8-bit bytes, but with different syntax (encodings).
+Two
+.I strings
+are equal iff their
+.I simple-strings
+encode the same byte strings (for both data and
+.IR display ).
+.PP
+.I Canonical
+form must be used when exchanging S-expressions between computers,
+and when digitally signing an expression.
+It is defined by the complete set of rules below:
+.IP
+.EX
+.ft R
+.ta \w'\f2simple-stringxxxxx\f1'u +\w'\ ::=\ 'u
+\f2sexpr\fP ::= \f2string\fP | \f2list\fP
+\f2list\fP ::= '(' \f2sexpr\fP* ')'
+\f2string\fP ::= \f2display\fP? \f2simple-string\fP
+\f2display\fP ::= '[' \f2simple-string\fP ']'
+\f2simple-string\fP ::= \f2raw\fP
+\f2raw\fP ::= \f2nbytes\fP ':' \f2byte*\fP
+\f2nbytes\fP ::= \f5[1-9][0-9]\fP* | \f50\fP
+.EE
+.DT
+.PD
+.PP
+Its
+.I simple-string
+is a raw byte string.
+The primitive
+.I byte
+represents an 8-bit byte.
+The length of every byte string is given explicitly by a preceding decimal value
+.I nbytes
+(with no leading zeroes).
+There is no white space.
+It is `canonical' because it is uniquely defined for each S-expression.
+It is efficient to parse even on small computers.
+.PP
+.I Advanced
+form is more elaborate, and has two main differences:
+not all byte strings need an explicit length, and binary
+data can be represented in printable form, either using hexadecimal or base 64 encodings,
+or using quoted strings (with escape sequences similar to those of Limbo or C).
+Unquoted text is called a
+.IR token ,
+and is restricted by the standard to a specific alphabet:
+it must contain only letters, digits, or characters from the set
+.LR "-./_:*+=" ,
+and must not start with a digit.
+The latter restriction is imposed to allow byte counts to be distinguished from tokens without
+lookahead, but has the consequence that decimal numbers must be quoted,
+as must non-ASCII characters in
+.IR utf (6)
+encoding.
+Upper- and lower-case letters are distinct.
+The advanced transport syntax is defined by the complete set of rules below:
+.IP
+.EX
+.ft R
+.ta \w'\f2simple-stringxxxxx\f1'u +\w'\ ::=\ 'u
+\f2sexpr\fP ::= \f2string\fP | \f2list\fP
+\f2list\fP ::= '(' ( \f2sexpr\fP | \f2whitespace\fP )* ')'
+\f2string\fP ::= \f2display\fP? \f2simple-string\fP
+\f2display\fP ::= '[' \f2simple-string\fP ']'
+\f2simple-string\fP ::= \f2raw\fP | \f2token\fP | \f2base-64\fP | \f2hexadecimal\fP | \f2quoted-string\fP
+\f2raw\fP ::= \f2nbytes\fP ':' \f2byte*\fP
+\f2nbytes\fP ::= \f5[1-9][0-9]\fP* | \f50\fP
+\f2token\fP ::= \f2token-start\fP \f2token-char*\fP
+\f2base-64\fP ::= \f2decimal\fP? '|' ( \f2base-64-char\fP | \f2whitespace\fP )* '|'
+\f2hexadecimal\fP ::= '#' ( \f2hex-digit\fP | \f2whitespace\fP )* '#'
+\f2quoted-string\fP ::= \f2nbytes\fP? \f2quoted-string-body\fP
+\f2quoted-string-body\fP ::= '"' \f2byte*\fP '"'
+\f2token-start\fP ::= \f5[-./_:*+=a-zA-Z]\fP
+\f2token-char\fP ::= \f2token-start\fP | \f5[0-9]\fP
+\f2hex-digit\fP ::= \f5[0-9a-fA-F]\fP
+\f2base-64-char\fP ::= \f5[a-zA-Z0-9+/=]\fP
+.EE
+.PD
+.DT
+.PP
+.I Whitespace
+is any sequence of blank, tab, newline or carriage-return characters;
+note that it can appear only at the places shown.
+The
+.I bytes
+in a
+.I quoted-string-body
+are interpreted according to the quoting rules for Limbo (or C).
+That is, the bytes are enclosed in quotes, and may contain the
+escape sequences for the following characters:
+backspace
+.RB ( \eb ),
+form-feed
+.RB ( \ef ),
+newline
+.RB ( \en ),
+carriage-return
+.RB ( \er ),
+tab
+.RB ( \et ),
+and vertical tab
+.RB ( \ev ),
+octal escape
+.BI \e ooo
+(all three digits must be given),
+hexadecimal escape
+.BI \ex hh
+(both digits must be given),
+.B \e\e
+for backslash,
+.B \e'
+for single quote, and
+\f5\e"\fP to include a quote in a string.
+Note that a quoted string can have an optional
+.IR nbytes ,
+but it gives the length of the byte string resulting
+.I after
+interpreting character escapes.
+.PP
+Both canonical and advanced forms can contain binary data verbatim.
+Sometimes that is troublesome for storage or transport.
+At the lexical level any
+.I sexpr
+can therefore be replaced by the following:
+.IP
+.EX
+.ft R
+\&'{' ( \f2base-64-char\fP | \f2whitespace\fP )* '}'
+.EE
+.PP
+where the text between the braces is the base-64 encoding of the
+.I sexpr
+expressed in canonical or advanced form.
+The S-expression parser will replace the sequence by its decoded form, and resume
+parsing at the start of that byte string.
+Note the difference in syntax and interpretation from rule
+.IR base-64
+above, which encodes a
+.IR simple-string ,
+not an
+.IR sexpr .
+.SH EXAMPLES
+The following S-expression is in canonical form:
+.IP
+.EX
+(12:hello world!(5:inner0:))
+.EE
+.PP
+It is a list of two elements: the string
+.BR "hello world!" ,
+and another list also with two elements,
+the string
+.BR inner
+and an empty string.
+All the bytes in the example are printable characters, but they could have been arbitrary binary values.
+.PP
+The following is an S-expression in advanced form:
+.IP
+.EX
+(hello-world
+ (* "3" "5.6")
+ (best-of-3 (5:inner0:)))
+.EE
+.PP
+Note that advanced form contains canonical form as a subset;
+here it is used for the innermost list.
+.SH SEE ALSO
+.IR sexprs (2)
+.PP
+R. Rivest, ``S-expressions'', Network Working Group Internet Draft
+(4 May 1997),
+reproduced in
+.BR /lib/sexp .
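
The canonical-form grammar in the page above is small enough to sketch as a reader and writer. The following Python is only an illustration written from those rules, not the sexprs(2) implementation or its interface; the names parse_canonical, parse_raw and to_canonical are hypothetical, and the choice to represent a string with a display hint as a (display, body) tuple is an assumption made for this sketch.

```python
def parse_raw(data: bytes, pos: int):
    """Parse raw ::= nbytes ':' byte* at data[pos]; return (bytes, next_pos)."""
    colon = data.find(b":", pos)
    if colon < 0:
        raise ValueError("missing ':' after byte count")
    digits = data[pos:colon]
    # nbytes is a decimal count with no leading zeroes (a lone "0" is allowed).
    if not digits.isdigit() or (len(digits) > 1 and digits.startswith(b"0")):
        raise ValueError("bad byte count")
    end = colon + 1 + int(digits.decode("ascii"))
    if end > len(data):
        raise ValueError("byte string runs past end of input")
    return data[colon + 1:end], end


def parse_canonical(data: bytes, pos: int = 0):
    """Parse one canonical sexpr at data[pos]; return (value, next_pos).

    Lists become Python lists, strings become bytes, and a string with a
    display hint becomes a (display, body) tuple (an assumption of this sketch).
    """
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    c = data[pos:pos + 1]
    if c == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            item, pos = parse_canonical(data, pos)
            items.append(item)
        return items, pos + 1          # skip the closing ')'
    if c == b"[":
        display, pos = parse_raw(data, pos + 1)
        if data[pos:pos + 1] != b"]":
            raise ValueError("missing ']' after display string")
        body, pos = parse_raw(data, pos + 1)
        return (display, body), pos
    return parse_raw(data, pos)


def to_canonical(value) -> bytes:
    """Encode the structure produced by parse_canonical back to canonical form."""
    def raw(b: bytes) -> bytes:
        return str(len(b)).encode("ascii") + b":" + b

    if isinstance(value, list):
        return b"(" + b"".join(to_canonical(v) for v in value) + b")"
    if isinstance(value, tuple):
        display, body = value
        return b"[" + raw(display) + b"]" + raw(body)
    return raw(value)
```

Applied to the canonical example from the EXAMPLES section, parse_canonical yields a two-element list whose second element is itself a list holding the string inner and an empty string, and to_canonical turns that structure back into the same bytes. Because every string carries an explicit length and no white space is allowed, the reader never has to scan string contents, which is the property the page describes as efficient to parse.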
