1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
|
.TH XML 2
.SH NAME
xml \- XML navigation
.SH SYNOPSIS
.EX
include "xml.m";
xml := load Xml Xml->PATH;
Parser, Item, Location, Attributes, Mark: import xml;
init: fn(): string;
open: fn(f: string, warning: chan of (Locator, string),
preelem: string): (ref Parser, string);
Parser: adt {
fileoffset: int;
next: fn(p: self ref Parser): ref Item;
down: fn(p: self ref Parser);
up: fn(p: self ref Parser);
mark: fn(p: self ref Parser): ref Mark;
atmark: fn(p: self ref Parser, m: ref Mark): int;
goto: fn(p: self ref Parser, m: ref Mark);
str2mark: fn(p: self ref Parser, s: string): ref Mark;
};
Item: adt {
fileoffset: int;
pick {
Tag =>
name: string;
attrs: Attributes;
Text =>
ch: string;
ws1: int;
ws2: int;
Process =>
target: string;
data: string;
Doctype =>
name: string;
public: int;
params: list of string;
Stylesheet =>
attrs: Attributes;
Error =>
loc: Locator;
msg: string;
}
};
Locator: adt {
line: int;
systemid: string;
publicid: string;
};
Attribute: adt {
name: string;
value: string;
};
Attributes: adt {
all: fn(a: self Attributes): list of Attribute;
get: fn(a: self Attributes, name: string): string;
};
Mark: adt {
offset: int;
str: fn(m: self ref Mark): string;
};
.EE
.SH DESCRIPTION
.B Xml
provides an interface for navigating XML files (`documents'). Once loaded, the module
must first be initialised by calling
.BR init .
A new parser instance is created by calling
.BR open(\fIf\fP,\ \fIwarning\fP,\ \fIpreelem\fP) ,
which opens the file
.I f
for parsing as an XML document.
It returns a tuple, say
.RI ( p ,\ err ).
If there is an error opening the document,
.I p
is nil, and
.I err
contains a description of the error; otherwise
.I p
can be used to examine the contents of the document.
If
.I warning
is not nil, non-fatal errors encountered when parsing
will be sent on this channel - a separate process will
be needed to received them. Each error is represented
by a tuple, say
.RI ( loc ,\ msg ),
containing the location
.IR loc ,
and the description,
.IR msg ,
of the error encountered. One XML tag,
.IR preelem ,
may be marked for special treatment by the XML parser:
within this tag all white space will be passed through as-is.
.PP
Once an XML document has been opened, the following
.B Parser
methods may be used to examine the items contained within:
.TP 10
.IB p .next()
An XML document is represented by a tree-structure.
.B Next
returns the next item in the document at the current level of the tree
within the current parent element. If there are no more such
items, it returns
.BR nil .
.TP
.IB p .down()
.B Down
descends into the element that has just been returned by
.BR next ,
which should be a
.B Tag
item. Subsequent items returned by
.B next
will be those within that tag.
.TP
.IB p .up()
.B Up
moves up one level in the XML tree.
.TP
.IB p .mark()
.B Mark
returns a mark that can be used to return later to the current
position in the document. The underlying file must
be seekable for this to work.
.TP
.IB p .goto(\fIm\fP)
Goes back to a previously marked position,
.IR m ,
in the document.
.TP
.IB p .atmark(\fIm\fP)
.B Atmark
returns non-zero if the current
position in the document is the same as that marked by
.IR m .
The current tree level is ignored in the comparison.
.TP
.IB p .str2mark(\fIs\fP)
.B Str2mark
turns a string as created by
.B Mark.str
back into a mark as returned by
.BR Parser.mark .
.SS Items
Various species of items live in XML documents; they are encapsulated
in the
.B Item
adt. This contains one member in common to all its subtypes:
.BR fileoffset ,
the position in the XML document of the start of the item.
The various kinds of item are as follows:
.TP
.B Tag
A generic XML tag.
.B Name
names the tag, and
.B attrs
holds its attributes, if any.
.TP
.B Text
.B Text
represents inline text in the XML document.
With the exception of text inside the tag named by
.I preelem
in
.BR open ,
any runs of white space are compressed to a single space,
and white space at the start or end of the text is elided.
.B Ch
contains the resulting text;
.B ws1
and
.B ws2
are non-zero if there was originally white space at the start
or end of the text respectively.
.TP
.B Process
.B Process
represents an XML document processing directive.
.B Target
is the processing instruction's target, and
.B data
holds the rest of the text inside the directive.
XML stylesheet directives are recognised directly and have
their own item type.
.TP
.B Doctype
.B Doctype
should only occur at the start of an xml document,
and represents the type of the XML document.
.TP
.B Stylesheet
.B Stylesheet
represents an XML stylesheet processing request. The
data of the processing request is parsed as per the RFC
into attribute-value pairs.
.TP
.B Error
If an unrecoverable error occurs processing the document,
an
.B Error
item is returned holding the location
.RB ( loc ),
and description
.RB ( msg )
of the error.
This will be the last item returned by the parser.
.PP
The attribute-value pairs in
.B Tag
and
.B Stylesheet
items are held in an
.B Atttributes
adt, say
.IR a .
.IB A .all()
yields a list holding all the attributes;
.IB a .get( name )
yields the value of the attribute
.IR name .
.PP
The location returned when an error is reported is held
inside a
.B Locator
adt, which holds the line number on which the error occurred,
the ``system id'' of the document (in this implementation, its file name),
and the "public id" of the document (not currently used).
.PP
A
.B Mark
.I m
may be converted to a string with
.IB m .str()\fR;\fP
this enables marks to be written out to external storage, to index
a large XML document, for example.
Note that if the XML document changes, any stored marks will
no longer be valid.
.SH SOURCE
.B /appl/lib/xml.b
.SH SEE ALSO
``Extensible Markup Language (XML) 1.0 (Second Edition)'',
.B http://www.w3.org/TR/REC-xml
.SH BUGS
XML's definition makes it tricky to handle leading and trailing white space
efficiently;
.B ws1
and
.B ws2
in
.B Item.Text
is the current compromise.
|