Syntax (XBNF) In this class we will use an ASCII based notation for discrete mathematics called MATHS. It has a special subset - XBNF - designed for describing syntax. XBNF is an extended form of EBNF which is itself and Extended form of BNF! We will use space, bar ('|'), and the symbol '::=' as in BNF. We will '()' as parentheses and '#' to mean 'any number of'. This choice follows more than 10 years of testing other forms in class and on computers. The EBNF symbols '<>{}[]' are used for other purposes like comparisons, subscripts and sets in MATHS. Here is comparison of BNF, Ada EBNF and XBNF. BNF ::=| Ada EBNF number::= digit {digit} XBNF number::= digit #digit. You can see which is shorter. XBNF is more powerful than EBNF as well. A Mapping into English "::=" is read "is defined to be" "&" is read "and also" "~" is read as "but not" "|" is read as "or" "#" is read as "any number of " "(" is read as "a sequence of" ")" is read as ", end of sequence" space between 2 parts in a sequences can be read as "and then" "_" becomes a space. Here is a set of definitions that defines a XBNF grammar. grammar ::= #( definition | comment ), A grammar has any number of definitions and comments. definition ::= defined_term "::=" set_expression, A definition has a "::=" between the defined term and an expression defining a set. set_expression::= item #("&" item ), The ampersand("&") character indicates that all the descriptions (items) must hold at once (set intersection). item ::= element | defined_term | "("selection ")", selection ::= alternative #( "|" alternative ), alternative ::= complementary_form | sequence A selection is made of alternatives(set union), where each alternative is either a complementary_form or a sequence of successive phases. sequence ::= #phase , phase ::= ("#" |) item, The 'number sign'(#) indicates the occurrence of any number of the following item - including none. An optional x is shown as (x |) complementary_form::= item "~" item In a complementary form ( A~B say) then any A which is not a B is in A~B. comment ::= (#character )~definition Any number of characters that are not definition form a comment. This sentence is therefore part of a comment in this grammar. Here is the definition of what an element or terminal looks like - its actually a C string. element::=quote #(char~(quote| backslash)|backslash char) quote quote::="\"" backslash::="\\" Here is a more complex definition the defines what a defined term can be: defined_term::= ( (letter | digit) #identifier_character) & ( #identifier_character (letter|digit)) & correctly_spelled & defined identifier_character::=(letter | digit | underscore) A defined_term does not start or end with an underscore - an identifier is made of letters, digits and underscore characters, and must always start and end with either a letter or digit. A defined term is made of numbers, words and the underscore character: correctly_spelled::=#(word | number | underscore) Words are at least two letters long. word::=(letter letter #letter) Notice that a defined_term's definition includes the item '...& defined' which indicates a special non-syntactic constraint - that a defined_term must be in the set of terms which are defined. Precedence of Operators. The above definitions imply the following consequences and interpretations: Concatenation (space) has a higher precedence than a vertical bar ("|"). In an EBNF formula with both spaces and "|" symbols the bars separate the sequences. For example, If a, b, c and d are items, then ( a b | c d ) is read as "a sequence of a and then b, end of sequence or a sequence of c and then d, end of sequence". The number sign("#") has lower precedence than a space and so always applies to the next item. Thus: file1::=header #data_record sentinel, indicates one header, then a number of data_record, and then one sentinel. It does not allow the header to re-occur. Neither can sentinel appear other than once in each file. However, if file2::=#(header data_record) sentinel, then there can be any number of header's but each header is followed by an data_record and there is still exactly one sentinel in an file. The next description file3::=#(header data_record sentinel) implies that headers and sentinels can occur many times - but always with an data_record in between them. Further in any file there will be the same number of sentinel's, data_record's and header's because the "#" does not permit a part of the repeated sequence to appear with out the rest of it also following. file4::=#(header #data_record sentinel) implies that headers and sentinel's can occur many times and with zero or more data_record's in between them. Similarly, If a and b are items then (a | #b ) indicates either an a or any number of b 's, as does (#b |a ). This, plus the rule relating spaces and the "|" symbol means that (a # b c | #b ) describes a string with a single a followed by many b 's and one c or else many b 's alone. Parentheses (()) are put around selections within sequences, and iterations. (a | b) (c | d) reads `a sequence of a or b, end of sequence and then a sequence of c or d, end of sequence,` Or informally `the first item is either an a or a b followed by either a c or a d.` Notice that this describes four alternative forms because: |- (a | b ) (c | d) = (a c | a d | b c | b d). Similarly #(a | b ) means any number of a's and b's in some order. However #(a b ) means any number of pairs of one a followed by one b. This means that each a must be followed by a b . Semantics The meaning of MATHS EBNF is defined in /u/faculty/dick/cs320/maths/grammar