SBNF –
BNF in S-Exprs

NOTE HXA7241 2013-01-13T11:07Z

S-Expressions seem particularly well-suited to BNF formulations, giving a simple, regular, and memorable format.

Problem: BNF, or EBNF, or ABNF, or some one-off variant? are sequences comma separated or just spaced? is choice higher or lower precedence? is it grouped with parenthesis, or brackets? or do brackets represent option, or is it braces? . . .

With S-Exprs you know where you are: a parenthised space-separated list (-tree) with the meaning in the first item. You can guess the symbols for choice, option, repetition. Easy. Memorable. S-Exprs are a meta (lexical-) syntax that the idea of BNF fits neatly into.

Example

(conditional
   (| if unless))
(if
   (, "if " expr " then" block
      (* (, "elseif " expr " then" block))
      (? (, "else" block))
      "end"))
(unless
   (, "unless " expr " then" block "end"))

Description

  • (name ...)definition of a rule as a thing
  • (, ... ... ...)sequence of some things
  • (| ... ... ...)choice of some things
  • (? ...)optional inclusion (zero or one) of a thing
  • (* ...)option-repetition (zero or more) of a thing
  • (+ ...)repetition (one or more) of a thing
  • (=... ...)countation (some number/range) of a thing
  • "..."string constant (double-quote-escaped)
  • namereference to a rule (names are: letters, digits, and hyphens)

Things (except strings and references) contain other things (except definitions).

Option-repetition, repetition, and countation also optionally contain a second thing to use as a separator.

Various obvious or straightforward rules could be assumed, e.g.: digit, letter, blank, int, real.

(And of course, as for any text, this should all be in UTF-8.)

Grammar

Defined in itself:

(SBNF
   (grammar      (, "(" name (+ blank) (* rule (+ blank)) (* blank)
                    ")"))
   (rule         (, "(" name content (* blank) ")"))
   (content      (, (+ blank) (| construction constant name)))
   (construction (, "(" (| group option repeat count) (* blank)
                    ")"))
   (group        (, (| "," "|") (* content)))
   (option       (, "?" content))
   (repeat       (, (| "*" "+") content (? content)))
   (count        (, "=" (+ digit) (? (, "-" (+ digit))) content
                    (? content)))
   (constant     (, "\"" (* charescaped) "\""))

   (name         (, letter (* (| letter digit "-"))))
   (charescaped  (| char-non-doublequote-non-backslash
                    "\\\"" "\\\\")))

Assuming:

  • digit
  • letter
  • blank
  • char-non-doublequote-non-backslash

Appendix

Constants/strings are a bit of a special object: not a proper form or atom. They are neater for the user, but less so for the implementor. Alternatively they could be turned into quote forms (' ...) and have their contents escaped for blanks and parentheses instead of double-quotes. Then it would all be forms and atoms.

And maybe definitions/rules should be nestable somehow . . .