Open Augmentation Files

Harrison Ainsworth
artifex (ατ) hxa7241 (dοτ) org



A very simple pattern for sharing marked-up/parsed augmentations of files. Multiple non-interfering annotations are stored in a standard format. (400 words)

software, pattern, markup, parse, format, XML
Creative Commons BY-SA 3.0 License.



Storing source-code in a more sophisticated form, such as XML, appears to offer interesting advantages. A semi-compiled/parsed representation would enable richer views or transforms.

This is not a matter of adding new information; it is really externalising compiler intelligence. Data is moved from compiler to source. This allows other tools to use it, and helps them do more, more easily.

This can be generalised beyond source-code. Any file that is processed in some way could have output moved back out to the original file. Essentially, augmenting a file like this means: the benefit of any processing can be shared. That sounds good.

Sharing presupposes a commonality of language. Any analysis or processing is built from a limited set of common data structures. With common expressions of those, sharing is possible. XML is a lingua franca covering these kinds of things. Several variants and related alternatives are also available.


The augmentation is put outside the original in a file named ‘originalfilename.oa’. It is found implicitly, by its name, and is merely associated with the original rather than embedded in it. The contents are XML (or one of the alternatives) of the form:

  • open-augmentation-file
    • augmentation (zero or more)
      • maker
        unique composite ID for this augmentation
      • parent-hash
        unique ID of the parent instance
      • body
        specialised data structure

The maker can be a URI. The parent-hash can be an MD5 digest of the original file. The body is defined by the particular augmentor.
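
As a rough illustration, the outline above could be serialised with Python's standard library. The element names follow the outline; the function name build_oa, the example maker URI, the plain-text body, and storing the MD5 as a hex string are all assumptions made for this sketch, not part of the pattern itself.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_oa(original_bytes, maker, body_text):
    """Return XML text for an .oa document holding one augmentation.

    Hypothetical helper: only the element names come from the outline.
    """
    root = ET.Element("open-augmentation-file")
    aug = ET.SubElement(root, "augmentation")
    ET.SubElement(aug, "maker").text = maker
    # Hashing the original's bytes ties the augmentation to that exact version.
    ET.SubElement(aug, "parent-hash").text = hashlib.md5(original_bytes).hexdigest()
    # A real augmentor would store its own structured data; text is a stand-in.
    ET.SubElement(aug, "body").text = body_text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example maker URI is invented for illustration.
    print(build_oa(b"int main() { return 0; }\n",
                   "http://example.org/augmentors/tokenizer",
                   "token data would go here"))
```

The result would be written next to the original as its .oa file; how the body is structured remains entirely up to the augmentor.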


Separating the augmentation from the original is valuable. The original can be used exactly as before. Multiple augmentations can be applied simultaneously, and new ones developed without disturbing anything else.

Any augmentor parsing or transforming the original file is free to add its own augmentation section(s). Any other tool can recognize the augmentation it wants from the maker.
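
A minimal sketch of that interplay, in Python: one function appends a new augmentation section without touching the others, and another selects sections by maker. The function names, the exact-string match on maker, and the example URIs are assumptions for illustration; the element names follow the outline given earlier.

```python
import xml.etree.ElementTree as ET


def add_augmentation(root, maker, parent_hash, body_text):
    """Append a new augmentation section, leaving existing ones untouched."""
    aug = ET.SubElement(root, "augmentation")
    ET.SubElement(aug, "maker").text = maker
    ET.SubElement(aug, "parent-hash").text = parent_hash
    ET.SubElement(aug, "body").text = body_text
    return aug


def find_by_maker(root, maker):
    """Return every augmentation element whose maker matches exactly."""
    return [aug for aug in root.findall("augmentation")
            if aug.findtext("maker") == maker]


if __name__ == "__main__":
    root = ET.Element("open-augmentation-file")
    # Two independent augmentors add their sections (URIs are invented).
    add_augmentation(root, "http://example.org/tokenizer", "hash-1", "tokens")
    add_augmentation(root, "http://example.org/linter", "hash-1", "warnings")
    # A tool picks out only the sections it understands.
    for aug in find_by_maker(root, "http://example.org/linter"):
        print(aug.findtext("body"))
```

Because each tool looks only at its own maker, sections from other augmentors can simply be ignored.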

The augmentation can reference the original's content in any way, from byte-level upward. Validity can be checked with the parent-hash: if the original has changed since the augmentation was made, its hash no longer matches.
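
The check might look like the following Python sketch. It assumes the parent-hash holds a hex MD5 digest of the original file's bytes (one option mentioned above); the function name is_current is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def is_current(augmentation, original_bytes):
    """True if the stored parent-hash matches the original's current bytes."""
    stored = augmentation.findtext("parent-hash")
    return stored == hashlib.md5(original_bytes).hexdigest()


if __name__ == "__main__":
    original = b"int main() { return 0; }\n"

    # Build a single augmentation element recorded against the original.
    aug = ET.Element("augmentation")
    ET.SubElement(aug, "parent-hash").text = hashlib.md5(original).hexdigest()

    print(is_current(aug, original))                 # unchanged original
    print(is_current(aug, original + b"// edit\n"))  # edited original
```

A tool that finds a mismatch can discard or regenerate the augmentation, since any byte offsets in its body may no longer point at the right content.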

The overheads are minimal: serializing an extra file, and computing a hash for a file.