A conversational convention for the web

NOTE-HXA7241-20100811T2247Z

Harrison Ainsworth

A simple ad-hoc ‘semanticised web’ idea for making structures of related texts

References among web documents are made with links, but search engines do not reliably let you query them. So even though separate writings are semantically related, that relation is hard to retrieve.

A simple text convention could remedy this, and add graspable structure to web ‘conversations’. Links can be put directly into the text in a particular format:

A URL with a special 2-part pre-scheme containing a general marker and specific relation, e.g.:

wbcvsn-reply:http://example.org/something.html

‘wbcvsn’ would probably do as a general marker – it is possibly unique, and conjunctions of it and another term are very likely distinctive. The relations are open-ended: finding a good set seems more appropriately a task for evolution, not specification. But some possibilities could be: ‘reply’, ‘update’, ‘question’, ‘answer’, ‘annotation’, ‘correction’, ‘addition’, . . .
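As a rough illustration of how little is needed to recognise such links in running text, here is a minimal sketch in Python. The exact pattern is an assumption, not part of the convention: it supposes relation terms are lower-case letters, that only http/https URLs are used, and that a URL ends at whitespace. The name find_links is hypothetical.

```python
import re

# Assumed shape: the marker 'wbcvsn', a hyphen, a lower-case relation term,
# a colon, then an http(s) URL running up to the next whitespace.
LINK_PATTERN = re.compile(r"\bwbcvsn-([a-z]+):(https?://\S+)")


def find_links(text):
    """Return a list of (relation, url) pairs found in the text."""
    links = []
    for match in LINK_PATTERN.finditer(text):
        relation = match.group(1)
        # Drop sentence punctuation that may trail a URL written in prose.
        url = match.group(2).rstrip(".,;)")
        links.append((relation, url))
    return links


sample = (
    "I disagree with the second point. "
    "wbcvsn-reply:http://example.org/something.html "
    "See also wbcvsn-correction:http://example.org/other.html."
)

print(find_links(sample))
# [('reply', 'http://example.org/something.html'),
#  ('correction', 'http://example.org/other.html')]
```

Because the relation term is matched loosely rather than checked against a fixed list, new relations can appear without any change to the pattern, which fits the open-ended intent.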

Quasi-code like a URL is a bit ungainly, but it is readable and familiar, and could find a place like a kind of footnote.

Merely writing such links into text would be beneficial: ordinary search engine queries for them would reveal the structure. And with a little software to automate and aggregate such querying, a richer view of the relation graph could be presented quite easily.
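The aggregation step might look something like the following sketch. It assumes the page texts have already been fetched somehow (by search queries or a crawl, which is not shown) and are held in a dictionary from page URL to text. The function name build_graph, the sample pages, and the link pattern are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed link shape: 'wbcvsn-<relation>:<http(s) URL up to whitespace>'.
LINK_PATTERN = re.compile(r"\bwbcvsn-([a-z]+):(https?://\S+)")


def build_graph(pages):
    """Map each target URL to the (relation, source URL) pairs pointing at it.

    `pages` is a dict of source URL -> page text, gathered beforehand.
    """
    incoming = defaultdict(list)
    for source_url, text in pages.items():
        for match in LINK_PATTERN.finditer(text):
            relation = match.group(1)
            target = match.group(2).rstrip(".,;)")
            incoming[target].append((relation, source_url))
    return dict(incoming)


# Hypothetical pages, standing in for fetched search results.
pages = {
    "http://example.org/a.html": "An original essay.",
    "http://example.net/b.html": "wbcvsn-reply:http://example.org/a.html I disagree.",
    "http://example.com/c.html": "wbcvsn-correction:http://example.org/a.html A date is wrong.",
}

graph = build_graph(pages)
for relation, source in graph["http://example.org/a.html"]:
    print(relation, "from", source)
```

Indexing by target rather than by source is a design choice for this sketch: it answers the question a reader of a page is most likely to ask, namely what has been written in response to it.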

Appendix

Maps directly to RDF

An expression like:

wbcvsn-reply:http://example.org/something.html

is an abbreviated triple:

The subject is implicit: it is the paragraph, section, or document itself.
The predicate is reduced to the relation term of the pre-scheme.
The object is the normal URI part (after the pre-scheme).

So, with a simple translation, it could be fed into any RDF processing.
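One possible form of that translation is sketched below, emitting N-Triples lines. Several things here are assumptions rather than part of the convention: the subject is taken to be the whole document's URL, the predicate is made by appending the relation term to a placeholder namespace (http://example.org/wbcvsn# is hypothetical), and no escaping of unusual characters in URLs is attempted.

```python
import re

# Assumed link shape: 'wbcvsn-<relation>:<http(s) URL up to whitespace>'.
LINK_PATTERN = re.compile(r"\bwbcvsn-([a-z]+):(https?://\S+)")

# Hypothetical namespace for predicates; the convention does not define one.
PREDICATE_NAMESPACE = "http://example.org/wbcvsn#"


def to_ntriples(document_url, text):
    """Return one N-Triples line per link found in the document text.

    Subject: the document itself. Predicate: namespace plus relation term.
    Object: the URL following the pre-scheme.
    """
    lines = []
    for match in LINK_PATTERN.finditer(text):
        relation = match.group(1)
        target = match.group(2).rstrip(".,;)")
        lines.append(
            f"<{document_url}> <{PREDICATE_NAMESPACE}{relation}> <{target}> ."
        )
    return lines


text = "Some thoughts. wbcvsn-reply:http://example.org/something.html"
for line in to_ntriples("http://example.net/b.html", text):
    print(line)
# <http://example.net/b.html> <http://example.org/wbcvsn#reply> <http://example.org/something.html> .
```

If the subject were meant to be a paragraph or section instead, a fragment identifier on the document URL could presumably serve, but that would need the page to supply one.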

Improves on tagging

A clear, abstract, propositional structure gives such a link some substantial meaning.

‘Tagging’ is too loose. It has too little convention. No-one seems to know how to use tags. Most simply repeat terms already in the content.

If you don't have to think whether you are making a good or correct assertion, such an assertion has practically no meaning. To be informative requires saying something that could be wrong. And to do this in an interesting way requires statements with sufficiently rich structure. The rules don't need to be strict and total, but they need some structure, and some sense of getting it right or not.

Avoids the deception problem

It is a special kind of metadata: integrated and readable with the content it annotates. And this counters deception.

Metadata makes assertions that can be true or false. So it can be exploited to deceive, in systematic, automated ways – which undermines trust in any of it. But metadata is still just data. Whatever means we have to evaluate general data can apply to metadata too. If the metadata is presented explicitly with the rest of the text of a web-page, then it can be judged as part of it, and that general opinion summarised in PageRank or similar. Good metadata will be ranked above bad just as good data is, because the text is ranked as a whole.