TADIST file naming pattern

NOTE HXA7241 2014-12-31T11:01Z

How should you name user-content files? Here is a possible way: a simple formalised pattern that seems to work for various things.

Or: don't name your article ‘paper.pdf’. Name it like this ...

Examples

These give most of the idea:

The-Practice-Of-Programming_Kernighan-Pike_1999.ISBN-020161586X.djvu

Concerto-2-E-min-RV279-1-Allegro_Vivaldi_1713-1995.ISRC-GBFO77341004.v245.mp3

Missa-Papae-Marcelli-Kyrie_Palestrina-TheTallisScholars_1562-1999.ISRC-GBADM9400034.v245.mp3

Sparrow-On-The-Crabapple-Tree_Anderson_2007.flickr-734188511.1024x683.jpg

Description

‘TADIST’ is an acronym of the parts: title, author, date, id, subtype, type. The format is basically:

title _ author _ date . id . subtype . type

But some parts can be omitted, so the schema is roughly (with [] meaning optional):

title [_ author [_ date]] [. id] [. subtype] . type

Furthermore, each part (except the type) can have sub-parts separated by ‘-’, so (with ... meaning possible repetition):

ttt-tt-... [_ aaa-aa-... [_ ddd-dd-...]] [. iii-iiii] [. sss ] . type

The semantics of sub-parts vary though:

  • title is a string of words (‘-’ means space)
  • author and date are arrays of items (‘-’ means ‘,’)
  • id is a label-value pair (‘-’ separates those two)

And more detail on some elements:

  • date is an ISO-8601 date in compact form (no separators within a date), e.g.: 2014, 201412, 20141231
  • id should follow some standard or convention, e.g.: ISBN-9780631128014
  • subtype should be understandable from the type, e.g.: 100x60.png, c320.mp3
  • type should be from a commonly known vocabulary, e.g.: jpg, epub, mp3

All of these are assembled from only three character classes, which are easy to handle in file names:

  • letters: (at least ASCII)
  • digits: 0123456789
  • separators: - _ .
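The separator rules above can be sketched in code. This is only a minimal illustration, not a reference implementation: the function name split_tadist and the returned dictionary layout are assumptions made for the example. It does no validation of the characters, and it does not handle BCE years (a leading ‘-’ in a date would be mistaken for a separator). It tells id from subtype by the presence of ‘-’, since only id has sub-parts.

```python
def split_tadist(filename):
    """Split a TADIST-style file name into its parts (hypothetical helper, no validation)."""
    # '.' separates the plain part from the meta parts; the last meta part is the type.
    plain, *meta = filename.split(".")
    if not meta:
        raise ValueError("missing type: " + filename)
    *middle, type_ = meta
    if len(middle) > 2:
        raise ValueError("too many '.' parts: " + filename)

    # '_' separates title, author and date; author and date are optional.
    fields = plain.split("_")
    if len(fields) > 3:
        raise ValueError("too many '_' parts: " + filename)

    parts = {
        "title": fields[0].split("-"),                               # words
        "author": fields[1].split("-") if len(fields) > 1 else [],   # array of names
        "date": fields[2].split("-") if len(fields) > 2 else [],     # array of dates
        "id": None,
        "subtype": None,
        "type": type_,
    }
    for part in middle:
        if "-" in part:
            # id is a label-value pair, split at the first '-'
            label, value = part.split("-", 1)
            parts["id"] = (label, value)
        else:
            parts["subtype"] = part
    return parts


example = split_tadist(
    "Sparrow-On-The-Crabapple-Tree_Anderson_2007.flickr-734188511.1024x683.jpg")
print(example["title"], example["id"], example["subtype"], example["type"])
```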

Grammar

To be more exact, here is the grammar in SBNF:

(filename (, plain meta))

(plain   (, title (? (, "_" author (? (, "_" date))))))
(title   (+ alphnum "-"))
(author  (+ alphnum "-"))
(date    (+ (, year (? (, month (? day)))) "-"))

(meta    (, (? (, "." id)) (? (, "." subtype)) "." type))
(id      (, alphnum "-" alphnum))
(subtype alphnum)
(type    alphnum)

(year    (, (? "-")
            (| (=4 digit)
               (, (=3 digit) "X")
               (, (=2 digit) "XX")
               (, digit "XXX"))))
(month   (| "01" "02" "03" "04" "05" "06"
            "07" "08" "09" "10" "11" "12"))
(day     (| (, "0" (| "1" "2" "3" "4" "5" "6" "7" "8" "9"))
            (, (| "1" "2") digit)
            (, "3" (| "0" "1"))))
(alphnum (+ (| letter digit)))

(digit   (| "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))

Assuming:

  • letter – at least ASCII

Comments:

  • title – a string of words
  • author – an array of names
  • date – an ISO-8601 date in compact (no-space) form
  • id – following a label-value convention, e.g.: ISBN-9780631128014
  • subtype – understandable from the type, e.g.: 100x60.png, c320.mp3
  • type – from a commonly known vocabulary, e.g.: jpg, epub, mp3
  • year – uses a leading “-” for BCE, and has a non-standard augmentation: “X” for an unknown digit
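As a rough check, the grammar can be transcribed into a regular expression. The sketch below assumes a particular reading of the SBNF operators: (, ...) as sequence, (| ...) as alternatives, (? x) as optional, (+ x sep) as one or more x separated by sep, and (=n x) as exactly n repetitions. It also assumes letters are restricted to ASCII. The names TADIST and is_tadist are hypothetical, chosen for this example.

```python
import re

# Building blocks, following the grammar rules of the same names.
ALPHNUM = r"[A-Za-z0-9]+"
YEAR = r"-?(?:[0-9]{4}|[0-9]{3}X|[0-9]{2}XX|[0-9]XXX)"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12][0-9]|3[01])"

# One date item: year, optionally followed by month, optionally followed by day.
DATE_ITEM = YEAR + "(?:" + MONTH + "(?:" + DAY + ")?)?"

# title and author: alphnum items separated by '-'; date: date items separated by '-'.
WORDS = ALPHNUM + "(?:-" + ALPHNUM + ")*"
DATE = DATE_ITEM + "(?:-" + DATE_ITEM + ")*"

PLAIN = WORDS + "(?:_" + WORDS + "(?:_" + DATE + ")?)?"
META = r"(?:\." + ALPHNUM + "-" + ALPHNUM + r")?(?:\." + ALPHNUM + r")?\." + ALPHNUM

TADIST = re.compile(PLAIN + META)


def is_tadist(name):
    """Return True if the whole name matches the transcribed grammar."""
    return TADIST.fullmatch(name) is not None


for name in [
    "The-Practice-Of-Programming_Kernighan-Pike_1999.ISBN-020161586X.djvu",
    "Missa-Papae-Marcelli-Kyrie_Palestrina-TheTallisScholars_1562-1999.ISRC-GBADM9400034.v245.mp3",
    "My Paper.pdf",
]:
    print(name, is_tadist(name))
```

Note that under this reading a bare name such as paper.pdf is still grammatical (a title plus a type); the grammar only constrains the form, not how informative the name is.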

Data structure

The data-structure would be something like this (in quasi-code, where : means ‘of type’):

filename : product of {
   title   : array of { string } ,
   author  : array of { string } ,
   date    : array of { ISO8601-date } ,
   id      : option of {
                product of {
                   label : string ,
                   value : string
                   }
                } ,
   subtype : option of string ,
   type    : string
   }

where strings are alpha-numeric only and non-zero length, and the title array has at least one element.
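One way this structure might look in a concrete language is sketched below, using Python dataclasses. The class names Id and Tadist and the filename method are assumptions made for the example. Dates are kept as plain strings and are not validated here, which is a simplification, and the check that a date requires an author follows the schema title [_ author [_ date]].

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def _check(s):
    # Strings are alpha-numeric only (ASCII assumed here) and non-zero length.
    if not (s and s.isascii() and s.isalnum()):
        raise ValueError("not a non-empty alpha-numeric string: %r" % (s,))


@dataclass(frozen=True)
class Id:
    label: str
    value: str


@dataclass(frozen=True)
class Tadist:
    title: Tuple[str, ...]
    type: str
    author: Tuple[str, ...] = ()
    date: Tuple[str, ...] = ()      # compact ISO-8601 dates, not validated in this sketch
    id: Optional[Id] = None
    subtype: Optional[str] = None

    def __post_init__(self):
        if not self.title:
            raise ValueError("title needs at least one element")
        if self.date and not self.author:
            raise ValueError("a date requires an author in the name schema")
        for s in (*self.title, *self.author, self.type):
            _check(s)
        if self.id is not None:
            _check(self.id.label)
            _check(self.id.value)
        if self.subtype is not None:
            _check(self.subtype)

    def filename(self):
        """Assemble the file name from the parts."""
        plain = "-".join(self.title)
        if self.author:
            plain += "_" + "-".join(self.author)
        if self.date:
            plain += "_" + "-".join(self.date)
        meta = ""
        if self.id is not None:
            meta += "." + self.id.label + "-" + self.id.value
        if self.subtype is not None:
            meta += "." + self.subtype
        return plain + meta + "." + self.type


book = Tadist(
    title=("The", "Practice", "Of", "Programming"),
    author=("Kernighan", "Pike"),
    date=("1999",),
    id=Id("ISBN", "020161586X"),
    type="djvu",
)
print(book.filename())
```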

Plain-text form

The name pattern also has a plain-text form. For example:

"The Practice Of Programming" ; Kernighan, Pike ; 1999 / ISBN-020161586X / djvu .

is the text representation of:

The-Practice-Of-Programming_Kernighan-Pike_1999.ISBN-020161586X.djvu

The grammar in SBNF:

(textform (, plain meta " ."))

(plain   (, title (? (, " ; " author (? (, " ; " date))))))
(title   (, "\"" chars-no-dquos "\""))
(author  (+ chars-no-commas-or-semicolons-or-slashes ", "))
(date    (+ (, year (? (, "-" month (? (, "-" day))))) ", "))

(meta    (, (? (, " / " id)) (? (, " / " subtype)) " / " type))
(id      (, alphnum "-" alphnum))
(subtype alphnum)
(type    alphnum)

(year    (, (? "-")
            (| (=4 digit)
               (, (=3 digit) "X")
               (, (=2 digit) "XX")
               (, digit "XXX"))))
(month   (| "01" "02" "03" "04" "05" "06"
            "07" "08" "09" "10" "11" "12"))
(day     (| (, "0" (| "1" "2" "3" "4" "5" "6" "7" "8" "9"))
            (, (| "1" "2") digit)
            (, "3" (| "0" "1"))))
(alphnum (+ (| letter digit)))

(digit   (| "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))

Assuming:

  • chars-no-dquos
  • chars-no-commas-or-semicolons-or-slashes
  • letter – at least ASCII

Comments:

  • title – a string of words
  • author – an array of names
  • date – an ISO-8601 date
  • id – following a label-value convention, e.g.: ISBN-9780631128014
  • subtype – understandable from the type, e.g.: 100x60 / png, c320 / mp3
  • type – from a commonly known vocabulary, e.g.: jpg, epub, mp3
  • year – uses a leading “-” for BCE, and has a non-standard augmentation: “X” for an unknown digit
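Converting a file name into its plain-text form could be sketched as follows. The function names expand_date and to_text_form are hypothetical, and the sketch makes simplifying assumptions: it does not validate the name, and it does not handle BCE years (a leading ‘-’ would be taken for a separator). Dates are expanded from the compact form to the hyphenated form that the text grammar uses.

```python
import re

# Compact date: 4-character year (digits or 'X'), then optional month and day.
_COMPACT_DATE = re.compile(r"([0-9X]{4})([0-9]{2})?([0-9]{2})?")


def expand_date(compact):
    """Turn e.g. '19991231' into '1999-12-31' (hypothetical helper)."""
    m = _COMPACT_DATE.fullmatch(compact)
    if m is None:
        raise ValueError("unrecognised date: " + compact)
    return "-".join(p for p in m.groups() if p)


def to_text_form(filename):
    """Render a TADIST file name in the plain-text form (no validation)."""
    plain, *meta = filename.split(".")
    fields = plain.split("_")

    # title: '-' becomes a space, and the whole title is double-quoted
    text = '"' + fields[0].replace("-", " ") + '"'
    if len(fields) > 1:
        # author: '-' becomes ', '
        text += " ; " + ", ".join(fields[1].split("-"))
    if len(fields) > 2:
        # date: items separated by ', ', each expanded to hyphenated form
        text += " ; " + ", ".join(expand_date(d) for d in fields[2].split("-"))
    # id, subtype and type are copied as they are, each introduced by ' / '
    for part in meta:
        text += " / " + part
    return text + " ."


print(to_text_form(
    "The-Practice-Of-Programming_Kernighan-Pike_1999.ISBN-020161586X.djvu"))
```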