Epub Format Construction Guide

Harrison Ainsworth

http://www.hxa.name/
hxa7241+articles (ατ) googlemail (dοτ) com

2010-08-27

Summary

A guide for making Epub ebooks/publications, sufficient for most purposes. It requires understanding of XHTML, CSS, XML. (1900 words)

Download an example publication – this document as Epub: http://www.hxa.name/articles/content/EpubGuide-hxa7241.epub

Introduction

This is a guide for making IDPF Epub ebooks/publications. It is mostly an annotated example: this document itself in Epub form.

Not every detail or variation is covered, but there is enough to make the specifications unnecessary for normal use. Following the guide should produce entirely conformant publications.

Included also is a description of optional extra styling for a particular reader (but still completely conformant).

You need an understanding of and ability to make XHTML/CSS and XML documents.

IDPF

‘Epub’ is a standard from the International Digital Publishing Forum. It is an arrangement of several other standards (mainly: XHTML, CSS, XML, NCX, DCMI). There are three parts, addressing: content, package metadata, and archive (OPS, OPF, and OCF). It is powerful, straightforward, and non-proprietary.

Adobe Digital Editions

‘ADE’ is one of the first readers for Epub publications. It is very conformant with the standard. It can use an optional proprietary publication component: an extra stylesheet to adjust text-column appearance. (That is allowed by the standard.)

This guide was written using ADE version 1.0.467.

1: XHTML Documents

Make the main content with XHTML, CSS, and images.

Relevant specifications: OPS, XHTML, CSS.

XHTML

Use XHTML 1.1, but without the following modules:

  • Forms
  • Server-side Image Map
  • Intrinsic Events
  • Scripting

(XHTML 1.1 differences from XHTML 1.0 Strict:

  • lang attribute not allowed (use xml:lang instead)
  • name attribute on a and map elements not allowed (use id instead)
  • ruby annotations are allowed)

Include XML declaration and XHTML doctype, at the top:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

and xmlns attribute in html:

<html xmlns="http://www.w3.org/1999/xhtml">
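The declaration, doctype and namespace above can be assembled and checked mechanically. The following is a minimal sketch, not part of the original guide: make_xhtml is a hypothetical helper, the xml:lang value is an assumed example, and the parse only confirms well-formedness (it does not validate against the XHTML 1.1 DTD).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def make_xhtml(title, body_markup):
    """Return a minimal XHTML 1.1 document as a string.

    make_xhtml is a hypothetical helper. body_markup is assumed to be
    trusted, well-formed XHTML markup; only the title is escaped.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"\n'
        '   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
        f'<html xmlns="{XHTML_NS}" xml:lang="en">\n'
        f'<head><title>{escape(title)}</title></head>\n'
        f'<body>{body_markup}</body>\n'
        '</html>\n'
    )


document = make_xhtml("Example", "<p>Hello</p>")
# Parse as bytes so the declared UTF-8 encoding is what the parser sees.
root = ET.fromstring(document.encode("utf-8"))
print(root.tag)
```

A parse error here means the file is not well-formed XML, which a conformant reader is entitled to reject.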

Any Unicode character, in UTF-8 or UTF-16, is allowed. But readers may have limited rendering capabilities.

(ADE 1.0 doesn't support: &shy; &ensp; &emsp; &thinsp; &zwnj; &zwj; &lrm; &rlm; &oline; &lceil; &rceil; &lfloor; &rfloor;)

CSS

A subset of CSS 2.1 is supported. A brief summary is awkward to make. For details, see the CSS part of the OPS specification.

Be simple, and use CSS 1 without the following properties:

  • background image related:
    • background-image
    • background-repeat
    • background-attachment
    • background-position
    • background
  • word-spacing
  • letter-spacing
  • text-transform
  • list-style-image

(There are also a few other minor details unsupported.) And don't use absolute positioning.

The CSS can be linked from the XHTML head, or placed in a style element in the head.

(ADE 1.0 doesn't support:

  • pseudo-classes/elements
  • text-align: justify;
  • font-variant: small-caps;
  • OPS extras:
    • display: oeb-page-head;
    • display: oeb-page-foot;
    • oeb-column-number: [integer];)

Images

The XHTML can have images of the following types:

  • image/jpeg
  • image/png
  • image/gif
  • image/svg+xml

Fonts

Use OpenType fonts. Reference them in the CSS with @font-face, e.g.:

@font-face { font-family: "Minion Pro"; src: url(MinionPro.otf); }
@font-face { font-family: "Minion Pro"; font-style: italic;
   src: url(MinionPro-It.otf); }

Other descriptors allowed are: font-variant, font-weight, font-size.

2: Package And Container Files

Make these four files, according to the following descriptions:

mimetype

application/epub+zip

It is ASCII, with no trailing end-of-line.
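Text editors often append a newline silently, so it can be safer to write the file programmatically. A minimal sketch, assuming Python is available; write_mimetype is a hypothetical helper name:

```python
import tempfile
from pathlib import Path

MIMETYPE = b"application/epub+zip"  # ASCII bytes, no trailing end-of-line


def write_mimetype(directory):
    """Write the mimetype file into directory and return its path."""
    path = Path(directory) / "mimetype"
    # Binary mode: nothing is appended and no newline translation happens.
    path.write_bytes(MIMETYPE)
    return path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as work:
    written = write_mimetype(work).read_bytes()
```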

Specification: OCF

container.xml

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="content.opf"
      media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>

If you rename the content.opf file, or put it somewhere other than where this guide does, change the full-path attribute to match.
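To see how a reader finds the package file, the full-path attribute can be read back out of container.xml. This is an illustrative sketch only: opf_path is a hypothetical helper, and the OEBPS/ subdirectory in the sample is an assumed example of a relocated content.opf, not the layout used in this guide.

```python
import xml.etree.ElementTree as ET

OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def opf_path(container_xml):
    """Return the full-path of the first rootfile in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find(f"{{{OCF_NS}}}rootfiles/{{{OCF_NS}}}rootfile")
    return rootfile.get("full-path")


# Sample with content.opf moved into a (hypothetical) OEBPS directory.
SAMPLE_CONTAINER = b"""<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="OEBPS/content.opf"
      media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
"""

print(opf_path(SAMPLE_CONTAINER))
```

The path is relative to the root of the archive, not to the META-INF directory.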

Specification: OCF

content.opf

<?xml version="1.0"?>

<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="dcidid" 
   version="2.0">

   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:opf="http://www.idpf.org/2007/opf">
      <dc:title>Epub Format Construction Guide</dc:title>
      <dc:language xsi:type="dcterms:RFC3066">en</dc:language>
      <dc:identifier id="dcidid" opf:scheme="URI">
         http://www.hxa7241.org/articles/content/epup-guide_hxa7241_2007_2.epub
         </dc:identifier>
      <dc:subject>Non-fiction, technical article, tutorial, Epub, IDPF, ebook
         </dc:subject>
      <dc:description>A guide for making Epub ebooks/publications, sufficient
         for most purposes. It requires understanding of XHTML, CSS, XML.
         </dc:description>
      <dc:relation>http://www.hxa.name/</dc:relation>
      <dc:creator>Harrison Ainsworth / HXA7241</dc:creator>
      <dc:publisher>Harrison Ainsworth / HXA7241</dc:publisher>
      <dc:date xsi:type="dcterms:W3CDTF">2007-12-28</dc:date>
      <dc:date xsi:type="dcterms:W3CDTF">2010-08-27</dc:date>
      <dc:rights>Creative Commons BY-SA 3.0 License.</dc:rights>
   </metadata>

   <manifest>
      <item id="ncx"      href="toc.ncx"                 
         media-type="application/x-dtbncx+xml" />
      <item id="css"      href="EpubGuide.css"           
         media-type="text/css" />
      <item id="logo"     href="hxa7241-logo.svg"         
         media-type="image/svg+xml" />
      <item id="title"    href="EpubGuide-title.html"    
         media-type="application/xhtml+xml" />
      <item id="contents" href="EpubGuide-contents.html" 
         media-type="application/xhtml+xml" />
      <item id="intro"    href="EpubGuide-intro.html"    
         media-type="application/xhtml+xml" />
      <item id="part1"    href="EpubGuide-1.html"        
         media-type="application/xhtml+xml" />
      <item id="part2"    href="EpubGuide-2.html"        
         media-type="application/xhtml+xml" />
      <item id="part3"    href="EpubGuide-3.html"        
         media-type="application/xhtml+xml" />
      <item id="part4"    href="EpubGuide-4.html"        
         media-type="application/xhtml+xml" />
      <item id="specs"    href="EpubGuide-specs.html"    
         media-type="application/xhtml+xml" />
   </manifest>

   <spine toc="ncx">
      <itemref idref="title" />
      <itemref idref="contents" />
      <itemref idref="intro" />
      <itemref idref="part1" />
      <itemref idref="part2" />
      <itemref idref="part3" />
      <itemref idref="part4" />
      <itemref idref="specs" />
   </spine>

   <guide>
      <reference type="title-page" title="Title Page"        
         href="EpubGuide-title.html" />
      <reference type="toc"        title="Table of Contents" 
         href="EpubGuide-contents.html" />
      <reference type="text"       title="Text"              
         href="EpubGuide-intro.html" />
   </guide>

</package>

metadata (publication information)

Add publication information according to DCMI terms. Order is not significant, and duplicates are allowed.

Required terms:

  • title
  • language — use an RFC 3066 language code
  • identifier — use a probably unique string: URI or ISBN would be good
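The three required terms can be checked before packaging. A minimal sketch, assuming the content.opf text is already in memory; missing_required_terms is a hypothetical helper, and it only tests for presence of non-empty elements, not for valid values:

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
REQUIRED = ("title", "language", "identifier")


def missing_required_terms(opf_xml):
    """Return the required DC terms that are absent or empty in the metadata."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        return list(REQUIRED)
    missing = []
    for term in REQUIRED:
        elements = metadata.findall(f"{{{DC_NS}}}{term}")
        if not any((element.text or "").strip() for element in elements):
            missing.append(term)
    return missing


# Trimmed sample package: dc:language has been left out on purpose.
SAMPLE_OPF = b"""<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="dcidid"
   version="2.0">
   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example</dc:title>
      <dc:identifier id="dcidid">http://example.org/book</dc:identifier>
   </metadata>
</package>
"""

print(missing_required_terms(SAMPLE_OPF))
```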

Optional terms:

  • creator
  • contributor
  • publisher
  • subject
  • description
  • date
  • type
  • format
  • source
  • relation
  • coverage
  • rights

Some terms have optional attributes:

  • creator, contributor
    • opf:role
    • opf:file-as
  • date
    • opf:event — unstandardised: use something reasonable
  • identifier
    • opf:scheme — unstandardised: use something reasonable
  • date, format, identifier, language, type
    • xsi:type — use an appropriate standard term (such as W3CDTF for date)
  • contributor, coverage, creator, description, publisher, relation, rights, source, subject, title
    • xml:lang — use RFC-3066 format

manifest (document file list)

List every file that is part of the publication, except mimetype, container.xml, and content.opf. The order is not significant.

Give the correct MIME type in each media-type attribute. An id is required on every item, and must be unique within the content.opf file.
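Writing manifest items by hand is error-prone, so they can be generated from a file list. A rough sketch: manifest_items, the item1, item2, ... id scheme and the extension table are assumptions for illustration (the table covers only the types mentioned in this guide), not anything the standard prescribes.

```python
from pathlib import PurePosixPath
from xml.sax.saxutils import quoteattr

# Extension-to-MIME-type table, limited to types mentioned in this guide.
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",
    ".css": "text/css",
    ".ncx": "application/x-dtbncx+xml",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".svg": "image/svg+xml",
}


def manifest_items(hrefs):
    """Return one <item> line per href, with sequential unique ids."""
    lines = []
    for number, href in enumerate(hrefs, start=1):
        suffix = PurePosixPath(href).suffix.lower()
        media_type = MEDIA_TYPES[suffix]  # KeyError flags an unlisted type
        lines.append(
            f'<item id="item{number}" href={quoteattr(href)} '
            f'media-type="{media_type}" />'
        )
    return lines


for line in manifest_items(["toc.ncx", "EpubGuide.css", "EpubGuide-1.html"]):
    print(line)
```

Descriptive ids, as in the listing above, are easier to reference from the spine; sequential ids are used here only to guarantee uniqueness.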

spine (reading order definition)

List all XHTML documents in manifest (using the idref), and not anything else, and with no duplicates. The order is significant. (XHTML documents can be omitted, but then they must not be linked, referenced or reachable from any part of the publication.)
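The spine rules as stated here (every idref in the manifest, XHTML only, no duplicates) can be checked with a short script. A sketch under those assumptions; spine_problems is a hypothetical helper, and the sample package is deliberately wrong in three ways:

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}
XHTML_TYPE = "application/xhtml+xml"


def spine_problems(opf_xml):
    """Return a list of messages for spine entries that break the rules."""
    root = ET.fromstring(opf_xml)
    media_types = {
        item.get("id"): item.get("media-type")
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    problems = []
    seen = set()
    for itemref in root.findall("opf:spine/opf:itemref", NS):
        idref = itemref.get("idref")
        if idref in seen:
            problems.append(f"duplicate: {idref}")
        seen.add(idref)
        if idref not in media_types:
            problems.append(f"not in manifest: {idref}")
        elif media_types[idref] != XHTML_TYPE:
            problems.append(f"not XHTML: {idref}")
    return problems


SAMPLE_OPF = b"""<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
   <manifest>
      <item id="css" href="EpubGuide.css" media-type="text/css" />
      <item id="intro" href="EpubGuide-intro.html"
         media-type="application/xhtml+xml" />
   </manifest>
   <spine>
      <itemref idref="intro" />
      <itemref idref="css" />
      <itemref idref="intro" />
      <itemref idref="ghost" />
   </spine>
</package>
"""

print(spine_problems(SAMPLE_OPF))
```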

guide (main parts of document)

This section is optional.

Each item references a document file, and can have a fragment id. Allowed types are:

  • cover
  • title-page
  • toc (table of contents)
  • index
  • glossary
  • acknowledgements
  • bibliography
  • colophon
  • copyright-page
  • dedication
  • epigraph
  • foreword
  • loi (list of illustrations)
  • lot (list of tables)
  • notes
  • preface
  • text
  • other.[...]

Specifications: OPF, DCMI

toc.ncx

<?xml version="1.0"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" 
   "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">

   <head>
      <meta name="dtb:uid" content="http://www.hxa7241.org/articles/content/epup-guide_hxa7241_2007_2.epub"/>
      <meta name="dtb:depth" content="2"/>
      <meta name="dtb:totalPageCount" content="0"/>
      <meta name="dtb:maxPageNumber" content="0"/>
   </head>

   <docTitle>
      <text>Epub Format Construction Guide</text>
   </docTitle>

   <navMap>
      <navPoint id="navPoint-1" playOrder="1">
         <navLabel>
            <text>Title Page</text>
         </navLabel>
         <content src="EpubGuide-title.html"/>
      </navPoint>
      <navPoint id="navPoint-2" playOrder="2">
         <navLabel>
            <text>Table of Contents</text>
         </navLabel>
         <content src="EpubGuide-contents.html"/>
      </navPoint>
      <navPoint id="navPoint-3" playOrder="3">
         <navLabel>
            <text>Introduction</text>
         </navLabel>
         <content src="EpubGuide-intro.html"/>
      </navPoint>
      <navPoint id="navPoint-4" playOrder="4">
         <navLabel>
            <text>1: XHTML Documents</text>
         </navLabel>
         <content src="EpubGuide-1.html"/>
      </navPoint>
      <navPoint id="navPoint-5" playOrder="5">
         <navLabel>
            <text>2: Package And Container Files</text>
         </navLabel>
         <content src="EpubGuide-2.html"/>
         <navPoint id="navPoint-6" playOrder="6">
            <navLabel>
               <text>mimetype</text>
            </navLabel>
            <content src="EpubGuide-2.html#mimetype"/>
         </navPoint>
         <navPoint id="navPoint-7" playOrder="7">
            <navLabel>
               <text>container.xml</text>
            </navLabel>
            <content src="EpubGuide-2.html#containerxml"/>
         </navPoint>
         <navPoint id="navPoint-8" playOrder="8">
            <navLabel>
               <text>content.opf</text>
            </navLabel>
            <content src="EpubGuide-2.html#contentopf"/>
         </navPoint>
         <navPoint id="navPoint-9" playOrder="9">
            <navLabel>
               <text>toc.ncx</text>
            </navLabel>
            <content src="EpubGuide-2.html#tocncx"/>
         </navPoint>
      </navPoint>
      <navPoint id="navPoint-10" playOrder="10">
         <navLabel>
            <text>3: ADE stylesheet</text>
         </navLabel>
         <content src="EpubGuide-3.html"/>
      </navPoint>
      <navPoint id="navPoint-11" playOrder="11">
         <navLabel>
            <text>4: Container Structure</text>
         </navLabel>
         <content src="EpubGuide-4.html"/>
      </navPoint>
      <navPoint id="navPoint-12" playOrder="12">
         <navLabel>
            <text>Specifications List</text>
         </navLabel>
         <content src="EpubGuide-specs.html"/>
      </navPoint>
   </navMap>

</ncx>

head

Set the following meta content attributes:

  • uid — to the unique identifier in content.opf
  • depth — to the depth of the contents tree (in navMap), integer, >= 1
  • totalPageCount — to 0
  • maxPageNumber — to 0

navMap

Make a table of contents, optionally hierarchical. (navMap doesn't need to include all XHTML files, since the content.opf spine does.)

navPoint

Set both attributes:

  • id — to be unique in file
  • playOrder — to an integer, ordered in navMap, starting at 1

Set sub-parts:

  • the content of text in navLabel
  • the src attribute in content — to a URI of one of the XHTML files (fragment id allowed)

navPoints nested in navPoints are allowed.
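The dtb:depth value and the playOrder sequence both follow from the navMap tree, so they can be computed rather than counted by hand. A minimal sketch; nav_depth and play_orders are hypothetical helpers, and the sample NCX is trimmed to the navMap alone (a real file also needs head and docTitle):

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
NAVPOINT = f"{{{NCX_NS}}}navPoint"


def nav_depth(element):
    """Depth of the navPoint tree below element (0 if it has none)."""
    return max(
        (1 + nav_depth(child) for child in element.findall(NAVPOINT)),
        default=0,
    )


def play_orders(nav_map):
    """playOrder values of all navPoints, in document order."""
    return [int(point.get("playOrder")) for point in nav_map.iter(NAVPOINT)]


SAMPLE_NCX = b"""<?xml version="1.0"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
   <navMap>
      <navPoint id="navPoint-1" playOrder="1">
         <navLabel><text>Part</text></navLabel>
         <content src="part.html"/>
         <navPoint id="navPoint-2" playOrder="2">
            <navLabel><text>Section</text></navLabel>
            <content src="part.html#section"/>
         </navPoint>
      </navPoint>
      <navPoint id="navPoint-3" playOrder="3">
         <navLabel><text>End</text></navLabel>
         <content src="end.html"/>
      </navPoint>
   </navMap>
</ncx>
"""

nav_map = ET.fromstring(SAMPLE_NCX).find(f"{{{NCX_NS}}}navMap")
depth = nav_depth(nav_map)      # value for the dtb:depth meta
orders = play_orders(nav_map)   # should count up from 1 without gaps
```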

(The Sony Reader, and perhaps others, have an extra restriction: fragment ids (in the src attributes of content elements) are not allowed in top-level (non-nested) navPoints.)

Specification: NCX

3: ADE stylesheet

Optionally, make this file if you want extra control of column appearance with ADE 1.0:

  • page-template.xpgt

Add a link in the head of XHTML files to be styled:

<link rel="stylesheet" type="application/vnd.adobe-page-template+xml" 
   href="page-template.xpgt"/>

Will the publication then be non-conformant? Non-standard files can be included (like fonts), but must have proper fallback handling. The standard implies that all fallback behaviour is explicitly standardised (in IDPF or component standards). For stylesheets, HTML rules say readers should ignore unrecognized types. And that would very likely happen. So it seems conformant, and safe.

page-template.xpgt

<ade:template xmlns="http://www.w3.org/1999/xhtml" 
   xmlns:ade="http://ns.adobe.com/2006/ade" 
   xmlns:fo="http://www.w3.org/1999/XSL/Format">

   <fo:layout-master-set>
      <fo:simple-page-master master-name="single_column" margin-bottom="2em" 
         margin-top="2em" margin-left="2em" margin-right="2em">
         <fo:region-body/>
      </fo:simple-page-master>

      <fo:simple-page-master master-name="single_column_head" margin-bottom="2em" 
         margin-top="2em" margin-left="2em" margin-right="2em">
         <fo:region-before extent="8em"/>
         <fo:region-body margin-top="8em"/>
      </fo:simple-page-master>

      <fo:simple-page-master master-name="two_column" margin-bottom="2em" 
         margin-top="2em" margin-left="2em" margin-right="2em">
         <fo:region-body column-count="2" column-gap="3em"/>
      </fo:simple-page-master>

      <fo:simple-page-master master-name="two_column_head" margin-bottom="2em" 
         margin-top="2em" margin-left="2em" margin-right="2em">
         <fo:region-before extent="8em"/>
         <fo:region-body column-count="2" margin-top="8em" column-gap="3em"/>
      </fo:simple-page-master>

      <fo:simple-page-master master-name="three_column" margin-bottom="2em" 
         margin-top="2em" margin-left="2em" margin-right="2em">
         <fo:region-body column-count="3" column-gap="3em"/>
      </fo:simple-page-master>

      <fo:simple-page-master master-name="three_column_head" margin-bottom="2em" 
         margin-top="2em" margin-left="2em" margin-right="2em">
         <fo:region-before extent="8em"/>
         <fo:region-body column-count="3" margin-top="8em" column-gap="3em"/>
      </fo:simple-page-master>

      <fo:page-sequence-master>
         <fo:repeatable-page-master-alternatives>
            <fo:conditional-page-master-reference 
               master-reference="three_column_head" page-position="first" 
               ade:min-page-width="80em"/>
            <fo:conditional-page-master-reference 
               master-reference="three_column" ade:min-page-width="80em"/>
            <fo:conditional-page-master-reference 
               master-reference="two_column_head" page-position="first" 
               ade:min-page-width="50em"/>
            <fo:conditional-page-master-reference 
               master-reference="two_column" ade:min-page-width="50em"/>
            <fo:conditional-page-master-reference 
               master-reference="single_column_head" page-position="first"/>
            <fo:conditional-page-master-reference 
               master-reference="single_column"/>
         </fo:repeatable-page-master-alternatives>
      </fo:page-sequence-master>
   </fo:layout-master-set>

   <ade:style>
      <ade:styling-rule selector="#header" display="adobe-other-region" 
         adobe-region="xsl-region-before"/>
   </ade:style>

</ade:template>

The selector attribute in ade:style/ade:styling-rule refers to a CSS selector. There is more detail at: http://blogs.adobe.com/digitaleditions/template.html

Specification: unknown

4: Container Structure

Arrange all files in the following directory structure:

EpubGuide
   META-INF
      container.xml
   mimetype
   content.opf
   toc.ncx
   EpubGuide.css
   hxa7241-logo.svg
   EpubGuide-title.html
   EpubGuide-contents.html
   EpubGuide-intro.html
   EpubGuide-1.html
   EpubGuide-2.html
   EpubGuide-3.html
   EpubGuide-4.html
   EpubGuide-specs.html

(META-INF and its contents are special, but all other files can be arranged into any subdirectory structure. All references to them, in the various files, may have to be adjusted though.)

Then zip them into an archive with Zip. The filename extension should be ‘epub’, and the mimetype file must be first (and uncompressed), and extra file attributes must be excluded:

zip -X0 EpubGuide-hxa7241.epub mimetype
zip -Xur9D EpubGuide-hxa7241.epub *
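Where Info-ZIP is not available, the same archive layout can be approximated with Python's standard zipfile module. This is a sketch, not an equivalent of the commands above in every detail: build_epub is a hypothetical helper, it writes the mimetype content from a constant rather than from the file on disk, and it assumes the output file lies outside the source directory.

```python
import tempfile
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path, with mimetype first and uncompressed."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # First entry: stored (no compression), written from a constant.
        archive.writestr(zipfile.ZipInfo("mimetype"), b"application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # Every other file, compressed; directories get no entries of their own.
        for path in sorted(source_dir.rglob("*")):
            name = path.relative_to(source_dir).as_posix()
            if path.is_file() and name != "mimetype":
                archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)


# Demonstration with placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as work:
    book = Path(work) / "EpubGuide"
    (book / "META-INF").mkdir(parents=True)
    (book / "META-INF" / "container.xml").write_text("<container/>")
    (book / "content.opf").write_text("<package/>")
    epub = Path(work) / "EpubGuide.epub"
    build_epub(book, epub)
    with zipfile.ZipFile(epub) as archive:
        entries = archive.infolist()
    raw = epub.read_bytes()

first = entries[0]
print(first.filename, first.compress_type == zipfile.ZIP_STORED)
```

Because the first entry is stored with no extra field, the text application/epub+zip sits at a fixed offset near the start of the file, which is how readers can identify the format without unzipping.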

(Get Zip from: ftp://ftp.info-zip.org/pub/infozip/ or http://www.info-zip.org/Zip.html .)

Other zip programs can probably be used, if they can do the same things.

(The Sony Reader, and perhaps others, have an extra requirement: each HTML file must be smaller than 300KB uncompressed, and smaller than 100KB when zipped.)

Specification: OCF

Specifications List

IDPF
http://www.idpf.org/specs.htm
Open Publication Structure (OPS) 2.0 v1.0
http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html
Open Packaging Format (OPF) 2.0 v1.0
http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html
OEBPS Container Format (OCF) v1.0
http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm
ANSI/NISO Z39.86 - 2005 Specifications for the Digital Talking Book, NCX part (NCX)
http://www.niso.org/standards/resources/Z39-86-2005.html#NCX
DCMI Metadata Terms 2006-12-18 (DC)
http://dublincore.org/documents/2006/12/18/dcmi-terms/
XHTML 1.1
http://www.w3.org/TR/xhtml11/
CSS 2.1
http://www.w3.org/TR/CSS21/
XML 1.0
http://www.w3.org/TR/xml/

Metadata

(TXON)

DC:`
   title:`Epub Format Construction Guide`
   creator:`Harrison Ainsworth`

   date:`2007-12-28`
   date:`2010-08-27`

   description:`A guide for making Epub ebooks/publications, sufficient for most purposes. It requires understanding of XHTML, CSS, XML.`
   subject:`Epub, IDPF, ebook`

   language:`en-GB`
   type:`technical article`
   relation:`http://www.hxa.name/`
   identifier:`http://www.hxa.name/articles/content/epub-guide_hxa7241_2007.html`
   rights:`Creative Commons BY-SA 3.0 License`
`