Links, pointers, bookmarks, highlights: How should .epub do it?

Editor’s Note: If you can’t suffer jargon, please skip this post. If you thrive on it, read ahead. We welcome your feedback.

The lack of a settled approach to DRM is one of the most-cited flaws of the .epub specifications, but it seems more of a business problem than a technical one. The real technical challenge with using the proposed standards comes from a lack of direction on how exactly we’re supposed to link between books, within books, or within fragments of books. We’re calling this Deep-Linking.

What is clear is that we can use IRIs (Internationalized Resource Identifiers) to link to an OPS structure, specify a path into it, and identify a fragment with an ID. That’s great, but I wouldn’t consider it Deep-Linking. W3C specifications give us DOM Ranges, node collections, XPath, CSS selectors and XPointer. That sounds like plenty of tools. So why not specify one or two of them in the recommendations?
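To make the contrast concrete, here is a small Python sketch that addresses the same paragraph two ways: by its ID, and by a positional path. It uses the limited XPath subset in the standard library’s xml.etree.ElementTree as a stand-in; ElementTree does not implement full XPath, XPointer or DOM Ranges, and the sample chapter markup is made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def find_by_id(root: ET.Element, element_id: str) -> Optional[ET.Element]:
    """ID-based pointer: scan every element for a matching id attribute."""
    return next((el for el in root.iter() if el.get("id") == element_id), None)


def find_by_path(root: ET.Element, path: str) -> Optional[ET.Element]:
    """Path-based pointer, using ElementTree's limited XPath subset."""
    return root.find(path, XHTML_NS)


# A made-up XHTML chapter with two identified paragraphs.
chapter = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p id="paragraph1">First paragraph.</p>'
    '<p id="paragraph2">Second paragraph.</p>'
    "</body></html>"
)

# Both pointers should land on the very same element object.
same = find_by_id(chapter, "paragraph2") is find_by_path(chapter, "x:body/x:p[2]")
```

The positional path works even on markup without IDs, but it points somewhere else as soon as a paragraph is inserted earlier in the chapter; the ID survives that kind of edit.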

Relying on unique IDs to point into documents may be the best way to create bookmarks that can reliably return us to the beginning of a chapter, or, say, the third paragraph of that chapter. They work fine if each annotation is associated with an entire paragraph or element, nothing more, nothing less. But if we want a user to be able to specify a sentence, a passage or a word, or a collection of paragraphs, we need something better. And what about specifying a collection of non-sequential paragraphs, such as the first paragraph of each chapter, or the first sentence of each of those paragraphs?
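Below is a minimal Python sketch of the kind of “something better” this paragraph asks for: a locator that pairs an element ID with start and end character offsets, loosely modeled on the start and end points of a DOM Range. A list of such ranges can express non-sequential selections, such as the first sentence of each chapter’s opening paragraph. The `TextRange` type, the offset convention (counted over the element’s text content) and the naive sentence splitter are all assumptions for illustration, not anything the specifications define.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TextRange:
    """Hypothetical locator: a span of characters inside one identified element."""
    element_id: str  # ID of the containing element, e.g. "paragraph2"
    start: int       # offset of the first selected character
    end: int         # offset just past the last selected character


# Naive sentence end: ., ! or ? followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def first_sentence(element_id: str, text: str) -> TextRange:
    """Build a range covering the first sentence of an element's text."""
    match = _SENTENCE_END.search(text)
    return TextRange(element_id, 0, match.end() if match else len(text))


def resolve(ranges: List[TextRange], text_by_id: Dict[str, str]) -> List[str]:
    """Return the selected text for each range, in order."""
    return [text_by_id[r.element_id][r.start:r.end] for r in ranges]


# Opening paragraphs of two chapters (made-up IDs and text).
openings = {
    "ch1-p1": "The family had long been settled. Their estate was large.",
    "ch2-p1": "The move was sudden! Nobody expected it.",
}

# A non-sequential selection: the first sentence of each opening paragraph.
highlights = [first_sentence(eid, text) for eid, text in openings.items()]
selected = resolve(highlights, openings)

# A single word is just a narrower range in the same element.
word = resolve([TextRange("ch1-p1", 4, 10)], openings)
```

A real reader would need to decide what the offsets count (characters, code points, or positions in normalized whitespace); that choice is exactly the kind of detail a recommendation would have to pin down.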

The specifications contain no firm requirement or recommendation favoring one method over another, and they mention no URI scheme either. A URI scheme would allow deep linking via an epub:// link, one that actually points to a location inside a zipped file hierarchy.

The OpenReader Binder spec addresses this issue; OEBPS does not (thanks, Jon!). OpenReader recommends using IRIs too, and doesn’t get us much deeper link-wise, but at least it takes a stance.

A URI scheme raises another question: would it be better practice to link through the NCX, or directly into the file structure? Direct URLs are certainly more open and Webby. But linking through the NCX allows a certain measure of content control (more appealing to the DRM crowd, and probably also to e-book authors). For example, the NCX or, more likely, the OPF file could specify rights and/or codec information, but only for the parts to be made openly accessible. Links under such a scheme might look like this:

epub://bookishstore.com/austen/sense+sensibility/UUID/chapter1#paragraph2

epub://bookishstore.com/austen/sense+sensibility/UUID/chapter1#xpointer(/p[2])
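One way to read the two example links: the host names the store, the path segments before the last two name the book, the next segment is the book’s unique identifier, and the last segment is an NCX navPoint id; the fragment is either a plain ID or an xpointer() expression. The Python sketch below assumes that layout (no spec defines it) and resolves the navPoint id through a toc.ncx document. `EpubLink`, `parse_epub_link` and `resolve_nav_id` are hypothetical names; only `urllib.parse` and `xml.etree.ElementTree` are real standard-library APIs, and the NCX namespace and navPoint/content structure follow the NCX format used by OPF 2.0.

```python
import re
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlsplit

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}
_XPOINTER = re.compile(r"^xpointer\((.*)\)$")


class EpubLink(NamedTuple):
    """Hypothetical components of an epub:// link (no spec defines these)."""
    store: str                  # host, e.g. "bookishstore.com"
    book_path: str              # e.g. "austen/sense+sensibility"
    book_id: str                # the UUID segment
    nav_id: str                 # assumed to be an NCX navPoint id, e.g. "chapter1"
    fragment_id: Optional[str]  # plain ID fragment, e.g. "paragraph2"
    xpointer: Optional[str]     # body of an xpointer(...) fragment, e.g. "/p[2]"


def parse_epub_link(url: str) -> EpubLink:
    parts = urlsplit(url)
    if parts.scheme != "epub":
        raise ValueError(f"not an epub link: {url}")
    segments = parts.path.strip("/").split("/")
    if len(segments) < 3:
        raise ValueError("expected /<book path>/<book id>/<nav id>")
    *book, book_id, nav_id = segments
    fragment = unquote(parts.fragment)
    match = _XPOINTER.match(fragment)
    return EpubLink(
        store=parts.netloc,
        book_path="/".join(book),
        book_id=book_id,
        nav_id=nav_id,
        fragment_id=None if match or not fragment else fragment,
        xpointer=match.group(1) if match else None,
    )


def resolve_nav_id(ncx_xml: str, nav_id: str) -> str:
    """Return the content src of the navPoint whose id matches nav_id."""
    root = ET.fromstring(ncx_xml)
    for nav_point in root.iterfind(".//ncx:navPoint", NCX_NS):
        if nav_point.get("id") == nav_id:
            content = nav_point.find("ncx:content", NCX_NS)
            if content is not None:
                return content.get("src")
    raise KeyError(nav_id)


link = parse_epub_link(
    "epub://bookishstore.com/austen/sense+sensibility/UUID/chapter1#xpointer(/p[2])"
)
# A minimal made-up toc.ncx with a single navPoint.
toc = (
    '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1"><navMap>'
    '<navPoint id="chapter1" playOrder="1">'
    "<navLabel><text>Chapter 1</text></navLabel>"
    '<content src="chapter1.xhtml"/></navPoint>'
    "</navMap></ncx>"
)
target = resolve_nav_id(toc, link.nav_id)
```

Going through the NCX means the public link names a navigation point rather than a file, so the package’s internal layout can change without breaking links, which is the kind of control the NCX route offers.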

The thought has obviously occurred to Microsoft, which is either trying to patent it or simply protecting alternatives to Sun’s Java API. The jar: scheme that Sun came up with for linking into zip files is used quite extensively by Firefox, though it’s not an IETF standard. A jar-style URL for epubs would look something like this, substituting ‘epub’ for ‘jar’ in the scheme identifier and using ‘!/’ to mark where the path inside the archive begins:

epub://bookishstore.com/austen/sense+sensibility/friendlyname.epub!/chapter1.xml#paragraph2
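A minimal Python sketch of how a reader might handle the jar-style form: split at the first ‘!/’ into the archive’s URL and the entry path, then read that entry with the standard zipfile module. Fetching the archive from the store is left out, an in-memory package stands in for friendlyname.epub, and the function names are hypothetical; the ‘!/’ convention is borrowed from jar: URLs, not from any epub specification.

```python
import io
import zipfile
from typing import BinaryIO, Tuple, Union


def split_archive_url(url: str) -> Tuple[str, str, str]:
    """Split 'archive-url!/entry#fragment' into its three parts."""
    archive_url, sep, rest = url.partition("!/")
    if not sep:
        raise ValueError(f"no '!/' separator in {url}")
    entry, _, fragment = rest.partition("#")
    return archive_url, entry, fragment


def read_entry(archive: Union[str, BinaryIO], entry: str) -> bytes:
    """Read one file out of a zipped .epub (a path or a binary file object)."""
    with zipfile.ZipFile(archive) as epub:
        return epub.read(entry)


# Demo: a tiny in-memory package standing in for a downloaded friendlyname.epub.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("chapter1.xml", '<p id="paragraph2">Hello</p>')

link = ("epub://bookishstore.com/austen/sense+sensibility/"
        "friendlyname.epub!/chapter1.xml#paragraph2")
archive_url, entry, fragment = split_archive_url(link)
chapter = read_entry(demo, entry)
```

Once the entry’s bytes are in hand, the fragment (paragraph2) is resolved inside that document just as it would be for any other link.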

There is also the data: URI scheme (RFC 2397), which embeds the data itself in the URI. The latest crop of browsers (IE8, Safari 3, Opera, Mozilla) all support it. The problem would be the length of the URIs needed to load an entire book. However, a variant of the epub spec could allow data: URIs in the content.opf, which would let a browser load and parse the whole package, images and all.
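For a sense of what the data: approach involves, here is a small Python sketch that turns a resource into a base64 data: URI, wraps it in an OPF-style manifest item, and decodes it again. Putting data: URIs in a manifest href is the hypothetical spec variant described in this paragraph, not something current readers are known to accept, and the helper names are invented. Base64 grows the payload by about a third, which puts the length problem in numbers.

```python
import base64
from typing import Tuple
from xml.sax.saxutils import quoteattr


def to_data_uri(payload: bytes, media_type: str) -> str:
    """Encode bytes as a base64 data: URI (RFC 2397)."""
    return f"data:{media_type};base64,{base64.b64encode(payload).decode('ascii')}"


def from_data_uri(uri: str) -> Tuple[str, bytes]:
    """Decode a base64 data: URI back into (media type, bytes)."""
    header, _, data = uri.partition(",")
    if not (header.startswith("data:") and header.endswith(";base64")):
        raise ValueError("only base64 data: URIs are handled here")
    return header[len("data:"):-len(";base64")], base64.b64decode(data)


def manifest_item(item_id: str, payload: bytes, media_type: str) -> str:
    """Hypothetical OPF <item> whose href carries the resource inline."""
    href = to_data_uri(payload, media_type)
    return (f"<item id={quoteattr(item_id)} href={quoteattr(href)} "
            f"media-type={quoteattr(media_type)}/>")


cover = b"\x89PNG\r\n\x1a\n" + bytes(3000)  # stand-in for real image bytes
item = manifest_item("cover", cover, "image/png")
```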

Ideally, though, we’d all like to link to parts of a package with an ordinary HTTP URL like this:

http://bookishstore.com/austen/sense+sensibility/978-0-9767736-6-5/chapter1.xhtml#paragraph2
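On the store’s side, serving a URL like that could take little more than a lookup. The Python sketch below assumes, hypothetically, that the store treats the ISBN-13 segment as the key for the package and everything after it as a path inside that package; it checks the segment against the standard ISBN-13 check digit before trusting it. The function names are illustrative only.

```python
import re
from typing import Tuple
from urllib.parse import unquote, urlsplit


def is_isbn13(segment: str) -> bool:
    """True if segment (hyphens allowed) is 13 digits with a valid check digit."""
    digits = segment.replace("-", "")
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def locate(url: str) -> Tuple[str, str, str]:
    """Split a store URL into (ISBN, path inside the package, fragment)."""
    parts = urlsplit(url)
    segments = [unquote(s) for s in parts.path.strip("/").split("/")]
    for i, segment in enumerate(segments):
        if is_isbn13(segment):
            return segment.replace("-", ""), "/".join(segments[i + 1:]), parts.fragment
    raise LookupError(f"no ISBN-13 segment in {url}")


isbn, entry, fragment = locate(
    "http://bookishstore.com/austen/sense+sensibility/978-0-9767736-6-5/chapter1.xhtml#paragraph2"
)
```

From there the server would open whatever package it holds for that ISBN and return the requested entry, while the browser resolves the fragment as usual.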

Perhaps there’s room for several methods of linking. After all, if we’re going to have both packaged .epub files and unpackaged OPS structures out there, we’ll need alternatives. We’re leaning toward the open, HTTP-based options. Feel free to disagree. Dialog helps.
