We’re happy to announce the first tool in our own API series, the BookGlutton Epub Converter. It’s a simple way to create the IDPF’s open ebook format, ePub, from basic HTML files. The REST-like interface allows developers to do conversions from anywhere on the Web, be it a backend script or a frontend form for their users. The curious can play with the tool on our site, where we’ve put up some documentation and a test form.
Now, I know I’ve voiced concerns about the ePub format before, so at first glance it wouldn’t make sense for me to build a tool which creates more of it. The short explanation is that if we make this format accessible to independent, open-source Web developers and tech-savvy Web readers and writers, a collective outcry may have more sway in future renditions of it. So please, create some ePubs with this. If you’re curious about the internal XML workings of the format, rename your epub with a .zip extension, unzip it, and open the files up in your favorite text editor. Then ask yourself how the format could be improved for you, and tell the IDPF what you think.
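For those who would rather peek inside without renaming anything, here is a minimal Python sketch that lists the entries of an epub by treating it as an ordinary zip archive. The function name list_epub and the tiny in-memory archive are assumptions made for illustration; pass the path of a real .epub file instead.

```python
import io
import zipfile


def list_epub(source):
    """Return (name, uncompressed size) for each entry, in archive order."""
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


# Hypothetical stand-in archive built in memory; use a real path instead.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("META-INF/container.xml", "<container/>")

for name, size in list_epub(demo):
    print(size, name)
```

The same zipfile object can also read each entry's bytes with zf.read(name), which is enough to open the XML files in an editor.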
We’re committed to being open and we hope that developers in the online book community will not only want to use the tools we develop, but will also feel encouraged to develop their own. As always, we welcome suggestions. Developers out there: what services or data would you like to see us make accessible to your own sites? Users: what tools might make reading online easier and more fun? Let us know!
Hi Onar,
The converter will be back online in the next ten days. We’ve been making some changes to it – cleaning it up a bit. Stay tuned – we’ll announce it here and on our Twitter stream.
Is this .epub converter taken offline for good? Are there any other good or similar epub converters out there?
Mangal: Technically, all .epub readers _must_ allow DTBook to be used as the internal content. OPS 2.0 includes DTBook as a “preferred vocabulary” along with XHTML. When asked, the Adobe Digital Editions team assured me that Digital Editions handled DTBook-backed .epub files well.
Could I create an .epub from DTBook using this tool?
Is there any reader that can read DTBook-based epub files?
Added that in. Thanks!
Aaron: I’d suggest pretty-printing your XML for now so that you can get more meaningful line numbers from epubcheck. In my case, it complains here for the unfinished element:
11
The jing RelaxNG validator isn’t much more helpful (http://idpf.org/2007/opf/OPF_2.0_final_spec.html#AppendixA):
$ jing opf2.rng OPS/index.opf
/Users/keith/scratch/book_glutton/foo/OPS/index.opf:11:14: error: unfinished element
/Users/keith/scratch/book_glutton/foo/OPS/index.opf:2:91: error: IDREF "PrimaryID" without matching ID
…but does actually provide the answer. Please add the matching @id value (“PrimaryID”) to the dc:identifier.
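The IDREF complaint can be checked before running a validator. Below is a rough Python sketch, not part of the converter, that tests whether the package element's unique-identifier attribute points at a dc:identifier carrying the same id. The sample OPF text and the name has_matching_identifier are hypothetical; only the two namespace URIs come from the OPF 2.0 and Dublin Core definitions.

```python
import xml.etree.ElementTree as ET

DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"


def has_matching_identifier(opf_text):
    """True if package/@unique-identifier names the id of a dc:identifier."""
    root = ET.fromstring(opf_text)
    wanted = root.get("unique-identifier")
    ids = {el.get("id") for el in root.iter(DC_IDENTIFIER)}
    return wanted is not None and wanted in ids


# Hypothetical, heavily trimmed OPF used only to exercise the check.
OPF_TEMPLATE = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" '
    'unique-identifier="PrimaryID">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:identifier{attr}>urn:example:book</dc:identifier>"
    "</metadata></package>"
)
broken = OPF_TEMPLATE.format(attr="")
fixed = OPF_TEMPLATE.format(attr=' id="PrimaryID"')

print(has_matching_identifier(broken), has_matching_identifier(fixed))
```

This only covers the identifier reference; it says nothing about the unfinished element errors, which need a schema validator such as jing.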
Yes, the problem was the PHP ZipArchive class which was compressing and not storing. I moved creation of the zip to the command line and now the validator no longer complains. It’s still complaining about unfinished elements though.
Harrison is correct. I investigated my epub-generation code and found the -X flag too. See http://docbook.svn.sourceforge.net/viewvc/docbook/trunk/xsl/epub/bin/lib/docbook.rb?view=markup
Maybe this is the mimetype problem and solution:
OCF section 4 — http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm — mandates specific byte offsets for the mimetype filename and its contents.
If you use the Info-ZIP command-line zip tool you must use the -X (eXclude eXtra file attributes) option (otherwise stuff is stored between the filename and contents — offending the second offset rule).
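For anyone building the archive in code rather than with Info-ZIP, here is a minimal Python sketch of the same idea: write mimetype first, stored rather than compressed, with no extra field, and deflate everything else. The function name write_epub and the placeholder file contents are assumptions for illustration; this is not how the converter itself is implemented.

```python
import io
import zipfile

MIMETYPE = b"application/epub+zip"


def write_epub(target, files):
    """Write an OCF-style zip: 'mimetype' first and stored, the rest deflated.

    target is a path or file-like object; files maps archive names to bytes.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # A bare ZipInfo has an empty extra field, so the name lands at
        # byte 30 and the contents at byte 38 of the archive.
        info = zipfile.ZipInfo("mimetype")
        zf.writestr(info, MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


# Placeholder contents, only to show the resulting byte layout.
buf = io.BytesIO()
write_epub(buf, {"META-INF/container.xml": b"<container/>"})
raw = buf.getvalue()
print(raw[30:38], raw[38:58])
```

Storing the first entry is what keeps the MIME string readable as plain ASCII at a fixed position; a compressed entry puts deflate output there instead.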
Keith,
Yes, epubcheck complains about mimetype being at the wrong byte offset. Any idea how they came up with the byte offset? Mimetype is the first file in the archive. You can confirm this with a hex editor.
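On where the offsets come from: a zip local file header is 30 bytes long, so the first entry's filename starts at byte 30, and the eight characters of "mimetype" put its contents at byte 38, provided there is no extra field and the entry is stored uncompressed. The Python sketch below does the hex-editor check in code. The function name check_mimetype_entry and the two in-memory test archives are assumptions for illustration, and the checks are a small subset of what epubcheck does.

```python
import io
import struct
import zipfile

MIMETYPE = b"application/epub+zip"


def check_mimetype_entry(raw):
    """Return a list of problems found in the first zip entry of raw bytes."""
    if len(raw) < 58:
        return ["file too short to hold a mimetype entry"]
    # Local file header: signature, version, flags, method, time, date,
    # crc32, compressed size, uncompressed size, name length, extra length.
    fields = struct.unpack("<4s5H3L2H", raw[:30])
    sig, method, name_len, extra_len = fields[0], fields[3], fields[9], fields[10]
    problems = []
    if sig != b"PK\x03\x04":
        problems.append("does not start with a zip local file header")
    if raw[30:30 + name_len] != b"mimetype":
        problems.append("first entry is not named mimetype")
    if method != 0:
        problems.append("mimetype is compressed, not stored")
    if extra_len != 0:
        problems.append("extra field pushes the contents past byte 38")
    if not problems and raw[38:58] != MIMETYPE:
        problems.append("contents at byte 38 are not application/epub+zip")
    return problems


def build(method):
    """Build a one-entry archive in memory with the given compression."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE, compress_type=method)
    return buf.getvalue()


print(check_mimetype_entry(build(zipfile.ZIP_STORED)))
print(check_mimetype_entry(build(zipfile.ZIP_DEFLATED)))
```

A mimetype entry that is first in the archive can still fail this check if it was deflated or carries extra attributes, which matches the behaviour described in this thread.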
I’d appreciate anyone explaining the unfinished element errors. I’ve seen these, and I’ve double-checked the spec and the output. What elements are unfinished?
As for the other errors, currently the API does not support external references in uploaded files. It will in the future, but image or link references will cause the epub to not validate. Nevertheless, it will display in Digital Editions. Try uploading an HTML file without external references, and stay tuned to the blog for when we announce support for zip archive conversion. 😉
Aaron
This looks like an interesting project, but I’d recommend working with http://code.google.com/p/epubcheck/ on some more documents:
# download your API test page
$ curl -s "http://www.bookglutton.com/api/getepub.html" > this_is_recursive.html
# Give it right back to the API page to turn into .epub
$ curl -sX POST --form file=@this_is_recursive.html http://www.bookglutton.com/api/getepub.html > this_is_recursive.epub
# Inspect the results
$ unzip -l this_is_recursive.epub
Archive: this_is_recursive.epub
  Length     Date   Time    Name
 --------    ----   ----    ----
       20  05-08-08 19:01   mimetype
        0  05-08-08 19:01   /META-INF/
        0  05-08-08 19:01   OPS/images/
      237  05-08-08 19:01   META-INF/container.xml
     2304  05-08-08 19:01   OPS/index.opf
      779  05-08-08 19:01   OPS/index.ncx
    10059  05-08-08 19:01   OPS/this_is_recursive.xml
 --------                   -------
    13399                   7 files
# epubcheck the results
$ epubcheck this_is_recursive.epub
this_is_recursive.epub: mimetype contains wrong type (application/epub+zip expected)
this_is_recursive.epub/OPS/index.opf(3): unfinished element
this_is_recursive.epub/OPS/index.opf(4): duplicate resource: OPS//images/second-nav-ro-2.gif
… bunch of missing images …
this_is_recursive.epub: I/O error reading OPS/this_is_recursive.xml