How to Create Your Own eBooks

If you're planning to create your own ebooks, this article will show you exactly how to do it.

Different ebook formats exist to support different makes of eReaders, such as the Kindle, the Sony Reader and the Kobo. We will examine the three mainstream formats and see how to publish your own titles in each of them.

Why ebooks?

Ebooks are exploding in popularity and for good reason. They are cheaper to buy, cheaper to produce, have a near-zero impact on the environment, and can be produced by anyone with a little technical know-how. And of course, the convenience is amazing. You can purchase and get your hands on a book at any time of day or night, perhaps on a whim, without having to get out of the house. You can then read it on your computer, eReader, PDA or smart phone, and carry hundreds of books around in your pocket, briefcase or purse.

Much like YouTube has made it possible for countless garage bands to create and distribute their own music videos at virtually no cost, ebooks are making it possible for authors everywhere to publish their works without having to beg a publisher or an agent to take them on.

Let's Create an eBook

In this article, I will guide you through the steps to create a sample ebook in the three most popular formats: ePub, MOBI and PDF.

In order to keep our sample configurations short and easy to understand, our sample ebook will be the absolute simplest, most basic ebook one can create in each format. I will leave it up to you, the reader, to dig deeper and add more flesh to the bones after you have generated this first ebook.

One word of caution: while the first part of this article provides general information about ebooks and ebook formats, the remaining parts are targeted at a technical audience who is at ease with invoking Linux utilities from the command line and reasonably familiar with building simple Web pages using HTML and CSS.

eBook Formats

Of course, since ebooks are an emerging technology, there are currently a few different and incompatible formats for different reading devices. If you are currently thinking of purchasing an eReader, or if you are thinking of creating your own ebooks for sale, it's important to be aware of these differences.

At the moment, the major brands of eReaders are created and sold by book sellers. For instance, Amazon.com sells the Kindle, Barnes & Noble sells the Nook, Indigo in Canada sells the Kobo, and Sony sells the Sony Reader.

In every case, these vendors expect you to buy books from their own store and will provide software to let you read these books on your PC or on their own brand of eReader. For instance, if you have purchased a Sony Reader, the software that comes bundled with the device will let you create an account on the Sony Web-store and you will be expected to purchase all your books from this single source.

While it is technically possible for a user to download a book from the Kobo store, for instance, and upload it to their Sony Reader, it does require some technical know-how that most consumers don't possess.

Amazon.com makes cross-purchases even more difficult by using its own proprietary ebook format that is incompatible with other eReaders. Specifically, if you purchase an ebook on Amazon.com, it will only work on a Kindle or on your PC using the Kindle software.

ePub: The Emerging Standard

Fortunately, a standard is quickly emerging across all reading devices: the ePub format. ePub stands for "Electronic Publication." As of this writing (April 2011), virtually all eReaders support ePub except for the Kindle (sold by Amazon.com). Unfortunately, the Kindle currently has the largest share of the market, so it cannot be ignored.

The Kindle supports the Mobipocket (a.k.a. "MOBI") format, which is an older but unrestricted ebook format, and the AZW format, which is essentially a DRM-restricted (i.e. encrypted and copy-protected) version of the MOBI format. We will discuss DRM in the next section.

If you are considering creating your own ebooks, MOBI and ePub are the only two formats you should really be concerned about for the eReader target. For a target audience who will be reading your books on a full-size screen, PDF ("Portable Document Format") is also a viable option. We will discuss PDF in greater detail shortly.

DRM, or Copy Protection

To complicate things further, most ebook vendors encrypt and copy-protect their files using a technology called Digital Rights Management, or DRM. A DRM-protected ebook can only be read on a device that has been "authorized" through a software process. For instance, if you download a book for your Kobo eReader from kobobooks.com, you will also need to have installed a program called Adobe Digital Editions on your PC and you will need to use this software to read the ebook on your PC or to authorize your Kobo eReader to receive it. In theory, if you were to send a copy of that ebook to a friend, the friend would not be able to read it on his/her PC or eReader. In practice, there are tools available on the Internet to remove DRM restrictions from an ebook, so these restrictions are rather futile.

The whole argument around DRM is a point of philosophy that can be argued forever. On one hand, it is fair to allow the copyright holder to make sure that only legitimate purchasers of an ebook can legally enjoy it; on the other hand, DRM forces users to go through inconvenient processes to download and authorize the products they are purchasing, which is very annoying.

It is up to each publisher to decide whether their works will be DRM-protected or not. Incidentally, the books you will find on my site are all DRM-free, which means you can simply purchase and download them, transfer them to your eReader, and start reading immediately, without having to jump through technical hoops. I trust that you will not abuse this policy by illegally distributing my copyrighted material.

What If You Don't Have an eBook Reader?

If you don't have access to an ebook reader, you can read your ebooks on your PC, tablet, PDA, smartphone, or just about any other personal computing device in just about any format since every vendor provides free applications to read their ebooks on these devices.

For instance, to read an ePub or MOBI book on your PC, you can download and install the free Kobo application available from kobobooks.com. If you have purchased an ebook from Amazon.com, you can read it on your computer or smart phone using their free applications available on the Amazon.com site.

Of course, there are other free applications available for all formats and platforms. You can easily find them with a quick Internet search.

What About PDF?

PDF is a classic and well-supported way to show a document on a computer screen. Virtually every Web browser and email program can display a PDF document, and you can usually print them to your own printer without any difficulty.

The problem with PDF is its inflexibility. If the document was created to fit nicely on a standard piece of letter-size paper, for instance, it will look great on that piece of paper but will be hard to read on the 6-inch screen of an ebook reader or on the 3-inch screen of your phone or PDA, because the entire page will be displayed in that small area, resulting in an unreadably tiny font size.

That's because PDF does not support reflowable text. "Reflowable" means that the text will dynamically wrap around as the screen size is reduced, without changing the font size.

Conversely, the ePub format does support reflowable text, so the text from an ePub book will adapt dynamically to the screen size or the window size as you stretch or shrink the window, or as you move from a full-size computer screen to a smaller eReader or PDA. By contrast, a PDF document is more like a photograph of a page. It's a great format if you want to make sure the appearance of the page remains exactly the way you designed it, with headers and images exactly where you intended to have them, but it is too inflexible for the typical ebook consumer.

Which Format to Choose?

If you are planning to create your own ebooks for sale, my suggestion is that you distribute them in all 3 main formats: ePub, MOBI and PDF. When prospects order an ebook, give them all 3 files to cover all situations.

Also, if you are planning to publish and market this material yourself through your own Web site, I recommend you forget about DRM restrictions. Unless your book becomes as popular as a Stephen King or J.K. Rowling novel, it's not likely to get pirated and passed around in any great volume even if it's unprotected. And keeping your book DRM-free will make it more attractive to potential buyers who will be less concerned about running into technical obstacles when transferring it to their reading device. Less worry means more sales.

Conversion Tools

Exporting to PDF

Perhaps the simplest way to create an ebook, and what is used by many self-published authors, is to write the book using a word processor and then export the file in PDF format.

As we have discussed earlier, this is fine if the target user is going to read your document on a fairly large screen, but is not well suited for smaller devices such as eReaders, PDAs and smart phones. While you can certainly reduce the page size in your word processor to generate a PDF that will display correctly on a small screen, this same document will look absurdly small on a standard computer screen.

In brief, PDF is a good option for prospects who will read your material on a full-size screen, but is not the best option if your target market will be using an eReader, a PDA or a phone to view your content.

Calibre

Calibre is a popular and free application to manage ebooks, including viewing them on-screen, uploading them to your eReader or other device, and converting them to other formats. Calibre is available for Windows, Linux and the Mac from calibre-ebook.com.

If you have written your material in a word processor and would like to produce an ebook suitable for eReaders and other small-screen devices, simply save your document in Rich-Text Format (RTF) using the "Save As..." feature of your word processor, and then convert it to ebook format (either MOBI or ePub) using Calibre. Simple as that!

Now For the Bad News...

While this is all nice and convenient, using a software tool to convert from a word processing document to an ebook format will almost always introduce some distortion in the formatting. Indents will end up being more or less pronounced than you had intended, headers may come out with too much or not enough blank space above and below them, and any fancy displays such as floating frames or sidebars might very well end up as a standard paragraph or be omitted from your document altogether.

The thing is, ebook formats are much less sophisticated than word processing formats and don't support a lot of fancy features. For this reason, the golden rule when creating an ebook is: Keep it simple.

Web-Based Converter

The people at Online-Convert.com offer a free Web-based service to convert your text or ebook to a different ebook format. For instance, you can upload a Microsoft Word document or a DRM-free ePub book and get back the MOBI version of it.

Again, don't expect miracles: conversion software can only give you a best-effort translation of your content since the capabilities of each format can be quite different. MOBI, specifically, has a very limited set of features. For instance, it does not support the "float" CSS directive, so if you had intended to show a picture on the right side of the screen with text flowing around it on the left, you will find that your MOBI document now shows the picture below that text, without the wrap-around effect.

Let's Get Technical!

The better way to produce an ebook is to write it natively, without the easy but imprecise conversion tools. This is admittedly a lot more technical, but it gives you complete control over the results, without any bloat or unpredictable output.

In the remainder of this document, we will examine how to create ebooks from scratch using a text editor rather than a word processor. We will see how to compile our ebooks in all 3 main formats: ePub, MOBI and PDF, without DRM restrictions.

This discussion is targeted to a technical audience who is familiar with entering Linux commands on the command line, as well as creating basic Web pages using HTML and CSS (Cascading Style Sheets).

Creating an ePub Book

A Brief Introduction to XHTML

The first thing you need to know about ebooks is that they are mostly based on XHTML, a page markup language very similar to HTML but with stricter syntax.

HTML ("Hyper-Text Markup Language") is the markup language used to create Web pages. If you are not familiar with creating Web pages in HTML, then you do not have the necessary background to create your ebooks manually as described in the remainder of this document. Very sorry. If this is the case, use the method outlined at the end of the previous chapter under "Conversion Tools," or learn the basics of Web design and come back to this document later. If you are familiar with HTML and CSS (Cascading Style Sheets) coding, then you are a qualified geek: read on!

XHTML is very similar to HTML but is very picky and unforgiving about syntax. For instance, while an HTML tag can be in either upper-case or lower-case, an XHTML tag must be lower-case or it won't be understood. The paragraph tag <P> will work in HTML but will be unrecognized in XHTML; you will have to use <p> instead.

Image tags must always include an "alt" attribute (a text description of the image), and every opening tag must have a closing tag, so this paragraph tag we have just mentioned would have to be matched with a closing </p> tag, as in this example:

<p>This is a legitimate paragraph in XHTML.</p>
<p>This one isn't, because it is missing the closing tag.

Similarly, every tag must be closed, even those that normally don't have a closing tag in HTML, such as <br>, <hr> and <img>. The way to close these tags is to place a forward slash just before the closing angle bracket, as in these examples:

<br />
<hr />
<img src="some_pic.jpg" alt="A yellow chicken"/>

Of course, this article is not meant to be a tutorial on XHTML, so I will leave it up to you to learn the details from other sources. For now, I will assume you are comfortable enough to get started writing your content from scratch. We will see later that there are useful tools to check your syntax and alert you of any errors.
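
In the meantime, here is a minimal sketch of a syntax check using nothing but Python's standard XML parser. The function name is_well_formed is my own invention, and the check is deliberately limited: it only confirms that tags are properly closed and nested, as described above. It will not notice a missing "alt" attribute or a matched pair of upper-case tags, so treat it as a quick sanity check rather than a real XHTML validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return (True, None) if markup parses as XML, else (False, error text)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# A few fragments taken from, or modeled on, the examples above.
samples = [
    "<p>This is a legitimate paragraph in XHTML.</p>",
    "<p>This one is missing its closing tag.",
    '<div><img src="some_pic.jpg" alt="A yellow chicken"/><br /></div>',
    "<div><br></div>",  # <br> is opened but never closed
]

for text in samples:
    ok, message = is_well_formed(text)
    status = "OK " if ok else "BAD"
    print(status, text, "" if ok else "-> " + message)
```

The second and fourth fragments are reported as bad because the parser reaches the end of the input, or a closing tag, while an element is still open.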

Some Sample XHTML Pages for Your Book

However, in order to create a minimal ebook as an exercise, here are a few pages you can cut-and-paste on your system to get started.

The following segments constitute a title page for our sample book, along with 2 very short chapters:

Title page:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Title Page</title>
</head>

<body>
   <div style="text-align: center;">
      <img src="cover.jpg" alt="Title page" />
      <p>MY FANTASTIC FIRST EBOOK</p>
      <p>by John Doe</p>
   </div>
</body>

</html>

Chapter 1:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Chapter 1</title>
</head>

<body>

<h1>How It All Began</h1>

<p>Once upon a time, a chicken decided to cross the road.</p>

</body>
</html>

Chapter 2:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Chapter 2</title>
</head>

<body>

<h1>The End of It All</h1>

<p>
After countless trials and tribulations, the chicken made
it across the road, but remained perplexed as to the motivation
for this journey.
</p>

<p>The End.</p>

</body>
</html>

And finally, here is a sample CSS file you might have created for your book. We will call it mybook.css:

body {
   margin: 5px;
   font-family: serif;
}

h1 {
   text-align: center;
   font-family: sans-serif;
}

As for the cover image, just pick any small JPEG image for your tests, copy it to your working directory under the name cover.jpg, and pretend it's a relevant illustration for your book.

The ePub Format

Since ePub files are really just a Zip archive of XHTML files and other documents, it's easy to examine the contents of an ePub book: simply rename the file with a .zip extension and then unzip it into an empty directory. Note that this will only work if the file is not protected with DRM. Fortunately, there are plenty of DRM-free ebooks on the Internet, many of them free of charge, that you can download to study and experiment with. One such source is Project Gutenberg, which stores tens of thousands of classic books that are both DRM-free and free of charge.

If you were to take apart an ePub document, you would generally find the following files in it:

mimetype
META-INF/container.xml
something.ncx
something.opf
One or more .html file(s)
One or more .css file(s)
Possibly some images (.jpg, .gif, etc.)

Some of these files may be located in a subdirectory named OEBPS or something else -- the standard is fairly flexible as far as this goes. (Incidentally, OEBPS stands for "Open eBook Publication Structure," a legacy ebook format that has been superseded by ePub.) The file container.xml, however, must always be located in a directory named META-INF, while the file mimetype must always be located in the root directory of the structure.
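
If you would rather not rename and unzip the file by hand, the listing can also be obtained with a few lines of Python, since the standard zipfile module opens a DRM-free ePub directly. This is only a sketch: the function name list_epub_contents is hypothetical, and the tiny archive built in memory below is a stand-in so that the example runs without a real book. In practice you would pass the path of an ePub you have downloaded.

```python
import io
import zipfile


def list_epub_contents(epub_file):
    """Return the names of the files inside an ePub, in archive order."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Build a tiny stand-in archive in memory so the sketch runs by itself.
# A real ePub would contain the full set of files listed above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")
    archive.writestr("OEBPS/chapter1.html", "<html/>")

for name in list_epub_contents(buffer):
    print(name)
```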

To create your own ePub book, you simply need to create your content as one or more XHTML files (with either a .html or .xhtml extension), put your CSS code, if you are using any, in a file with a .css extension, and then create the remaining control files as specified below. When that's done, simply Zip the files into a single archive and give it a .epub extension. That's it; you're done.

To make this clearer, let's create a small ebook from scratch. Let's assume our book consists of a book cover which is a JPEG image, a title page and two chapters in XHTML format (title.html, chapter1.html and chapter2.html), and a set of Cascading Style Sheet definitions that you have stored in the file mybook.css.

Note that normally, an ebook will probably feature several chapters and possibly some appendices, but we are purposely keeping this example simple to keep our sample configuration files short and easy to understand.

We will now place all these files in a working directory, then create the special control files as specified below. To summarize, our working directory will include the following files:

mimetype
META-INF/container.xml
title.html
chapter1.html
chapter2.html
mybook.css
cover.jpg
toc.ncx
mybook.opf

mimetype

This file simply contains the following line:

application/epub+zip

IMPORTANT: This line must not end with a linefeed or carriage-return character. This generally means you can't use a standard text editor to create the file since the editor will usually terminate each line with a linefeed character. On a Unix-type system such as Linux or Mac OS X, you can create this file without the terminating linefeed using this command:

echo -e "application/epub+zip\c" > mimetype

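The "\c" at the end of the string tells echo to stop producing output at that point, which suppresses the trailing linefeed. This relies on the echo command built into bash; if your shell's echo behaves differently, the command printf "application/epub+zip" > mimetype has the same effect.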
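
If you prefer to script this step, the same file can be written from Python. The sketch below assumes nothing beyond the standard library; the function name write_mimetype is my own. Writing raw bytes avoids any chance of a line terminator being appended.

```python
import tempfile
from pathlib import Path

# The exact content required, with no trailing linefeed or carriage return.
MIMETYPE = b"application/epub+zip"


def write_mimetype(directory="."):
    """Create the mimetype file in the given directory and return its path."""
    path = Path(directory) / "mimetype"
    path.write_bytes(MIMETYPE)
    return path


if __name__ == "__main__":
    # Demonstrated in a throwaway directory; pass your working directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        created = write_mimetype(tmp)
        print(created.name, created.stat().st_size, "bytes")
```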

META-INF/container.xml

This is an XML document that tells the eReader (or equivalent reading software) where to find the book contents. It consists of an XML header, followed by a stanza named "<container>" which in turn contains an element named "<rootfiles>" which points to the .opf file that we will describe in the next section.

So, essentially, the contents of this file will always be as follows, with the name of your .opf file (mybook.opf in this example) being the only thing that will change from one ebook to the next:

<?xml version="1.0" encoding='UTF-8'?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="mybook.opf" media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
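
To illustrate how reading software might follow this pointer, here is a rough Python sketch that extracts the full-path attribute from container.xml. The function name find_opf_path is hypothetical, and real reading systems certainly do more than this; the only point is that the namespace and the full-path attribute shown above are what lead to the .opf file.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <container> element shown above.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml):
    """Return the full-path of the first <rootfile> in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("no <rootfile> element found")
    return rootfile.get("full-path")


SAMPLE_CONTAINER = b"""<?xml version="1.0" encoding='UTF-8'?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="mybook.opf" media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
"""

print(find_opf_path(SAMPLE_CONTAINER))
```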

The .ncx File

NCX stands for "Navigation Control file for XML." This is just fancy talk for "Table of Contents." The .ncx file contains information that makes it possible for the reading software or device to display a clickable table of contents. In fact, while this file can have any name you want (as long as it is correctly referenced in the .opf file described below), it is not uncommon for this file to simply be called toc.ncx, for "table of contents." In our examples, this is exactly what we are going to call it since this makes its role very clear.

The file starts with some name space information that you can simply transcribe from this example or from any other ePub document that you may have opened up. After that header, the file comprises the following elements:

  • A header with a number of "meta" elements
  • The book title
  • The author
  • A navigation map (i.e. the table of contents itself)

The header consists of various meta tags representing counters or identifiers. The most important tags are dtb:uid, which is a unique identifier; dtb:depth, the depth of the table of contents (i.e. how many levels); dtb:totalPageCount and dtb:maxPageNumber, which are two variables that must be initialized to zero in this header. Here is what the top of our toc.ncx file will look like (the parts you would modify are the content values of dtb:uid and dtb:depth):

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE ncx PUBLIC '-//NISO//DTD ncx 2005-1//EN' 
   'http://www.daisy.org/z3986/2005/ncx-2005-1.dtd'>

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" 
   version="2005-1" xml:lang="en">

  <head>
      <meta name="dtb:uid" content="9780123456789" />
      <meta name="dtb:depth" content="1" />
      <meta name="dtb:totalPageCount" content="0" />
      <meta name="dtb:maxPageNumber" content="0" />
  </head>

In this example, we have assigned the eISBN as our unique identifier, which is common practice. If you have not registered an eISBN for your book or are not planning to, any unique string such as "my_novel" will do, so long as you use the same ID in the .opf file described in the next section. For documents that are available on the Internet, the URI (Uniform Resource Identifier), such as "http://www.example.com/mybook.epub," is also a common way to assign a unique identifier to the book. Optionally, you can add the attribute opf:scheme="ISBN" or opf:scheme="URI", as appropriate, to the matching identifier line in the .opf file to help intelligent reading devices dig out more information about your book. For our purposes here, I have omitted this option since we are trying to keep things as simple as possible.

Since our book contains only 2 chapters, the depth of our table of contents will be 1 (i.e. no sub-levels). The last two parameters will always be exactly as shown, so there is no need to change these lines.

The next two stanzas are <docTitle> and <docAuthor>, representing the book's title and author. These are self-explanatory, except perhaps for a mention that the strings must be bracketed between <text> tags, like this:

<docTitle>
  <text>My Fantastic First eBook</text>
</docTitle>

<docAuthor>
  <text>Doe, John</text>
</docAuthor>

And finally, we come to the meat of this file, the navigation map, a.k.a. table of contents, bracketed between <navMap> tags. Each item in the table of contents is indicated with <navPoint> tags, each of which contains two elements: the text to display (<navLabel>) and a link to the appropriate document (<content>).

In our case, the table of contents will have 3 items: our title page and the two chapters:

<navMap>
   <navPoint id="the_title" playOrder="1">
      <navLabel><text>Title Page</text></navLabel>
      <content src="title.html" />
   </navPoint>

   <navPoint id="part1" playOrder="2">
      <navLabel><text>How It All Began</text></navLabel>
      <content src="chapter1.html" />
   </navPoint>

   <navPoint id="part2" playOrder="3">
      <navLabel><text>The End of It All</text></navLabel>
      <content src="chapter2.html" />
   </navPoint>
</navMap>

You can see that each navPoint element features a unique ID and a value for "playOrder" to indicate where this item belongs in the table of contents. Because of this parameter, it is not necessary to list the items in the order in which they should appear in the table of contents; their order will be determined by the playOrder parameter instead. Again, text strings must be bracketed between <text> tags.
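
For a book with many chapters, typing these entries by hand gets tedious, so you may want to generate them. The following Python sketch builds a navMap from a simple list, numbering playOrder automatically. The function name build_nav_map and the (id, label, file) tuple layout are my own assumptions, not part of any standard. The elements are produced without a namespace, on the assumption that you will paste the printed text inside the <ncx> element, which already declares the default namespace.

```python
import xml.etree.ElementTree as ET


def build_nav_map(entries):
    """Build a <navMap> element from (id, label, href) tuples in reading order."""
    nav_map = ET.Element("navMap")
    for order, (point_id, label, href) in enumerate(entries, start=1):
        point = ET.SubElement(
            nav_map, "navPoint", {"id": point_id, "playOrder": str(order)}
        )
        nav_label = ET.SubElement(point, "navLabel")
        # The visible text must sit inside a <text> element.
        ET.SubElement(nav_label, "text").text = label
        ET.SubElement(point, "content", {"src": href})
    return nav_map


entries = [
    ("the_title", "Title Page", "title.html"),
    ("part1", "How It All Began", "chapter1.html"),
    ("part2", "The End of It All", "chapter2.html"),
]

nav_map = build_nav_map(entries)
ET.indent(nav_map, space="   ")  # pretty-printing; requires Python 3.9 or later
print(ET.tostring(nav_map, encoding="unicode"))
```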

Here is our completed toc.ncx file:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE ncx PUBLIC '-//NISO//DTD ncx 2005-1//EN' 
   'http://www.daisy.org/z3986/2005/ncx-2005-1.dtd'>

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" 
   version="2005-1" xml:lang="en">

  <head>
      <meta name="dtb:uid" content="9780123456789" />
      <meta name="dtb:depth" content="1" />
      <meta name="dtb:totalPageCount" content="0" />
      <meta name="dtb:maxPageNumber" content="0" />
  </head>
  
  <docTitle>
    <text>My Fantastic First eBook</text>
  </docTitle>
  
  <docAuthor>
    <text>Doe, John</text>
  </docAuthor>
  
  <navMap>
     <navPoint id="the_title" playOrder="1">
        <navLabel><text>Title Page</text></navLabel>
        <content src="title.html" />
     </navPoint>
  
     <navPoint id="part1" playOrder="2">
        <navLabel><text>How It All Began</text></navLabel>
        <content src="chapter1.html" />
     </navPoint>
  
     <navPoint id="part2" playOrder="3">
        <navLabel><text>The End of It All</text></navLabel>
        <content src="chapter2.html" />
     </navPoint>
  </navMap>

</ncx>

The .opf File

OPF stands for "Open Package Format." This file can have any name of your choice, although it is customary to name it after the eISBN of your book, as in 9780123456789.opf. However, you can also call it something like "mybook.opf," so long as you specify the same name in container.xml.

While large ebooks will often store this file and the related XHTML documents in a subdirectory, this is not necessary. For the purpose of our examples, we will keep things simple and assume all files are on the same directory level except for container.xml which must be located in META-INF.

The .opf file defines how the various components of your ebook fit together. It features a root element called "package" and four child elements:

  • metadata
  • manifest
  • spine
  • guide

The last element, "guide," is optional and somewhat redundant with the others, so we will omit it entirely. The first three elements are mandatory.

The metadata element must feature, as a minimum, the title of your publication, the language and a unique identifier that will be used elsewhere in the file. There are other optional elements that can be specified, such as a description, the name of the author and the name of the publisher, but only these three are mandatory.

Your .opf file will look like this:

<?xml version="1.0" encoding='UTF-8'?>
<package xmlns="http://www.idpf.org/2007/opf"
   unique-identifier="bookid" version="2.0">

   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:opf="http://www.idpf.org/2007/opf">
      <dc:title>My Fantastic First eBook</dc:title>
      <dc:identifier id="bookid">9780123456789</dc:identifier>
      <dc:language>en</dc:language>
   </metadata>

   ...

</package>

Note that the identifier used in the metadata element must be the same as the one specified in the package element ("bookid" in our example).

The manifest element is a list of all the files that constitute your ebook, such as your XHTML files, your CSS file(s), your image files if any, and the .ncx file which holds your table of contents. Essentially, every file in our ebook must be listed in the manifest except for mimetype, container.xml, and this .opf file itself.

Working from the sample ebook we described earlier, your manifest element would look like this (change the id and href values to suit your own preferences):

<manifest>
   <item id="title_pg" href="title.html"
      media-type="application/xhtml+xml" />
   <item id="part1" href="chapter1.html"
      media-type="application/xhtml+xml" />
   <item id="part2" href="chapter2.html"
      media-type="application/xhtml+xml" />
   <item id="my_css" href="mybook.css"
      media-type="text/css" />
   <item id="cover_pic" href="cover.jpg"
      media-type="image/jpeg" />
   <item id="my_toc" href="toc.ncx"
      media-type="application/x-dtbncx+xml" />
</manifest>

The identifiers you select to identify each file can be anything at all, so long as they are unique. They will be used in the next segment, the "spine."

The purpose of the spine element is to specify the sequential order in which the items in the manifest will appear in your book.

The "toc" attribute (Table Of Contents) of the spine element must point to the label you used to identify the .ncx file in the manifest ("my_toc" in our example). The remaining records represent each component of your book in the order in which they are to be read.

In our case, the spine segment will include our title page first, followed by our two chapters:

<spine toc="my_toc">
   <itemref idref="title_pg" />
   <itemref idref="part1" />
   <itemref idref="part2" />
</spine>

Note: Each record in the spine segment is an itemref element, and its idref attribute must match the id of the corresponding item in the manifest.

While each itemref in the spine must correspond to an item in the manifest, not all items in the manifest need to be included in the spine; only the readable XHTML items belong in this segment. Specifically, while your CSS file(s) and every individual image should be mentioned in the manifest, they will not need to be mentioned in the spine since they are not readable items.

Let's put the mybook.opf file together now:

<?xml version="1.0" encoding='UTF-8'?>
<package xmlns="http://www.idpf.org/2007/opf"
   unique-identifier="bookid" version="2.0">

   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:opf="http://www.idpf.org/2007/opf">
      <dc:title>My Fantastic First eBook</dc:title>
      <dc:identifier id="bookid">9780123456789</dc:identifier>
      <dc:language>en</dc:language>
   </metadata>

   <manifest>
      <item id="title_pg" href="title.html"
         media-type="application/xhtml+xml" />
      <item id="part1" href="chapter1.html"
         media-type="application/xhtml+xml" />
      <item id="part2" href="chapter2.html"
         media-type="application/xhtml+xml" />
      <item id="my_css" href="mybook.css"
         media-type="text/css" />
      <item id="cover_pic" href="cover.jpg"
         media-type="image/jpeg" />
      <item id="my_toc" href="toc.ncx"
         media-type="application/x-dtbncx+xml" />
   </manifest>
   
   <spine toc="my_toc">
      <itemref idref="title_pg" />
      <itemref idref="part1" />
      <itemref idref="part2" />
   </spine>

</package>
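The rules above (every itemref must match a manifest item, only XHTML items belong in the spine, and the toc attribute must name the .ncx item) can be checked mechanically before you assemble the book. Here is a minimal sketch in Python using only the standard library; the function name check_spine and the trimmed-down sample are our own illustrations, not part of any ePub tool.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, trimmed-down OPF used only to demonstrate the check.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="title_pg" href="title.html" media-type="application/xhtml+xml" />
    <item id="my_css" href="mybook.css" media-type="text/css" />
    <item id="my_toc" href="toc.ncx" media-type="application/x-dtbncx+xml" />
  </manifest>
  <spine toc="my_toc">
    <itemref idref="title_pg" />
  </spine>
</package>"""


def check_spine(opf_text):
    """Return a list of problems found in the spine of an OPF document."""
    root = ET.fromstring(opf_text)
    # Map each manifest id to its media-type.
    manifest = {
        item.get("id"): item.get("media-type")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    spine = root.find("opf:spine", OPF_NS)
    if spine is None:
        return ["no <spine> element"]
    problems = []
    # The toc attribute must name the NCX item of the manifest.
    toc_id = spine.get("toc")
    if manifest.get(toc_id) != "application/x-dtbncx+xml":
        problems.append(f"spine toc={toc_id!r} is not an NCX item in the manifest")
    # Every itemref must name a readable (XHTML) manifest item.
    for ref in spine.findall("opf:itemref", OPF_NS):
        idref = ref.get("idref")
        if idref not in manifest:
            problems.append(f"itemref {idref!r} has no matching manifest item")
        elif manifest[idref] != "application/xhtml+xml":
            problems.append(f"itemref {idref!r} is not an XHTML item")
    return problems


print(check_spine(SAMPLE_OPF))
```

An empty list means the spine is consistent with the manifest. To check a real file, read mybook.opf and pass its contents to check_spine.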

Assembling the ePub Book

Congratulations! You now have all the components needed to create the final ePub file. Our examples have illustrated the absolute minimum parameters needed to create a working ePub, and you will probably want to read up and experiment with additional parameters as you proceed with your work. However, for the sake of learning the basics, this was a good start.

Assuming you have created the above files, including your own content in XHTML format, you can assemble the book by simply creating a Zip archive of this entire directory structure.

There is one important rule, however: the file mimetype must be the first file in the archive, so you should list it first on the zip command line. For instance, to create a Zip archive named "mybook.epub" of your working directory into the parent directory, you would use this command on a Linux system:

zip -rX ../mybook.epub mimetype *

The -r stands for "recursive" and tells Zip to pick up all subdirectories, which is necessary in order to capture container.xml under META-INF. The -X option stands for "No Extras," which is necessary on Unix-style systems because Zip normally adds extra fields to store attributes such as modification times, and that invalidates the ePub format. The -X option tells Zip to omit these additional fields.
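If you would rather not depend on the zip command (on Windows, for instance), the same archive can be built with Python's standard zipfile module. The sketch below is one possible approach, not the only one; the function name build_epub is our own. It writes mimetype first and stores it without compression, as the ePub container rules require, then adds everything else.

```python
import os
import zipfile


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path, with the mimetype entry first.

    epub_path should be outside source_dir (the parent directory, for
    instance), otherwise the archive would try to include itself.
    """
    with zipfile.ZipFile(epub_path, "w") as epub:
        # The mimetype entry goes first and is stored uncompressed.
        epub.write(os.path.join(source_dir, "mimetype"), "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        # Walk the rest of the tree, which picks up META-INF/container.xml.
        for folder, subdirs, files in os.walk(source_dir):
            subdirs.sort()
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                arcname = os.path.relpath(full_path, source_dir)
                arcname = arcname.replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                epub.write(full_path, arcname,
                           compress_type=zipfile.ZIP_DEFLATED)
```

From the working directory, build_epub(".", "../mybook.epub") would produce the same file name and location as the zip command shown above.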

You now have an ePub book that you can upload to your eReader using the usual method (typically by USB or wireless file transfer), or you can view it directly on your PC using Calibre or any of the book-reading applications we mentioned earlier in this article.

Validating Your Format

There are two levels of validation you should run on your documents to be sure you have a valid ePub file: you should validate all your XHTML files to make sure they adhere to the stricter XHTML rules, and when that's done and you have assembled your ePub file, you should validate that file to make sure it conforms to the standard and will display correctly on all eReaders.

Fortunately, there are free tools available to do both.

Validating Your XHTML Code

Both Linux and Mac OS X feature the xmllint command to parse your XHTML documents and report any incongruities in them. This utility is part of the libxml2 package. If it is not already on your Linux or Mac system, you can probably download it and install it using the software installation tool appropriate to your distribution.

If you must use Windows, a quick Internet search for "xmllint for windows" should reveal a number of projects that have ported this utility to Windows.

With this utility installed on your system, you would simply invoke the program like this:

xmllint  filename.html

By default, this program lists your entire XHTML file to the standard output when it doesn't find any errors, which is not particularly useful and detracts from the results. If any errors are found, such as a missing closing tag, only the error messages are displayed.

To eliminate all the output when no errors are found, you can use the --noout option, like this:

xmllint --noout  filename.html

If no errors are found, you simply get your prompt back.

If you have written some shell scripts to automate the creation of your ebook, you can check the exit code from xmllint to determine whether the validation was successful, as in this sample script:

for chapter in *.html
do
   # Quote the filename in case it contains spaces; only the exit code matters here
   if xmllint --noout "$chapter" > /dev/null 2>&1
   then
      echo "$chapter is okay."
   else
      echo "ERRORS found in $chapter"
   fi
done
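If xmllint is not available on your system, a rough equivalent of this check can be sketched with Python's standard XML parser. This is only an approximation: it tests that a document is well-formed (every tag closed and properly nested), it does not validate against the XHTML DTD, and it may report named entities such as &nbsp; as errors because it does not load the DTD. The function name well_formed_error is our own.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xhtml_text):
    """Return None if the text parses as XML, else the parser's message."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Two tiny samples: the second one is missing its closing </p> tag.
good = "<html><body><p>Once upon a time.</p></body></html>"
bad = "<html><body><p>Once upon a time.</body></html>"

for label, text in (("good", good), ("bad", bad)):
    print(label, "->", well_formed_error(text) or "okay")
```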

Validating Your ePub Structure

Once you have successfully checked all your XHTML code and have assembled your ePub book, you can check its structure using some free Web-based services or utilities.

Specifically, the site Threepress.com provides a free validation service whereby you simply upload your file and click a button to have their software check it out. They do not keep your work, they only check it.

Another option is to download and install the utility epubcheck, available from code.google.com/p/epubcheck.

Unless you have to validate dozens of files on a regular basis, the first option is probably more convenient.

 

 

Creating a MOBI eBook

Overview

Both the MOBI and ePub formats share some common ancestry, namely the Open Packaging Format (OPF). This is why both ePub and MOBI make use of a .opf file, although their contents are slightly different.

Both also compress and archive their constituent parts in Zip format, although the MOBI format adds extra material to that archive. In fact, if you were to take a MOBI ebook, rename it with a .zip extension, and then attempt to unzip it, you would first get a warning message about skipping a large number of "extra bytes at beginning or within zipfile," and then it would proceed to create a directory structure very similar to that of an ePub book.

Specifically, you would find a .opf file somewhere, various .html files, and usually a .ncx file as well. If you were to examine these files, you would find that their contents are incredibly similar to those of their ePub counterparts, with only subtle differences.

In fact, the process to create a MOBI ebook is very similar to that of creating an ePub. Your content will be in XHTML files and you will have to construct a couple of control files to tell the MOBI compiler how to assemble them.

Some of the major differences between constructing an ePub and a MOBI book are:

  • There is no need for a mimetype file or a container.xml file.
  • For the book cover, MOBI uses a JPEG image directly instead of an HTML file pointing to the JPEG image through an <img> tag.
  • The table of contents is implemented through a standard HTML file in addition to a .ncx configuration file.
  • The .opf file contains a few new elements specific to the MOBI format.

So, using the same basic contents we used in our ePub project, here are the files we will find in our working directory when creating our MOBI ebook:

  • cover.jpg
  • title.html
  • chapter1.html
  • chapter2.html
  • toc.html
  • toc.ncx
  • mybook.css
  • mybook.opf

Note that the title page (title.html) will not include the book cover this time, since the MOBI format has a specific instruction for loading an image as the cover. However, we still want a title page to list the author, publication date, copyright note, and so on, so we will still create a title.html page for this purpose alone.

To assemble the above files into a MOBI ebook, we will be using a "MOBI compiler." Unlike the ePub format where we simply used Zip to assemble the book's components, the MOBI format requires us to use a proprietary utility. There are actually two such programs: a Windows-based graphical program called Mobipocket Creator, and a command-line utility named kindlegen that is available for Windows, Linux and Mac. In this tutorial, we will be using kindlegen.

Getting and Using Kindlegen

The kindlegen utility is available directly from the Amazon.com website.

Kindlegen is actually a pretty flexible utility. In its simplest form, kindlegen can be invoked against an existing ePub document or even a properly constructed XHTML page and will generate a .mobi file. For instance, if you have already created your ebook in ePub format, simply invoke kindlegen against it like this:

kindlegen mybook.epub

Within seconds, kindlegen will have generated a usable .mobi file with a similar name (mybook.mobi in this case) in your working directory.

Similarly, if you have created your content as a single HTML file, kindlegen can be run against it to generate a corresponding .mobi ebook.

Simple as that!

While this sounds wonderful, the bad news is that the resulting book would not be 100% functional. For instance, your Kindle device would not likely be able to produce a table of contents and some of the CSS features you had used in your ePub or HTML file will work incorrectly or not work at all.

So, to get a good-looking and properly functioning ebook, we will need to do some tweaking to our CSS code, as well as modifying our support files somewhat (the .opf and the .ncx files, specifically). Fortunately, much of what we have already covered in the previous chapter on ePub can be recycled here, so this process should be fairly straightforward.

Starting With Content

To start off, let's compose a title page and two sample chapters in XHTML. We will use the same samples as for the ePub book we created in the previous chapter, with a few minor changes.

Specifically, the title page will be very similar to the one we created in the previous section, except that we will not display the image of the book cover because MOBI supports a special instruction to assign an image to the book cover (which we will examine later). For now, here is our revised title page, suitable for our MOBI book:

Title page:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Title Page</title>
</head>

<body>
   <div style="text-align: center;">
      <p>MY FANTASTIC FIRST EBOOK</p>
      <p>by John Doe</p>
   </div>
</body>

</html>

Our two chapters will be exactly as they were in our ePub example. Here they are again:

Chapter 1:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Chapter 1</title>
</head>

<body>

<h1>How It All Began</h1>

<p>Once upon a time, a chicken decided to cross the road.</p>

</body>
</html>

Chapter 2:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Chapter 2</title>
</head>

<body>

<h1>The End of It All</h1>

<p>
After countless trials and tribulations, the chicken made
it across the road, but remained perplexed as to the motivation
for this journey.
</p>

<p>The End.</p>

</body>
</html>

And finally, we will need to create a file to store our CSS definitions. We will see shortly that the CSS for a MOBI book can be quite different than for an ePub, but our sample CSS file was simple enough that it will work in both cases. Here it is again:

mybook.css:

body {
   margin: 5px;
   font-family: serif;
}

h1 {
   text-align: center;
   font-family: sans-serif;
}

Great. We now have all our content. The next step is to create a table of contents for this masterpiece.

The MOBI Table Of Contents

Two TOCs!

This is where things get a little confusing. In a MOBI book, you actually have 2 separate files for the table of contents, each somewhat redundant with the other.

The table of contents that your readers will see must be created as an XHTML document which you can make as pretty and comprehensive as you wish. And then, you also need to create a .ncx ("Navigation Control for XML") file to allow the Kindle eReader to put tick marks at the appropriate places in the progress bar.

The Kindle displays a progress bar to let the reader know how far he/she has progressed in the book. Typically, the position of each chapter is indicated on this progress bar with a tick mark. The Kindle positions these tick marks using information in the .ncx file. Consequently, one immediate difference you will find in a MOBI ".ncx" file is that it can only document the top level of your table of contents due to the linear nature of a progress bar. In other words, the "depth" level must always be set to 1.

Conversely, in the free-form XHTML table of contents, you can specify as many levels as you wish and you can format that table exactly as you want (within the CSS limitations of the MOBI format, of course), which actually gives you more flexibility than the ePub format.

In the next two sections, we will examine the format of each file.

Creating the TOC in XHTML

Let's start with creating our formatted table of contents in XHTML. This is the document that your readers will see.

Our table of contents starts exactly like any other XHTML document with the appropriate name space definitions and the usual <html>, <head> and <body> tags. However, the items in our table are anchors pointing to labels that we will manually insert at the appropriate points in our content.

Here is what our toc.html file will look like for our 2-chapter book:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Table of Contents</title>
</head>

<body>
<h1>Table of Contents</h1>
<a href="title.html#ttl">Title Page</a><br />
<a href="chapter1.html#chap1">How It All Began</a><br />
<a href="chapter2.html#chap2">The End of It All</a><br />
</body>
</html>

Each entry in the table is an anchor pointing to an XHTML file and to a label within this file. For instance, the anchor for Chapter 1 points to the file chapter1.html and to the label chap1 within this file.

These labels can be one of two things: Either a named anchor or a tag ID.

A named anchor is simply a standard <a> tag that uses the name= property to assign a name to it, like this:

<a name="chap1" />

A tag ID is just an "id=" property placed inside any tag, like this:

<h1 id="chap1">Chapter 1: How It All Began</h1>

When a user clicks on one of our links in the table of contents, the eReader software will jump to either the named anchor or tag ID that bears that label. We can name these labels anything we want, as long as they are unique. To keep your document easy to understand, it makes sense to create labels that will be reasonably mnemonic. In our example, we have used "ttl" for the title page, and "chap1" and "chap2" to point to our chapters.

For our sample ebook, we will use tag IDs and place them in the <h1> tag of each chapter. For the title page, we will place the tag ID in a division tag (<div>) instead, since we are not using an <h1> tag there.

Note that if we had a more complex book, we might want to use a two-level table of contents, in which case we would be placing tag IDs in our <h2> tags as well.

So, after adding the needed tag IDs to our content, our modified title page and chapters now look like this:

Title page:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Title Page</title>
</head>

<body>
   <div id="ttl" style="text-align: center;">
      <p>MY FANTASTIC FIRST EBOOK</p>
      <p>by John Doe</p>
   </div>
</body>

</html>

Chapter 1:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Chapter 1</title>
</head>

<body>

<h1 id="chap1" >How It All Began</h1>

<p>Once upon a time, a chicken decided to cross the road.</p>

</body>
</html>

Chapter 2:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
   <link rel="stylesheet" type="text/css" href="mybook.css" />
   <title>Chapter 2</title>
</head>

<body>

<h1 id="chap2" >The End of It All</h1>

<p>
After countless trials and tribulations, the chicken made
it across the road, but remained perplexed as to the motivation
for this journey.
</p>

<p>The End.</p>

</body>
</html>
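Since every link in toc.html must point to a label that really exists in the target file, a typo in either place produces a dead link. The following Python sketch shows one way to catch that before compiling the book. The function name broken_toc_links and the shortened sample documents are our own; the function accepts both named anchors and tag IDs as labels.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def broken_toc_links(toc_text, pages):
    """List the hrefs in the TOC whose file or label cannot be found.

    pages maps a file name to the XHTML text of that file.
    """
    broken = []
    for link in ET.fromstring(toc_text).iter(XHTML + "a"):
        href = link.get("href", "")
        filename, _, label = href.partition("#")
        if filename not in pages:
            broken.append(href)
            continue
        if not label:
            continue  # link to the top of the file, nothing more to check
        # Collect every tag ID and every named anchor in the target file.
        labels = set()
        for elem in ET.fromstring(pages[filename]).iter():
            labels.add(elem.get("id"))
            if elem.tag == XHTML + "a":
                labels.add(elem.get("name"))
        if label not in labels:
            broken.append(href)
    return broken


# Shortened samples: chapter 2 is deliberately missing its id="chap2".
toc = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       '<a href="chapter1.html#chap1">How It All Began</a>'
       '<a href="chapter2.html#chap2">The End of It All</a>'
       '</body></html>')
pages = {
    "chapter1.html": '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
                     '<h1 id="chap1">How It All Began</h1></body></html>',
    "chapter2.html": '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
                     '<h1>The End of It All</h1></body></html>',
}

print(broken_toc_links(toc, pages))  # prints ['chapter2.html#chap2']
```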

The .ncx file

As mentioned at the beginning of this section, the MOBI format uses a separate .ncx file to track the position of the top-level chapters in the progress bar on the Kindle.

This file will have the same format as its ePub counterpart, except that we will never have nested navPoints since only the top level of our table of contents can be referenced here. This also means that dtb:depth will always be set to 1.

Again, we can give this file any name we want, so long as it is referenced correctly in the .opf file that we will examine next. For our project, we will simply call it toc.ncx.

Since the format is identical to what we have already covered in the ePub section of this tutorial, we will not repeat all the details here. Let's just show the file again for convenience:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE ncx PUBLIC '-//NISO//DTD ncx 2005-1//EN' 
   'http://www.daisy.org/z3986/2005/ncx-2005-1.dtd'>

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" 
   version="2005-1" xml:lang="en">

  <head>
      <meta name="dtb:uid" content="my_first_mobi" />
      <meta name="dtb:depth" content="1" />
      <meta name="dtb:totalPageCount" content="0" />
      <meta name="dtb:maxPageNumber" content="0" />
  </head>
  
  <docTitle>
    <text>My Fantastic First eBook</text>
  </docTitle>
  
  <docAuthor>
    <text>Doe, John</text>
  </docAuthor>
  
  <navMap>
     <navPoint id="the_title" playOrder="1">
        <navLabel><text>Title Page</text></navLabel>
        <content src="title.html" />
     </navPoint>
  
     <navPoint id="part1" playOrder="2">
        <navLabel><text>How It All Began</text></navLabel>
        <content src="chapter1.html" />
     </navPoint>
  
     <navPoint id="part2" playOrder="3">
        <navLabel><text>The End of It All</text></navLabel>
        <content src="chapter2.html" />
     </navPoint>
  </navMap>

</ncx>

In the sample file above, the only thing we have changed from our ePub sample is the unique ID; we are using a string this time ("my_first_mobi") instead of the ISBN number. This is to stress that the ID can be anything at all, as long as we reference it correctly in the OPF file.

Note the depth level (dtb:depth) of "1" in the <head> section: it must always be set to this value in a MOBI book.
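The two MOBI-specific constraints on the .ncx file (a depth of 1 and no nested navPoints) are easy to get wrong when recycling an ePub file, so here is a small Python sketch that checks them. The function name ncx_problems and the abbreviated sample are our own illustrations.

```python
import xml.etree.ElementTree as ET

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"


def ncx_problems(ncx_text):
    """Return a list of MOBI-specific problems found in an NCX document."""
    root = ET.fromstring(ncx_text)
    problems = []
    # Look up the dtb:depth record in the <head> section.
    depth = None
    for meta in root.iter(NCX + "meta"):
        if meta.get("name") == "dtb:depth":
            depth = meta.get("content")
    if depth != "1":
        problems.append(f"dtb:depth is {depth!r}, expected '1'")
    # A navPoint that directly contains another navPoint is nested.
    for point in root.iter(NCX + "navPoint"):
        if point.find(NCX + "navPoint") is not None:
            problems.append(f"navPoint {point.get('id')!r} has nested navPoints")
    return problems


# Abbreviated sample with a flat navMap, as required for MOBI.
FLAT_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:depth" content="1" /></head>
  <navMap>
    <navPoint id="part1" playOrder="1">
      <navLabel><text>How It All Began</text></navLabel>
      <content src="chapter1.html" />
    </navPoint>
  </navMap>
</ncx>"""

print(ncx_problems(FLAT_NCX))  # prints []
```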

The OPF File for MOBI

Okay, we're almost done. We have our content in three HTML files (title.html, chapter1.html and chapter2.html), we have our CSS code in mybook.css, our cover in a JPEG image, and our table of contents in toc.html and toc.ncx.

All we need now is an OPF file (mybook.opf) to tell kindlegen what goes where. The format is similar to that of the .opf file we created for the ePub book, with only minor changes related to the book cover and the table of contents.

Our MOBI-style .opf file will have the same components as the ePub version, such as <metadata>, <manifest> and <spine>, but it also features the <guide> stanza, which we had left out of the ePub version since it was optional. In the MOBI version, the <guide> section is used to point to the table of contents, making it possible for the reading device to create a button or menu selection pointing to it.

The second difference is that our cover page will now be specified with a <metadata> item pointing to the JPEG image of our book cover and referenced in the <manifest> section.

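Here is what our .opf file will look like. The changes from the ePub version are the new <meta> record for the cover, the cover item in the <manifest>, and the <guide> section: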

<?xml version="1.0" encoding='UTF-8'?>
<package xmlns="http://www.idpf.org/2007/opf"
   unique-identifier="bookid" version="2.0">

   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:opf="http://www.idpf.org/2007/opf">
      <meta name="cover" content="the_cover" />
      <dc:title>My Fantastic First eBook</dc:title>
      <dc:identifier id="bookid">my_first_mobi</dc:identifier>
      <dc:language>en</dc:language>
   </metadata>

   <manifest>
      <item id="the_cover" href="cover.jpg"
         media-type="image/jpeg" />
      <item id="title_pg" href="title.html"
         media-type="application/xhtml+xml" />
      <item id="part1" href="chapter1.html"
         media-type="application/xhtml+xml" />
      <item id="part2" href="chapter2.html"
         media-type="application/xhtml+xml" />
      <item id="my_css" href="mybook.css"
         media-type="text/css" />
      <item id="my_toc" href="toc.ncx"
         media-type="application/x-dtbncx+xml" />
   </manifest>
   
   <spine toc="my_toc">
      <itemref idref="title_pg" />
      <itemref idref="part1" />
      <itemref idref="part2" />
   </spine>
   <guide>
      <reference type="toc" title="Table of Contents"
         href="toc.html" />
   </guide>
</package>

The XML declaration and the <package> element at the top remain the same, and the <metadata> segment is also identical to our ePub version except for the addition of the new "meta" record. The attribute name="cover" in this new record tells the reading device that the item from the manifest with the ID specified by content= is to be used as the book cover. In turn, the manifest record with that identifier ("the_cover") tells the reading device where to find that image.

The <manifest> segment no longer features the table of contents (toc.html), which is now listed in the <guide> section instead.
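The cover mechanism is a two-step lookup: the "meta" record names an ID, and the manifest item with that ID names the image file. To make that chain concrete, here is a short Python sketch that follows it. The function name cover_href and the reduced sample are our own; kindlegen performs its own lookup internally, and this is only an illustration of the idea.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Reduced sample containing only the records involved in the lookup.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta name="cover" content="the_cover" />
  </metadata>
  <manifest>
    <item id="the_cover" href="cover.jpg" media-type="image/jpeg" />
    <item id="title_pg" href="title.html" media-type="application/xhtml+xml" />
  </manifest>
</package>"""


def cover_href(opf_text):
    """Return the href of the image named by <meta name="cover">, or None."""
    root = ET.fromstring(opf_text)
    # Step 1: find the manifest ID named by the cover record.
    cover_id = None
    for meta in root.iter(OPF + "meta"):
        if meta.get("name") == "cover":
            cover_id = meta.get("content")
    if cover_id is None:
        return None
    # Step 2: find the image item in the manifest that bears this ID.
    for item in root.iter(OPF + "item"):
        is_image = item.get("media-type", "").startswith("image/")
        if item.get("id") == cover_id and is_image:
            return item.get("href")
    return None


print(cover_href(SAMPLE_OPF))  # prints cover.jpg
```

A result of None means that either the "meta" record is missing or its content= value does not match any image in the manifest.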

Compiling the Book

Now that we have all the necessary components of our ebook, the final step is to run kindlegen against the OPF file, like this:

kindlegen mybook.opf

Assuming there are no errors or missing components in any of those files, kindlegen will generate a MOBI ebook using the first part of the filename from your OPF file and a .mobi extension (mybook.mobi in this example).

That's it; we're done creating the book. You can now upload it to a Kindle or a Palm PDA, or just view it on your computer using an ebook reading application such as Calibre or the Kindle app.

Making It Look Good

While creating the MOBI book is pretty straightforward once all the control files have been correctly constructed, making it look good is the real challenge. For instance, when viewing your new ebook in the Kindle application or eReader, you may find that your text is fully justified instead of ragged, and that paragraphs are indented even if you have not specified that in your CSS code. You will also probably find that spacing around headers is not what you had intended or that special effects such as background colours or borders may be gone.

This is where you realize that a lot of the standard CSS properties that you would expect to be implemented in a standard Web page are simply not supported in MOBI. You will also soon realize that complex CSS is often misinterpreted or ignored in the MOBI format.

Here are a few tips and guidelines for tweaking your CSS for a MOBI ebook:

CSS not consistent in all tags

Some CSS instructions will work in some tags but not in others. For instance, while you can assign a background colour to a paragraph, the same assignment will fail on a preformatted text tag (<pre>).

<p style="background-color: gray;">   (Works as expected)
<pre style="background-color: gray;">   (Won't work!)

The solution, in some cases, is to bracket the selection you want to format between <div> tags, like this:

<div style="background-color: gray;">

<pre>
Your text goes here...
</pre>

</div>

Borders

Borders often don't work as you would expect. For instance, you might want your top-level headers (<h1>) to display with a horizontal line across the bottom, looking like this:

Top-level header

Normally, you would do this by specifying a border to the <h1> element, like this:

h1 {
   border-bottom: solid 1px;
}

However, in MOBI, this will not work. Instead, you will have to use an <hr /> tag to draw a horizontal line below each header.

Margins and padding

Another frustrating limitation of MOBI is that you cannot easily specify the amount of whitespace you want to leave around your text. For instance, the padding CSS instruction seems to have no effect whatsoever on anything. The margin instruction works somewhat, although imprecisely, but may be usable as a substitute for padding in some cases.

If you really must pad an item, you will have to resort to inserting line breaks (<br />) or non-breaking spaces (&nbsp;) where appropriate, which is an ugly kludge but will work.

Page breaks

To force a page break in a MOBI ebook, you need to use this custom, non-standard tag: <mbp:pagebreak />

It is advisable to insert a <mbp:pagebreak /> tag before the header of each chapter or major section of your book.

Other restrictions

Here are various other restrictions to keep in mind:

  • Generally, dimensions should be specified in pixels (px) instead of ems for MOBI documents. Among other reasons, the MOBI format does not support fractions of an em, as in 0.5em.
  • Large tables that cross pages may display unpredictably and differently from one reading device to another. As a general rule, tables should be kept short and should be used only for text data. Nested tables are not supported.
  • If it is critical for an item to be displayed exactly as intended (such as a table or quoted material such as poetry), then an image (JPEG, PNG or GIF) should be used.
  • There is no practical way to implement a decorative "drop-cap" letter at the start of a chapter. To simulate this effect, simply use a <span> tag to increase the size of the first letter and make it bold; the effect is very similar to that of a real drop-cap.
  • Images cannot be larger than 63KB in size and vector images are not supported. JPEG, GIF and PNG formats are supported.
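The image size restriction is the easiest of these to check automatically. The Python sketch below lists the images in a directory that exceed the limit. It assumes that "63KB" means 63 x 1024 bytes, and the function name oversize_images is our own.

```python
import os

# Assumption: the 63KB limit quoted above is read as 63 * 1024 bytes.
MAX_IMAGE_BYTES = 63 * 1024


def oversize_images(directory, limit=MAX_IMAGE_BYTES):
    """Return (name, size) pairs for images larger than limit bytes."""
    extensions = (".jpg", ".jpeg", ".gif", ".png")
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.lower().endswith(extensions) and os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit:
                found.append((name, size))
    return found
```

Calling oversize_images(".") from your working directory returns an empty list when every image is within the limit.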

The full reference site

The MOBI format is maintained by Mobipocket at www.mobipocket.com/dev. For the full documentation on this format, including tips for authors, surf to that site.

 

 

Creating a PDF Document

Do we really need a chapter in this book to explain how to create a PDF document? After all, most word processors feature an Export As... option to save your word processing document in PDF format, so why not use that?

In fact, that option is certainly convenient and will usually generate a very fine version of your document in PDF format. However, if you intend to distribute your work in all 3 major formats (ePub, MOBI and PDF), then you will have already created your source document in HTML, so you probably won't want to have to maintain a second copy of that document in your word processor. Indeed, keeping two copies of the same document in sync can be a major headache.

For this reason, I recommend using an HTML-to-PDF converter so that you can maintain a single version of your document in HTML, or more precisely, in XHTML. The tool I use is a free GPL utility named wkhtmltopdf which is available for Linux, Windows and Mac. This odd-looking name stands for "WebKit HTML to PDF." If you are running Linux, you can probably install it from your distribution media or from your usual repositories; if you are running Windows or Mac, a quick Internet search should locate a version for your environment.

This program reads a standard XHTML file and will produce a clean PDF document, including a table of contents, page headers and footers, page numbers, etc., based on which options you specify on the command line. This is not meant to be a thorough tutorial on wkhtmltopdf, but rather just an introduction to point you in the right direction.

For instance, here is how I invoke the program to generate a PDF file that can be printed as a foldable booklet on letter-size paper:

wkhtmltopdf \
  --book \
  --toc-depth 1 \
  --footer-right "[page]" \
  --footer-spacing 6 \
  --header-center "My Fantastic First Ebook" \
  --header-left "" \
  --header-right "" \
  --header-spacing 6 \
  --cover show_cover.html \
  --page-width 140 \
  --page-height 215 \
   mybook.html outputfile.pdf

Let's examine each option in sequence. Note that all measurements are in millimeters. The --book option is an alias for several options that are typically desired for a book format. The --toc-depth option specifies that the table of contents should have only one level. The next two options specify that the page number should be shown in the footer of each page, and that a 6-mm space should be left between the text and the footer.

The next 4 options define what the header should look like. The header is comprised of a left, center and right side, which you can populate with any text of your choice or leave blank. In this example, we have chosen to have the book title in the centre of the header at the top of each page, with nothing on the left or right.

As you can see with the next option (--cover), we can specify a separate HTML document to be used as the book cover. And finally, the last two options specify the page width and height (in mm), which correspond to half of a letter-size sheet, so that two pages fit side by side on a sheet folded into a booklet.

The last two parameters are the source file (mybook.html) and the desired output file for our PDF document (outputfile.pdf).
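The page dimensions in this command come from simple arithmetic on the paper size. The short Python sketch below shows the calculation for a letter-size sheet (8.5 x 11 inches) folded in half; the function name booklet_page_size is our own, and you can substitute the dimensions of another paper size such as A4.

```python
LETTER_WIDTH_MM = 215.9   # 8.5 inches x 25.4 mm
LETTER_HEIGHT_MM = 279.4  # 11 inches x 25.4 mm


def booklet_page_size(sheet_width, sheet_height):
    """Page size obtained by folding a sheet in half across its long side."""
    # The fold halves the long side; the short side becomes the page height.
    return (sheet_height / 2, sheet_width)


width, height = booklet_page_size(LETTER_WIDTH_MM, LETTER_HEIGHT_MM)
print(f"--page-width {width:.1f} --page-height {height:.1f}")
```

This gives about 139.7 mm by 215.9 mm, which the command above rounds to 140 and 215.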

There are many more options to wkhtmltopdf, and some option names differ from one release to another; read the manual pages for your installed version for all the details.

Conclusion

We have just covered how to create an ebook in ePub, MOBI and PDF format using a single source document in XHTML with minor variations to the accessory files used in generating the ebooks, such as slightly different CSS definitions and OPF files where appropriate.

The PDF format will provide the best and most accurate visual rendition of your book since pretty much all CSS properties will be supported, so you can make your document as pretty and complex as you wish in this format. Unfortunately, since PDF is not a reflowable format, the page layout will be fixed and may not be appropriate for smaller screens.

For small devices such as eReaders, phones and PDAs, the ePub and MOBI formats are more appropriate since your text will wrap around dynamically to fit any screen size without reducing the font size. The flip side of this coin is that you don't have as much control over the page layout since you are relegating much of this control to the device. In addition, these ebook formats only support a subset of the CSS properties supported by Web browsers, so you are quite limited in how fancy you can make your content. For best results, the golden rule is "keep it simple."

This is even more important for the MOBI version of your ebook since the MOBI format is even more restricted in its feature set than the ePub format.

The good news is that you can write and maintain your content in a single set of XHTML files, and use the same set of source files to generate all 3 types of ebooks using different cascading style sheets (CSS) appropriate for each one.

While the task of setting up your first ebook may seem daunting right now, keep in mind that once you have created your first ebook, you can use these initial templates for all your subsequent books, so this task is well worth the initial effort.
