Joe Kissell 21 March 2011

Joe Kissell Shreds an Ebook into Twitter

If you read TidBITS regularly, you’re probably aware of my ebook “Take Control of Your Paperless Office,” which was published in November 2010. Among the things I explain in the ebook is how you can scan paper documents and then recycle or shred the originals. But how do you shred an ebook? It’s easy: rip it into 140-character strips and feed it to Twitter! And so, I’m doing exactly that.

Adam and Tonya and I came up with this idea a few weeks ago during a conversation about how to have some fun spreading the word about the Take Control ebooks. Even though this experiment amounts to giving away the entire text of the book for free (if you really want to read the whole thing on Twitter), we figured that most people who become interested in the ebook because of this project would opt to buy a copy, thus getting the vastly improved readability of a proper book layout, not to mention access to future upgrades. To sweeten the deal, Take Control is offering a 30 percent discount on the book to everyone who follows it on Twitter!

So, it works like this: Starting at 9 AM Pacific/12 PM Eastern on 21 March 2011, the book is being tweeted, 140 characters (or so) at a time on the account @zapmypaper. I decided on a frequency of one tweet every 15 minutes, at which rate it’ll take 17 days to tweet the entire 118-page book. From time to time, I’ll intersperse reminders about how to purchase the full ebook, suggestions to follow me on Twitter (@joekissell), and other tips. Interstitial messages that aren’t part of the book’s text will start with ==. For example:

== Tip: Support author Joe Kissell’s baguette and cheese habit. Buy this ebook at http://bit.ly/h2mPrl
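The schedule arithmetic above can be sketched in a few lines of Python. This is only an illustration: the start time and 15-minute interval come from the article, while the function names are hypothetical and the times are treated as naive US Eastern values.

```python
from datetime import datetime, timedelta

# Start time and interval as stated in the article (naive US Eastern time).
START = datetime(2011, 3, 21, 12, 0)
INTERVAL = timedelta(minutes=15)


def send_time(index):
    """Return when the tweet at 0-based position `index` would go out."""
    return START + index * INTERVAL


def tweets_in(days):
    """Return how many tweet slots fit in `days` full days."""
    return int(timedelta(days=days) / INTERVAL)


print(tweets_in(1))    # slots per day
print(tweets_in(17))   # upper bound on slots in a 17-day run
print(send_time(976))  # the slot 10 days and 4 hours after the start
```

At this rate there are 96 slots per day and at most 1,632 in 17 days, and that count includes the interstitial == messages as well as the book text. Slot 976 falls at 4 PM Eastern on Thursday, 31 March.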

Ideally, I’d like to refrain from sending manual tweets, @replies, and retweets from the @zapmypaper account during the period that the book is being tweeted, simply because that’ll keep the stream cleaner and make it easier to follow. However, if I do need to insert any special announcements or other tweets, I’ll use the same == convention at the start so you can tell they’re not part of the book itself.

After the entire book has been tweeted, my intention is to continue using the @zapmypaper account for news, suggestions, and questions about maintaining a paperless office, so those who follow that feed can get ongoing support for their paperless habit.

Tweeting with Style — If I were tweeting a novel or some other book that contained only plain text to begin with, the process would have been much simpler. But this being a how-to book on a somewhat technical topic, it included lots of elements that don’t directly translate into text. My goal was to preserve as much of the structure of the book as I reasonably could, striking a balance between faithfulness to the original and readability within Twitter.

I think I did a reasonably good job at that, although the conversion was slightly lossy — that is, one could reassemble the tweets into a rough approximation of the original book, but a few elements (and a tiny bit of text) wouldn’t come through. For those who are curious, as well as for the geeks who will inevitably write scripts to recreate the book despite my disclaimers, here’s what I did.

Headings: The ebook uses different font sizes, weights, and colors to indicate various heading levels. In Twitter, I’ve converted them as follows:
- Heading 1: **HEADING TEXT** (all caps, two asterisks on each side)
- Heading 2: *HEADING TEXT* (all caps, one asterisk on each side)
- Heading 3: HEADING TEXT (all caps)
- Heading 4 and below: _Heading Text_ (title case, underscores on each side)
Sidebars: I enclosed the contents of the sidebar, including its title, in double square brackets, like so: [[sidebar text]]
Tips, Notes, Warnings, etc.: I enclosed the contents in single square brackets, like so: [tip text] (and note that the Twitter version doesn’t distinguish between the different visual styles we used for tips/notes and those we used for “emphatic” paragraphs).
Paragraph breaks: I debated whether to do anything at all to represent paragraph breaks and if so, how. After careful deliberation and testing, I settled on using the paragraph symbol (¶) to denote the beginning of each new paragraph. I decided against using actual line feeds because some Twitter clients don’t display them, and neither does the Twitter Web site — so some readers wouldn’t have been able to tell where new paragraphs began. (And, when you’re reading an entire book, I think that’s pretty important!)
Graphics: The book contained a handful of screenshots; I’ve uploaded these separately and embedded links in the text.
URLs: I converted nearly all the external links in the book (except for a few very short ones) to bit.ly URLs for compactness.
Footnotes: The book had only two footnotes, and for the purpose of this project I relocated their contents into the main body of the text.
Character styles: Boldface, italics, colors, special fonts, and other modified character styles are simply gone.
Tweet length: Obviously every tweet must fit within 140 characters, but I went a bit further to ensure that tweets always break at word boundaries, and never end with, for example, a paragraph mark or bullet character. Also, I discovered that the Python script I use to send the tweets incorrectly counts certain symbols (such as •, ¶, and —) as more than one character, and rather than spend a lot of time trying to rework the script, I opted to simply leave a few extra characters free to accommodate these symbols on lines that contain them. The result is that some of the tweets will appear to be shorter than they strictly need to be.
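The markup conventions in the list above can be expressed as a few small Python helpers. This is a sketch for illustration, not the process used for the book (the article describes doing the conversion with Find and Replace in Word and BBEdit). All function names are hypothetical, and the space after the paragraph symbol is an assumption.

```python
PILCROW = "\u00b6"  # the paragraph symbol used to mark paragraph starts


def heading(text, level):
    """Render a heading using the plain-text conventions listed above."""
    if level == 1:
        return "**" + text.upper() + "**"
    if level == 2:
        return "*" + text.upper() + "*"
    if level == 3:
        return text.upper()
    # Level 4 and below: the caller supplies the text already in title case.
    return "_" + text + "_"


def sidebar(text):
    """Wrap sidebar contents (title included) in double square brackets."""
    return "[[" + text + "]]"


def tip(text):
    """Wrap a tip, note, or warning in single square brackets."""
    return "[" + text + "]"


def join_paragraphs(paragraphs):
    """Prefix each paragraph with a pilcrow and join them into one long line.

    Assumption: a space follows the pilcrow; the article does not say.
    """
    return " ".join(PILCROW + " " + p.strip() for p in paragraphs)


print(heading("Scan Your Documents", 1))
print(join_paragraphs(["First paragraph.", "Second paragraph."]))
```

Joining everything into one line first is what makes the later step of cutting the text into tweet-sized pieces straightforward.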

Beyond character styles, several other portions of the text didn’t make the transition to Twitter at all:

- The table of contents (because suggesting that you jump to the text on Thursday, March 31st at 4 PM EDT is just silly)
- Portions of the front and back matter that make sense only in the original ebook form
- Inline graphics (there were just a few of these)
- Captions for the screenshots
- Bookmarks and internal navigational links

How I Did All This — I don’t imagine a whole lot of people are going to want to go out and start tweeting their own books, but in case you’re curious how I pulled this off, here’s a quick overview.

I began by taking the Word file containing the complete text of the book and adapting it, using a series of Find and Replace operations to convert things like headings and sidebars into a format that would make sense in plain text (as described above). Once I’d gotten to the point where the file no longer had any data that was dependent on text styles, I moved it over to BBEdit, whose much more powerful grep-based Find and Replace feature, along with Text Factories that combine multiple text-manipulation actions into a single command, enabled me to do all the remaining conversion. I converted real paragraph breaks to ¶ characters, making the whole book one long line. I then set the window to wrap at 140 characters and did a quick visual scan to look for any line-break issues my automated procedure missed (and there were a number of these). Once I had everything the way I wanted it, I used the Text > Add Line Breaks command to put hard returns back at the end of each line. The result: the entire book as a single text file, formatted to be tweet-friendly, with each line corresponding to a single tweet. As a final step, I stuck in some announcements at appropriate intervals.
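For readers who would rather script the wrapping step than do it in BBEdit, here is a rough Python analog. It is a sketch under stated assumptions, not the procedure described above: the names are hypothetical, the paragraph and bullet markers are assumed to be separated from words by spaces, and the `utf8_cost` measure reflects a guess (the article does not say) that a sending script might overcount symbols by counting UTF-8 bytes instead of characters.

```python
LIMIT = 140
NO_TRAILING = {"\u00b6", "\u2022"}  # pilcrow and bullet must not end a tweet


def utf8_cost(text):
    """Length in UTF-8 bytes; counts symbols such as the pilcrow as 2 or more."""
    return len(text.encode("utf-8"))


def split_into_tweets(text, limit=LIMIT, measure=len):
    """Break one long line into pieces of at most `limit`, at word boundaries.

    A single word longer than `limit` is left as its own oversized piece.
    """
    tweets, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and measure(candidate) > limit:
            # Move trailing markers to the start of the next piece.
            carried = []
            while current and current[-1] in NO_TRAILING:
                carried.insert(0, current.pop())
            if current:
                tweets.append(" ".join(current))
            current = carried + [word]
        else:
            current.append(word)
    if current:
        tweets.append(" ".join(current))
    return tweets


sample = " ".join(["word"] * 40 + ["\u00b6", "Next", "paragraph."])
for piece in split_into_tweets(sample, measure=utf8_cost):
    print(len(piece), piece)
```

Passing `measure=utf8_cost` has the same effect as leaving a few characters free on lines that contain special symbols, which is why some pieces come out shorter than 140 characters.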

To send the tweets, I installed the Python Twitter wrapper and its various dependencies on my Mac, and then found a simple tweet_textfile script contributed by user “cydeweys” to go through a text file line by line, sending each line out sequentially as a tweet at the interval of my choice (which happened to be every 15 minutes). Actually, the tweet_textfile script was a bit old, and used an obsolete Twitter API, so I had to update it to use OAuth, but the basic logic of the script remained unchanged.
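The basic logic of a line-by-line sender can be sketched as follows. This is not the tweet_textfile script itself, which is not reproduced in the article. The `send` argument is a hypothetical callable standing in for whatever function actually posts to Twitter, so no real library API is assumed here.

```python
import time


def tweet_textfile(path, send, interval_seconds=15 * 60, sleep=time.sleep):
    """Send each non-empty line of `path` via `send`, pausing between lines.

    `send` is a placeholder for the real posting function; `sleep` can be
    swapped out so the loop can be tried without waiting 15 minutes.
    """
    sent = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            text = line.rstrip("\n")
            if not text:
                continue
            if sent:
                sleep(interval_seconds)  # wait before every tweet but the first
            send(text)
            sent += 1
    return sent
```

For a dry run, something like `tweet_textfile("book.txt", print, sleep=lambda seconds: None)` prints every line immediately instead of posting it (the file name is only an example).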

During a test run of the script, I woke up one morning to discover that the tweeting had stalled overnight. Apparently, due to a fleeting outage on Twitter’s side, one post wasn’t acknowledged the way the script expected, and even though it had some built-in error checking, the precise timing of the glitch produced no actual error message (which would have prompted the script to try again in a couple of minutes) and instead resulted in the script hanging indefinitely. I didn’t have the time to think through all the ways the API interactions could fail and update the script to be as robust as it could be, so instead I cheated a bit, by using launchd to poke the Python script periodically. It’s not elegant, but it works.
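One way a periodically poked script can survive a hang is to make each run send a single line and record its position on disk. The sketch below shows that idea only. It is an assumption about how such a setup could work, not a description of the launchd arrangement used for this project, and the file paths and the `send` callable are hypothetical.

```python
import os


def send_next(lines_path, state_path, send):
    """Send the next unsent line and record progress.

    Returns False once every line has been sent. Meant to be run once per
    interval by an external scheduler, so a hung or failed run only costs
    one slot instead of stalling the whole stream.
    """
    index = 0
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as handle:
            index = int(handle.read().strip() or 0)
    with open(lines_path, encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle]
    if index >= len(lines):
        return False
    send(lines[index])
    # Progress is saved only after send() returns, so a failed send is retried.
    with open(state_path, "w", encoding="utf-8") as handle:
        handle.write(str(index + 1))
    return True
```

The trade-off is that a post which actually went through but raised an error afterward would be sent twice on the next run, so a real version would need to check for that.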

Of course, there was more to it than that; I’m yadda-yadda’ing a bunch of steps such as registering a Twitter application, getting the necessary API keys and whatnot, and loads of testing and bug fixing. But essentially it boiled down to converting the document to plain text, splitting it into 140-character lines, and using a Python script to send out the tweets. Easy. Or at least fun, in a tremendously geeky way.

Note: Much of this article was taken from a post on my Web site that describes the project and serves as a landing page for those who want to know more about it but may not already be familiar with the book, or with the Take Control series in general.

Comments About Joe Kissell Shreds an Ebook into Twitter

Michael Cohen
21 March 2011

For retro-readers, reading your Twitter stream from top to bottom allows you to read Joe's book in reverse order! ☺
- Joe Kissell
  21 March 2011
  
  Or it will, once the whole book has been tweeted. Until then, dizziness could result.
CFRandall
22 March 2011

Of course, given the book's content, you need to buy a $700 Twitter client to be able to read it.
