The main reasons we switched from Microsoft Word for the Mac to Apple’s Pages for writing Take Control books were that Pages supports EPUB export and that its PDF export was superior to Word’s. A smaller reason was our concern about occasional document corruption, which always seemed to hit at an inopportune time. Because our documents were long and complex, with some breaking the 200-page mark, we learned to avoid certain Word features.
For example, we found that automated cross-references often caused corruption in our Word (.doc) files, and we eventually banned their use in Take Control manuscripts. We also developed specific ways of working to reduce the impact of a corrupted document. Before opening a file, each of us would make a copy in a separate folder, and increment a version number in the filename, making it easy to revert to a previous version should corruption crop up.
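For anyone who wants to automate that copy-and-increment habit, here is a minimal sketch in Python. The " vN" filename convention (as in "Book v3.doc"), the function names, and the archive folder are assumptions made for illustration; the routine we followed was manual, and your own naming scheme may differ.

```python
import re
import shutil
from pathlib import Path


def next_version_name(filename: str) -> str:
    """Return the filename with its trailing " vN" version number incremented.

    Assumed convention: "Book v3.doc" becomes "Book v4.doc".
    A name with no version suffix gets " v2" appended to its stem.
    """
    path = Path(filename)
    match = re.fullmatch(r"(.* v)(\d+)", path.stem)
    if match:
        return f"{match.group(1)}{int(match.group(2)) + 1}{path.suffix}"
    return f"{path.stem} v2{path.suffix}"


def archive_and_bump(doc: Path, archive_dir: Path) -> Path:
    """Copy doc into archive_dir, then rename the working file to the next version.

    The archived copy keeps the old name, so reverting means copying it back.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(doc, archive_dir / doc.name)  # copy2 also preserves timestamps

    bumped = doc.with_name(next_version_name(doc.name))
    if bumped.exists():
        raise FileExistsError(f"refusing to overwrite {bumped}")
    doc.rename(bumped)
    return bumped
```

Run before each editing session, this leaves the previous version untouched in the archive folder while you work on the newly numbered file.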
Even though Word document corruption is no longer a concern for our production process, I noticed that a recent discussion on the Office for Mac forum offers two useful pieces of advice for those who do worry about these problems: a list of best practices for avoiding corruption from MVP John McGhie and a technique for removing corruption if it happens.
John’s best practices include:
- Always run the latest version of Microsoft Office. John says that Word 2011 won’t necessarily even run on OS X 10.8 Mountain Lion without all the latest updates. To that I would add that it’s worth waiting a week or so before installing an update, since quick follow-ups to fix newly introduced bugs in large software packages are becoming more common.
- Never use Track Changes. John is adamant about this, but it’s something I’d never heard before. We always used Track Changes, since it is one of Word’s most useful features for collaborative editing, and the only problem we ever associated with it was sluggishness in documents with extensive tracked changes. Instead, John suggests relying on Compare Documents after the fact (find it in Tools > Track Changes > Compare Documents), which gives the same result safely (though you may find it more difficult to work with; we certainly did). For what it’s worth, we rely heavily on Track Changes in Pages too, and haven’t seen corruption issues there.
- Don’t apply direct formatting (bold, italic, font changes, etc.). Instead, define named character and paragraph styles and rely entirely on them. I’ve not heard this advice before, but it makes sense, given how Word stores formatting information in paragraph marks following each paragraph and at the end of the document. Better yet, a properly styled document is much easier to work with if you want to make wholesale style changes or import it into another application, like Adobe InDesign. We relied almost entirely on named styles in Word, though we applied some direct styling, like bold, by hand.
- Never use drag-and-drop for editing, and instead rely on cut and paste. John notes that he has trouble avoiding drag-and-drop editing, since it can be extremely convenient, and it’s a shame that such direct manipulation can cause trouble.
- Use only the modern .docx format, and save older .doc files to .docx. The XML-based .docx format can describe and store aspects of a document that are impossible in .doc, so saving in .doc format can remove information from your document. We were doing this “wrong” too, because we worked with too many people who hadn’t upgraded to versions of Word that could use the .docx format. Nowadays, there’s little excuse for not using .docx.
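One side benefit of the XML-based format is that a .docx file is a ZIP archive whose main text lives in a part named word/document.xml, so its container can be inspected with ordinary tools. The Python sketch below is only a rough check and not something from John’s list: the function name is made up for illustration, and it can catch only container-level damage (a truncated file, a failed checksum, malformed XML), not the kind of internal corruption that makes Word misbehave.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def check_docx_container(path: Path) -> list[str]:
    """Return a list of container-level problems in a .docx file (empty if none).

    Hypothetical helper: a clean result does NOT prove Word will be happy
    with the document, only that the ZIP and main XML part are readable.
    """
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive (perhaps an old binary .doc, or a truncated file)"]

    problems = []
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first member whose CRC check fails.
        bad_member = archive.testzip()
        if bad_member is not None:
            problems.append(f"CRC check failed for {bad_member}")

        if "word/document.xml" not in archive.namelist():
            problems.append("missing word/document.xml")
        else:
            try:
                # Parsing confirms only that the XML is well-formed.
                ET.fromstring(archive.read("word/document.xml"))
            except (ET.ParseError, zipfile.BadZipFile) as exc:
                problems.append(f"word/document.xml is unreadable: {exc}")
    return problems
```

An empty list means only that the package opened cleanly; a non-empty one is a good reason to reach for a backup before doing anything else.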
What if your Word document is already showing signs of corruption? A technique called “doing a Maggie” (named for Margaret Secara from the TECHWR-L mailing list, who first publicized the technique) can help. Follow these steps:
- Create a new, empty document in the .docx format.
- In your corrupted document, display paragraph marks (¶); there’s usually a button you can click to do so, or try the Command-8 shortcut.
- Click at the very beginning of the corrupted document to set the insertion point there, scroll to the end of the document, hold down the Shift key, and click again just before the last paragraph mark in the document. (Various document attributes are stored in that last paragraph mark, so it’s a place where corruption can lurk.)
- Copy the selected text, switch to the new document, paste the text, and save with a new name.
If that doesn’t work, particularly with a long document, make a backup and then try copying just the first half of the corrupted document out to a new document. If that new document seems fine, the corruption lies in the second half, so keep copying out successive halves of what remains in the corrupted document until you isolate the problem. (If the new document still shows the problem, the corruption is in the half you just copied, so split that half instead.) At that point you can step back, extract large portions of the original document around the corruption, and reunite them in a new document. The concept here is the same as the old “binary search” method of isolating extension conflicts in the classic Mac OS: turn half of the extensions off, and if the Mac boots properly, enable half of the remaining extensions, repeating as necessary until the culprit is found.
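The halving logic is easier to see written out. The following Python sketch treats the document as a list of chunks (paragraphs, say) and assumes two things that won’t always hold in practice: exactly one chunk is bad, and the problem shows up whenever that chunk is included. The shows_problem callback is hypothetical; with a real Word file it stands in for you pasting a range into a fresh document and seeing whether the trouble follows.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def isolate_culprit(
    items: Sequence[T],
    shows_problem: Callable[[Sequence[T]], bool],
) -> int:
    """Return the index of the single item that triggers the problem.

    Assumes exactly one culprit, and that any subset containing it
    reproduces the problem. Needs about log2(len(items)) checks.
    """
    if not items or not shows_problem(items):
        raise ValueError("the problem does not reproduce with the full set")

    lo, hi = 0, len(items)  # invariant: the culprit is somewhere in items[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if shows_problem(items[lo:mid]):
            hi = mid  # first half is bad: keep searching inside it
        else:
            lo = mid  # first half is clean: the culprit is in the rest
    return lo


# Example: ten stand-in paragraphs, with "p6" playing the corrupted one.
paragraphs = [f"p{i}" for i in range(10)]
culprit = isolate_culprit(paragraphs, lambda chunk: "p6" in chunk)
print(culprit)  # prints 6
```

Even a 200-page manuscript split into a few thousand paragraphs would need only a dozen or so rounds, which is why the approach is bearable by hand.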
In a worst-case scenario, where these techniques don’t help, we’ve sometimes had luck with saving as RTF, and then opening that document and converting the RTF file back to Word format. Some aspects of the document may be lost, but if it’s either that or saving as plain text and losing all style information, RTF is the lesser of the weevils.