This article originally appeared in TidBITS on 2013-03-28 at 1:07 p.m.

Avoid and Fix Word Document Corruption

by Adam C. Engst

The main reasons we switched from Microsoft Word for the Mac to Apple’s Pages for writing Take Control books were that Pages supported EPUB export and that its PDF export was superior to Word’s. A smaller reason for the switch was concern about occasional document corruption, which would always hit at an inopportune time. Since our documents were long and complex, with some breaking the 200-page mark, we learned to avoid certain Word features.

For example, we found that automated cross-references often caused corruption in our Word (.doc) files, and we eventually banned their use in Take Control manuscripts. We also developed specific ways of working to reduce the impact of a corrupted document. Before opening a file, each of us would make a copy in a separate folder, and increment a version number in the filename, making it easy to revert to a previous version should corruption crop up.
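For those who would rather script the copy-and-increment habit than do it by hand, here is a minimal Python sketch. The function name, the separate archive folder, and the “-v1”, “-v2” naming scheme are assumptions made for illustration; they are not the exact convention we used.

```python
import re
import shutil
from pathlib import Path


def save_versioned_copy(doc: Path, archive_dir: Path) -> Path:
    """Copy doc into archive_dir under the next unused version number.

    Hypothetical naming scheme: Manuscript.doc -> Manuscript-v1.doc,
    Manuscript-v2.doc, and so on.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Collect the version numbers of copies that already exist.
    pattern = re.compile(rf"^{re.escape(doc.stem)}-v(\d+)$")
    versions = [
        int(match.group(1))
        for path in archive_dir.iterdir()
        if path.suffix == doc.suffix and (match := pattern.match(path.stem))
    ]

    next_version = max(versions, default=0) + 1
    target = archive_dir / f"{doc.stem}-v{next_version}{doc.suffix}"
    shutil.copy2(doc, target)  # copy2 also preserves modification times
    return target
```

Calling save_versioned_copy(Path("Manuscript.doc"), Path("Versions")) before each editing session leaves a numbered trail of copies to revert to; the working file itself is never renamed.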

Even though Word document corruption is no longer a concern for our production process, I noticed that a recent discussion [1] on the Office for Mac forum offers two useful pieces of advice for those who do worry about these problems: a list of best practices for avoiding corruption from MVP John McGhie [2] and a technique for removing corruption if it happens.

John’s best practices include:

What if your Word document is already showing signs of corruption? A technique called “doing a Maggie” (named for Margaret Secara from the TECHWR-L mailing list, who first publicized the technique) can help. Follow these steps:

  1. Create a new, empty document in the .docx format.

  2. In your corrupted document, display paragraph marks (¶); there’s usually a button you can click to do so, or try the Command-8 shortcut.

  3. Click at the very beginning of the corrupted document to set the insertion point there, scroll to the end of the document, hold down the Shift key, and click again just before the last paragraph mark in the document. (Various document attributes are stored in that last paragraph mark, so it’s a place where corruption can lurk.)

  4. Copy the selected text, switch to the new document, paste the text, and save with a new name.

If that doesn’t work, particularly with a long document, make a backup and then try copying just the first half of the corrupted document out to a new document. If that new document seems fine, the corruption lies in the second half, so copy half of what remains, and keep halving the remainder until you isolate the problem. (If the new document still shows the problem, the corruption is somewhere in the half you copied, so subdivide that half instead.) At that point you can step back, extract the large portions of the original document on either side of the corruption, and reunite them in a new document. The concept here is the same as the old “binary search” method of isolating extension conflicts in the classic Mac OS: turn half of the extensions off, and if the Mac boots properly, enable half of the remaining extensions, repeating as necessary until the culprit is found.
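The halving procedure is easier to see written out. The following Python sketch is only an illustration: it assumes the document has been split into chunks (sections or page ranges, say), that exactly one chunk carries the corruption, and that a hypothetical is_corrupt check can say whether a group of chunks shows the problem. In practice that check is you, pasting the chunks into a new document and seeing whether Word misbehaves.

```python
from typing import Callable, Sequence


def find_bad_chunk(
    chunks: Sequence[str],
    is_corrupt: Callable[[Sequence[str]], bool],
) -> int:
    """Return the index of the single chunk that triggers corruption.

    Assumes exactly one bad chunk, and that is_corrupt(group) is True
    whenever the group contains it.
    """
    if not chunks:
        raise ValueError("need at least one chunk to search")

    lo, hi = 0, len(chunks)  # the culprit lies somewhere in chunks[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_corrupt(chunks[lo:mid]):
            hi = mid  # the first half shows the problem; narrow to it
        else:
            lo = mid  # the first half is clean; search what remains
    return lo


# Stand-in check for demonstration: a chunk is "bad" if it contains "BAD".
sections = ["intro", "chapter 1", "BAD table", "chapter 2", "appendix"]
culprit = find_bad_chunk(sections, lambda group: any("BAD" in s for s in group))
print(sections[culprit])
```

Each pass discards half of the remaining candidates, so even a 200-page document split into page-sized chunks needs only about eight checks.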

In a worst-case scenario, where these techniques don’t help, we’ve sometimes had luck with saving as RTF, and then opening that document and converting the RTF file back to Word format. Some aspects of the document may be lost, but if it’s either that or saving as plain text and losing all style information, RTF is the lesser of the weevils.