This article originally appeared in TidBITS on 2017-01-02 at 8:26 a.m.
The permanent URL for this article is: http://tidbits.com/article/16966

Sierra PDF Problems Get Worse in 10.12.2

by Adam C. Engst

One of the first problems with macOS 10.12 Sierra revolved around PDFs created by Fujitsu’s ScanSnap scanners (see “ScanSnap Users Should Delay Sierra Upgrades [1],” 20 September 2016). Those problems turned out to be less severe than initially feared (see “ScanSnap Conflicts with Sierra Easily Avoided [2],” 3 October 2016), and Apple resolved them in macOS 10.12.1 (see “macOS 10.12.1 Sierra, watchOS 3.1, and tvOS 10.0.1 Mostly Fix Bugs [3],” 24 October 2016). Now, however, it seems Sierra’s PDF-related problems go deeper, and you should exercise caution when editing PDFs with Preview.

The first I heard that Sierra’s PDF-related problems might affect more than ScanSnap scanners came in a comment left on one of those articles on 26 October 2016. Developer Craig Landrum, who founded the document management system company Mindwrap [4], said:

The primary problem with Sierra with respect to PDFs is that Apple chose to rewrite the PDFKit framework in macOS 10.12 and it broke a number of things that PDF-related developers relied upon (I write scan-to-PDF software and know other developers who were impacted). Software that uses third-party PDF libraries probably runs fine, but those of us in the development community who relied upon Apple’s PDFKit library were really slammed — and we have no way to fix the problems ourselves. There have been numerous bug reports sent to Apple on the several serious issues found with PDFKit and we hope Apple addresses them in an upcoming point release.

Because Craig Landrum’s comment came after the release of 10.12.1 and the fixes for ScanSnap, I filed his criticism of PDFKit away as something that likely had been true but was no longer. However, over the following two months, additional complaints kept surfacing. Eric Bönisch-Volkmann, head of DEVONtechnologies, told me that the company has spent a significant amount of development time working around Sierra’s PDF-related bugs in DEVONthink [5]. Christian Grunenberg, DEVONthink’s lead developer, characterized the rewritten version of PDFKit in Sierra as “a work in progress,” saying:

Apple wants to use a common foundation for both iOS and macOS. However, it was released way too early, and for the first time (at least in my experience) Apple deprecated several features without caring about compatibility. And to make things worse, lots of former features are now broken or not implemented at all, meaning that we had to add lots of workarounds or implement stuff on our own. And there’s still work left to be done.

10.12.2 introduces new issues (it seems that Apple wants to fix at least the broken compatibility now) and of course fixed almost none of the other issues. It’s not only DEVONthink — a lot of other applications (such as EndNote, Skim, Bookends, and EagleFiler) are also affected.

In fact, Michael Tsai, developer of EagleFiler [6], just published a blog post confirming his problems [7] with PDFKit:

I ran into a lot of PDF bugs in macOS 10.12.0. None have been fixed, as far as I can tell, and I’ve already filed two Radars for new issues in 10.12.2. It’s sad that basic functionality remains broken for so long — especially given that PDF was an area where Apple used to excel.

More concerning, and this is what finally pushed me to track down all these reports and write this article, is that the recently released macOS 10.12.2 has introduced a serious new bug related to PDFKit. Brooks Duncan of the DocumentSnap [8] site published a note from one of his readers [9] warning that the OCR text layer added to scanned PDFs by Fujitsu’s ScanSnap software will be deleted if you edit the PDF in Preview. Eric Bönisch-Volkmann confirmed this, saying ruefully:

10.12.2 fixes a few bugs but kills the OCR text layer in PDFs. We worked around the earlier bugs in DEVONthink 2.9.8 and will address 10.12.2’s new problems in the upcoming 2.9.9. But yes, as soon as you edit a PDF in Preview the text layer is gone. Our customers are delighted.

Although the DocumentSnap reader said that the problem didn’t affect PDFs scanned and OCRed with other solutions, Brooks Duncan was able to reproduce the problem with scans made from both ScanSnap and Doxie scanners; he noted that both rely on the ABBYY FineReader [10] engine.

Sonny Software’s Jon Ashwell, developer of the Bookends [11] bibliography app, expressed significant frustration as well, saying:

We’ve been trying very hard to work around perfectly good code that was broken in Sierra. Versions 10.12.0 and 10.12.1 were bad, but 10.12.2 was a disaster for us, causing Bookends to crash when displaying PDFs with annotations. We’ve worked around that, but in the process had to shut down PDF annotations while we look for workarounds. I’ve filed a number of radars with Apple, two of which were closed as duplicates. In another case, I was asked to provide our app, but after doing so there has been only silence. I’ve never seen such a sorry case of sloppy code and indifference from Apple.

Problems with PDF annotations have plagued other developers as well, to judge from irate posts in Apple’s developer forums [12].

Christian Grunenberg laid the blame for the problems at Apple’s feet:

Apple supports only a subset of the PDF specification, and that support has always been buggy. For instance, PDF documents containing Eastern European characters created by the older ABBYY FineReader 8 engine are corrupted by PDFKit after editing. And issues reported by Peter Steinberger (author of the PDF framework PSPDFKit [13]) were simply closed with the response that Apple didn’t intend to fix them.

Apps that don’t use PDFKit are immune to these problems, of course, but only to the extent that their PDFs aren’t shared more widely and edited in Preview. Greg Scown of Smile told me that PDFpen [14] operates independently of PDFKit, but

bugs in Preview impact PDFpen customers whose document recipients use Preview rather than PDFpen to view or edit them. We have not had reports of PDFpen causing data loss of documents’ OCR layers.

Interestingly, Preview itself may suffer less from bugs in PDFKit than third-party apps. Michael Tsai said that some of the bugs he has seen don’t manifest themselves in Preview, suggesting that Apple’s Preview team is aware of the problems and is choosing to work around them rather than getting them fixed in PDFKit itself.

It pains me to say this, speaking as the co-author of “Take Control of Preview [15],” but I have to recommend that Sierra users avoid using Preview to edit PDF documents until Apple fixes these bugs. If editing a PDF in Preview is unavoidable, be sure to work only on a copy of the file and retain the original in case editing introduces corruption of any sort. Smile’s PDFpen [16] is the obvious alternative for PDF manipulation of all sorts (and for documentation, we have “Take Control of PDFpen 8 [17]” too), although Adobe’s Acrobat DC is also an option, albeit an expensive one.
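
For readers comfortable with a little scripting, the work-on-a-copy advice can be automated. The Python sketch below is only an illustration, not a tool supplied by Apple or any developer quoted here: the function name `copy_for_editing` and the “(editing copy)” file name suffix are assumptions chosen for the example. It duplicates a PDF alongside the original, never overwrites an earlier copy, and returns the path of the duplicate, which is the file to open in Preview.

```python
import shutil
from pathlib import Path


def copy_for_editing(original):
    """Copy a PDF next to the original and return the path of the copy.

    Hypothetical helper for illustration: the original file is never
    modified, so it survives even if editing damages the copy.
    """
    original = Path(original)
    candidate = original.with_name(
        f"{original.stem} (editing copy){original.suffix}"
    )
    counter = 2
    # Never overwrite an earlier working copy; pick a fresh name instead.
    while candidate.exists():
        candidate = original.with_name(
            f"{original.stem} (editing copy {counter}){original.suffix}"
        )
        counter += 1
    # copy2 copies the contents and preserves timestamps where possible.
    shutil.copy2(original, candidate)
    return candidate
```

Calling `copy_for_editing("Scans/receipt.pdf")` (a made-up path) would create “receipt (editing copy).pdf” in the same folder and leave “receipt.pdf” untouched. The script cannot detect or repair a lost OCR text layer; it only ensures the original is still around to fall back on.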

In the meantime, we’ll be watching closely to see which of these PDF-related bugs Apple fixes in 10.12.3, which is currently in beta testing.

[1]: http://tidbits.com/article/16768
[2]: http://tidbits.com/article/16810
[3]: http://tidbits.com/article/16853
[4]: http://mindwrap.com/
[5]: http://www.devontechnologies.com/products/devonthink/overview.html
[6]: http://c-command.com/eaglefiler/
[7]: http://mjtsai.com/blog/2016/12/21/more-macos-preview-pdf-trouble/
[8]: http://www.documentsnap.com/
[9]: http://www.documentsnap.com/ocr-text-macos-sierra-preview/
[10]: https://www.abbyy.com/en-us/finereader/pro-for-mac/
[11]: http://www.sonnysoftware.com/
[12]: https://forums.developer.apple.com/thread/60440
[13]: https://pspdfkit.com/
[14]: https://smilesoftware.com/pdfpen
[15]: http://tid.bl.it/tco-preview-tidbits
[16]: https://smilesoftware.com/pdfpen
[17]: http://tid.bl.it/tco-pdfpen-8-tidbits