Microsoft Office for Mac 16.30
Microsoft has issued its monthly update of Office for Mac with version 16.30, enabling you to open a PDF with a text layer in Word and start editing it after converting it to a Word document (the original PDF won’t be changed). PowerPoint gains the Ink Replay feature, which enables you to replay ink strokes as they were drawn to help you illustrate step-by-step processes. Excel adds support for the Visio Data Visualizer add-in for creating Visio data visualization diagrams and also includes two security updates that patch remote code execution vulnerabilities. ($149.99 for a one-time purchase, $99.99/$69.99 annual subscription options, free update through Microsoft AutoUpdate, release notes, macOS 10.10+)
However, it only works if the PDF has selectable text, as MS Word does not perform OCR on images contained in PDF files. Thus, this new feature is useless for editing scanned documents. A workaround available to Windows users, using OneNote to do the OCR, is not available to Mac users, as the OCR engine is cloud-based and there is a huge queue (days) to process images in images inserted in OneNote files.
A good point, thanks! Many apps that scan to PDF can do OCR as well, so it shouldn’t be a problem with all scanned documents.
As Adam mentioned, there are many ways to create searchable text from scanned documents via OCR; e.g., the scanning/OCR software available with Evernote for Mac. However, there is of course a huge difference between searchable and editable, and that brings up the original intent of the PDF format. I'm pretty certain it was conceived as a way of distributing printable documents that would look the same on screen or in print, and that capability was one of the things that made the PostScript laser printer such a success in the 1980s. If one creates (or copies) a PDF by scanning a paper document, the creator of the original document may be very unhappy to learn that it's been altered. PDFs as created may be locked for editing and password protected for access, at least when transmitted electronically, but I have no idea how one can block editing access to a piece of paper once someone has it in their possession, at least if they have the right software tools. (For example, look at the piece from last Sunday's 60 Minutes broadcast, discussing the recent discovery that many of the printed copies of Columbus's letters to King Ferdinand describing his first voyage to the New World, held in famous museums including the Vatican's, are actually very recent forgeries, and that one of the stolen documents was discovered in the US Library of Congress.)
Sorry to venture so far afield, but in some ways the PDF file format is a real mess. Many people, I think, place far too much faith in the notion that if it's a PDF, it's exactly what its author intended. That assumption is under both intentional and unwitting attack, and PDFs are also limited by the tools used to create them (such as which character set is embedded in the original). There's a reason that full Adobe Acrobat DC is so expensive (OK, much of it possibly is greed), but it does include features that attempt to make certain that a document is not only portable, but also permanent.
I don't think there's a way to prevent OCR of a PDF that is locked for editing but obtained electronically, then printed. And, I suspect, there's then no way to prevent someone from altering the OCR layer to change the formatting or even the meaning of the scanned document.
Finally, since I don't use OneNote (I'm still in the Evernote camp), I'm uncertain what you meant to say when you wrote "there's a huge queue (days) to process images in images…"
Are you saying that the obstacle for Mac users is that they must wait a long time, or that the capability just doesn't exist in Office 365 for Mac?
My sense is that Microsoft spends a lot of time trying to sell Mac users on the untrue notion that the Windows and Mac versions of Office are now feature-set equals. All one needs to do to realize that's not the case is to create an Excel worksheet in Windows, enter some data, then turn to the tooltip help for suggestions on how to manipulate or format it. In Windows, once one finds a tooltip by hovering over an icon in the toolbar, opening that tooltip often reveals detailed instructions on how to accomplish what the tooltip describes. On the Mac, when one opens a tooltip (let's say the tooltip says "BigHelp"), one is likely to find nothing but a reprise of the word "BigHelp."
Interesting point. Your characterisation of the myth of feature equality in Office apps is spot on, and this is one example.
Just to clarify the OneNote queue: if you insert an image or PDF containing text, the app sends that image to Microsoft servers to perform the OCR process. Initially, if you right-click (or Control-click) the image, there is a single "Copy" option. After the OCR is done, there is another option, "Copy as text". Then, you can export the note as a PDF, and Word (16.30) can convert the PDF into a Word file.
I have used this feature a couple of times. The first time, with a one-page image containing just text, the process took just over 24 hours. The second time, a larger file (13 pages with some charts and logos) took over 3 days.
According to a post by a Microsoft representative in the support pages back in 2016, OneNote for Windows performs the OCR locally, using capabilities within the operating system. He suggested that Mac users ask for this feature to be enabled locally as well.
Anyway, I gladly use PDFpen Pro, which provides a more seamless OCR and export-to-Word experience.
This is 100% true, and it was also developed to ensure documents looked the same on different platforms, including DOS. And uneditable PDFs displayed the same whether or not the fonts used in the document were present on the computer. PDFs would also render photos and line art properly on pages with text. PDFs were truly an earth-shaking, game-changing development in printing and publishing, almost equal to the development of the Mac OS. It's one of the few technologies invented in the late '80s/early '90s that is still an industry standard today.
You can easily password-protect and watermark PDFs in Acrobat, and there is even a way to add a watermark that will appear only when documents are printed out. In InDesign, you can add a watermark to a master page; I don't know if this can be done in PDFs created by Word or other apps. Of course, you can't stop people from retyping copy or faking an image, but watermarks can discourage copying.
I agree that Acrobat is very expensive, and I'm not very happy that it's only available in Adobe's extremely expensive Creative Cloud subscription bundle. But I do give Adobe credit for allowing any developer to build PDF creation and editing capabilities without having to pay royalties or a fee. Yes, creating PDFs in Apple, Microsoft, and millions of other apps did, and does, help sell Acrobat Pro, but it also continues to make so many more apps available, marketable, and useful, and has given just about anyone with a computer or smartphone a way to read or make PDF files.
Try opening a spreadsheet created in Pages in Excel and you will truly find yourself in the ninth circle of hell.
Unfortunately, that hasn't always been true. For example, I once downloaded an official PDF article reprint from the New England Journal of Medicine to my Mac, opened it in Adobe Reader for the Mac, and found the Greek "beta" (upper case) character replaced by the mathematical symbol for an integral. A very knowledgeable Mac font maven based in Belgium was able to explain it to me, but that was years ago and I don't remember the details.
I have no doubt that's true, though I suspect you meant a spreadsheet created in Apple's Numbers. Expecting such a spreadsheet to open and work properly in Excel (either Excel for Mac or Excel for Windows) would be optimistic in the extreme, and I suspect the opposite direction (create in Excel, open in Numbers) wouldn't do much better for complex workbooks. But in those comparisons, the native file formats aren't the same. I'm contending that Mac Excel remains a poor stepchild of Windows Excel, despite MS's contention that the data file formats are basically the same.
Sounds like whoever generated the PDF didn’t embed the fonts.
Join the discussion in the TidBITS Discourse forum