I was a bookworm when I was a kid, reading almost a book a day. I lived overseas, and English books were hard to come by, so I devoured any I could find. Though this was long before I knew anything about computers, I remember having a fantasy of somehow having every book in existence available to me via some device.
Today that dream is approaching reality. We have the tech, but we’re bogged down by intellectual property rights. Millions of books are not available in digital form and might never be, if lawyers have their way. I find this frustrating, as I vastly prefer reading my books digitally. Digital books are lighter and easier to read, have flat pages instead of curved, can have consistent lighting, are adjustable, can be marked up and highlighted without interfering with readability, and can have extensive notes added. Plus, I can carry thousands of them on my iPad.
In fact, I prefer digital books so much that, a few years ago, I took the radical step of donating 90 percent of my printed books to local libraries. I freed up ten 72-inch bookcases — basically a whole room of new space! I realized I hadn’t read any of my printed books in five years, since the iPad appeared. I knew I wouldn’t, either. When I wanted to reread a book I owned (such as “The Hobbit,” prior to the first movie coming out), I just rebought it in digital form.
I saved a few of my favorite printed books, and I still have roughly a thousand books left. But my long-term goal was to figure out how to digitize them.
First Attempts — The first thing I tried a few years ago was scanning books with a flatbed scanner. This actually worked well, especially with some software I wrote to split the two-page scans apart. I brought all the images into Adobe Acrobat Pro which merged them into a single PDF and did OCR on the text so everything was searchable.
The problem was that the scanning process was slow — about 30 seconds per page, plus time for me to flip to the next page, position the book correctly, and start the next scan. It typically took me close to 2 hours to do a book, and it was tedious and left me physically sore from staying in the same position for so long. And that was before all the additional processing I had to do.
Searching the Internet for other solutions, I learned that book scanners were insanely expensive. At the high end were professional machines used by libraries that cost $100,000 or more. At the low end were hacked-together devices of questionable reliability that would still cost me thousands of dollars.
The optimal solution may have been a simple sheetfed scanner that apparently does a great job — if you’re willing to chop the spine off your book to turn it into individual sheets, which I was not.
Later, I experimented with my own hacks, trying to rig up a way to use a digital camera to take snapshots of books.
The problem wasn’t the camera, however. It was the books themselves. Perfect-bound books aren’t designed to stay flat, and the nature of them is that as you flip through the book, the numbers of pages on the right and left sides change, resulting in an uneven height. Without some way to press the pages flat, you end up with all sorts of distortion from the curved and uneven pages.
I gave up.
Enter CZUR — Recently, I was on Amazon and something I saw reminded me of my abandoned search for an affordable, serviceable book scanner. Not expecting much, I typed in “book scanner” and, to my surprise, a few are now available, and the price has dropped radically.
At the low-end are overhead scanners for under $100. These don’t include much in the way of software or support — most aren’t Mac-compatible — and they aren’t much better than rigging up your own camera. Your camera is probably far higher quality, too.
Then I ran across the CZUR M3000 Professional Book Scanner, which was apparently an Indiegogo project and now is a shipping product. The CZUR isn’t cheap at $399, but it’s inexpensive compared with professional solutions. My philosophy is that if I can scan 400 books with it, I’m basically paying a dollar per book, which is far cheaper than I could rebuy them for, even if they were all available digitally. Even scanning just 100 books would probably pay for the scanner, though, of course, that doesn’t include my time.
The CZUR is made by a Chinese startup company and is a weird mix of high- and low-end. For instance, the packaging is so elaborate that it’s overkill — it makes Apple’s packaging seem pedestrian. Even the USB cord is packed in a tiny labeled box which fits into a slot in another box. But other parts seem primitive. Its software is quirky, to say the least. But the good news? It works.
Hardware-wise, the CZUR is a simple overhead scanner. It has built-in LED lights to illuminate the subject, along with a high-quality 16-megapixel Sony camera. (It includes some extra LED lights powered by built-in rechargeable batteries, but I haven’t been able to figure out why they’re useful.) There’s also a black rubber scanning mat that provides necessary contrast, and two sets of weird-looking yellow thumb covers, which you need to use while holding the book pages open so the CZUR Scanner app can identify and erase the thumbs.
Most importantly, the box includes two cabled buttons: one for your hand and one that’s a foot pedal (you can connect only one at a time, though). The latter is awesome for scanning pages quickly while your hands hold the book open.
Hardware you can’t see includes a microphone (the scanner listens to sounds that the mobile app makes to transmit your Wi-Fi password during setup) and lasers that the CZUR uses to detect the curvature of each page of your book.
The CZUR supports Wi-Fi for wireless scanning to the CZUR cloud (you get a 10 GB account for free), but the setup is so convoluted and the software so limited that you’ll only want to connect it via USB to a computer.
One example of the cloud mess: the CZUR Cloud iPhone app wouldn’t let me set up a new cloud account because one of the fields was “verification” and I didn’t know what value to put there. I finally went to the Web site to set up the account there, where I was shown a CAPTCHA to enter — that turned out to be the “verification” — it was never displayed in the mobile app!
Another problem with CZUR Cloud is that there’s no easy way to delete a scan. You can log into your cloud account on the CZUR Web site and delete scans from there, except that there’s a terrible bug: it deletes the thumbnails you don’t have selected instead of the other way around. After that bad experience, I decided to forgo all the cloud features.
Scanning a Book — I won’t bore you with my trials and errors; let’s just say the CZUR Scanner software requires some adjustment. The documentation is barely in English, and sometimes it makes no sense at all. Here’s a sample:
Nothing except scanning material should be in the “the preview area”, or will interference algorithm and affect the scanning effect. Such as “hand” USB button ’line’ ’mobile phone’ and ’pen’ etc.
As you can see, this is understandable, but it has taken a simple instruction (“put only items to be scanned in the scanning area”) and made it nearly inscrutable. I suppose it’s good that there are so few instructions!
CZUR Scanner’s interface is non-standard. Dialogs all use Confirm buttons instead of OK, and they’re reversed so Cancel is on the right, and there’s no default so you can’t just press Return. The app doesn’t even have a Quit menu command! To quit the software, you click the main window’s close button.
I initially couldn’t get CZUR Scanner to work at all: I got a black window in scan mode. Working from a hint in an Amazon review that said the software worked in Sierra, I upgraded my Mac from OS X 10.11 El Capitan to macOS 10.12 Sierra, after which CZUR Scanner started working. After some initial experiments that gave mediocre results, I finally tried scanning a book for real… and everything worked surprisingly well.
I have since scanned several books, and the process is straightforward. With the scanner turned on, you launch CZUR Scanner and enter scan mode. There, you configure the kind of scan (color, grayscale, or black-and-white) and post-processing needed (a flat item, a book spread, manual setup, or none).
The post-processing is where most of the magic happens, especially for books, as that’s where CZUR Scanner splits facing-page scans into individual pages, flattens curved pages, and so on. Supposedly, you can change the post-processing mode later, but I found things don’t work right — for instance, spreads are not split into separate pages — unless you choose the correct mode prior to scanning.
I originally assumed that black-and-white was best for books since the image files would be smaller, but later experiments showed me that grayscale is better and produces smaller files. Don’t ask me why — I assume it’s a bug.
With CZUR Scanner configured, you go to the scanner and put the fingertip covers on your thumbs. Open the book to the first spread, hold it open with your thumbs, and press the foot pedal to snap a picture. CZUR claims the process takes less than 1.5 seconds. The actual picture taking might happen that quickly, but the additional step of the red laser beams flashing onto the book to detect the curves of the pages takes more time. But it is fast — just a few seconds per spread.
Once the laser beams disappear, turn the page, hold the book open with a thumb on either side, and press the foot pedal again. Repeat that until you get to the end of the book.
The process may sound tedious, but since it’s so quick, you don’t have time to get bored. When I did my flatbed scans, the process was so slow that I often got distracted playing a game on my iPhone or reading on my iPad.
One disadvantage of scanning a whole book at once is that you don’t know until you’re finished if you’re doing anything wrong. (It reminds me of the old days of film cameras, when you’d experiment with your camera’s settings and days later, when the film was developed, you couldn’t remember what you’d done to achieve a particular effect.)
You can, however, stop or pause during the scanning of a book. Since the CZUR Scanner app is performing most of the adjustments, it doesn’t matter if your book is in exactly the same position as before, so resuming where you left off is easy. When you’re getting started, if a scan feels like it might not have been good, stop and check the images before you continue. That way you can learn what does and doesn’t work.
In theory, scanning a whole book could take less than 15 minutes, depending on its length, but my first tries were slower. Scanning an old 150-page paperback took me nearly 30 minutes, with a few problems. Later I scanned a modern hardcover book that went smoothly — at 356 pages it took me 25 minutes. Later efforts have come closer to that 15-minute mark, but 20 minutes per book seems about the norm, plus a few minutes if you need to redo some pages.
Once you’re finished scanning, you can close the scan window on your Mac. Note that you don’t click the standard macOS window close button, but a special tiny “x” that appears under the title bar.
This takes you back to CZUR Scanner’s main window where you can look at your scans, which show up as a list of numbered image files on the right.
CZUR Scanner has a batch mode that lets you apply global fixes to groups of images, which is handy. You can rotate, crop, change contrast, adjust color, delete images, and even replace pages. If you replace a page, CZUR Scanner takes you back to scan mode and, once you’ve rescanned the page satisfactorily, it intelligently replaces the page you indicated, even if it was just one page of a two-page spread.
Unfortunately, I found that page scans tend to vary a lot in size and positioning, which makes batch cropping useless. For instance, some pages are slightly rotated while others have different margins. If the scans were more uniform, you could, for instance, crop all the images to cut out things like headers and footers that you may not want in the OCRed text.
A video on the CZUR Web site shows an additional “manual adjustment” screen that, as far as I can tell, is not available in the Mac version of CZUR Scanner. Some of those tweaks look helpful, such as adjusting the shape of the scan and manually setting the divide between pages. Hopefully, those enhancements will appear in an update soon.
Ideally, however, you won’t need to do any editing. My first book required some rescans, as did my second, but my third was fine on the first try. (Proof that practice helps! A later book did have more issues, but I was going faster and the book was an older copy that didn’t cooperate as well.)
What sort of problems might you run into while scanning?
- While the flattening of curved pages works surprisingly well, it isn’t perfect. The same goes for the thumb removal and rotation of pages. I’ve noticed inconsistencies in how these features work. In every book I’ve done, some pages are slightly crooked or poorly cropped, sometimes a thumb isn’t completely removed, occasionally there’s visual warping, and often extra side material is included. In general, these are minor visual oddities and don’t interfere with the OCR or the readability of the book, but they do mean the scans aren’t perfect. For my purposes, this is fine, but if you want flawless scans, the CZUR isn’t going to cut it.
- Some books have small margins, so your thumbs appear in the scan. CZUR Scanner seems to erase a bit too much when removing your thumbs, so even if your thumb is a few millimeters from the text, some of the text might be obscured.
Some books are very springy and don’t want to stay open. Combined with small margins where you have little room to grip the pages, this can make the scanning process take longer as you fight with the pages. This is usually a problem only at the beginning and end of books, and I find flipping through a book to crack the spine before scanning helps, especially with paperbacks.
You have to be careful not to force a page flat with the pressure of your thumb. Instead, let the page curve naturally, and let CZUR Scanner flatten it. When you force a page, it crinkles and distorts, and the scan ends up quite warped.
Be sure to use the included thumb covers. Though they feel weird and make the scanning process awkward, CZUR Scanner uses them to know where the edges of the book are. Without them, it may include your hands and arms in the scan, and you’ll have to crop those pages manually.
In facing-page mode, CZUR Scanner does not like blank pages. If one side of the page is blank, it won’t crop and rotate the other page correctly — it’ll be twisted at a weird angle with text cropped off. The simplest solution that I found is just to put a mark on the blank page. This fools CZUR Scanner into thinking there’s something there, after which it works correctly. If you don’t want to mark up the book, you could put a marked Post-It on the blank page instead.
Because of the built-in overhead LED lights, glossy items have a lot of glare. I had read about this, thinking it didn’t apply to me, but the covers of even my 1960s paperbacks have a surprising amount of reflectivity. I initially considered just scanning the covers with my flatbed, or taking a photo with my iPhone, but then I realized that I could just move the book away from the center of the scanning area where the lights were focused. (For covers, I do a manual scan in the scan settings.) I had no problems inside the books, but be aware that scanning glossy paper is a big problem. I don’t think the CZUR would work well for magazines, for instance. I did scan one glossy book and while the results were readable, it had washed-out spots and many of the photos were muddy.
When you’re done editing, click the checkbox button to select all the images on the sidebar and then go to the Export tab and choose Searchable PDF. CZUR Scanner will perform OCR as it exports the scans as a single file. It may take 5 or 10 minutes, depending on the speed of your Mac, but you don’t have to stand there and watch it.
At this point, you’re essentially done! The resulting PDF may be quite large — my 356-page book is 86 MB — because it includes both image and text layers. (That was in black-and-white mode, which seems to be larger than grayscale. I later did a 162-page book in grayscale that was only 22 MB.)
It shouldn’t be a surprise that creating a PDF means that you’re stuck with the original layout and formatting. You may be able to zoom in, depending on your reading device, but you can’t reflow text or change anything.
Since there is a text layer, you can search the book, which is awesome — though OCR errors and scanning quirks may foil some searches. For instance, I searched for a phrase I knew was in the book, but Preview said there was no such text. It turned out the word was hyphenated at the end of a line and my search didn’t include a hyphen.
Working with OCR Text — I did some testing on the raw text and was fairly impressed. CZUR Scanner uses the ABBYY FineReader OCR engine, which is better than Adobe’s. Since CZUR Scanner produces standard image files, you’re free to do the OCR with another program if you want. Either way, expect lots of errors. It took me several hours to clean up a whole book, even though I used regular expression pattern matching to find common mistakes.
Part of the problem is that the text includes things like headers and footers (which show up in the middle of other text), CZUR Scanner can’t always figure out when paragraphs end (so paragraphs merge into one), and it doesn’t handle hyphenated words at the end of lines very well (they show up with spaces instead of a hyphen, so “hatch-ing” becomes “hatch ing”).
Books are also all over the place in terms of formatting and the way chapter numbers and titles appear, so you’ll have to fix those by hand. One book I scanned put fancy graphics behind the chapter titles, so each title came in with a few garbage characters.
I’ve written a little find/replace script that fixes about 75 percent of typical errors, and in one easy book — a hardback with generous margins and good typography — I addressed the rest of the major errors and formatting in about 10 minutes.
Note that this is “personal use” quality and is far from flawless. To get it perfect would require reading every line and running it through a spelling checker. You’d also have to manually apply any special text formatting, such as bold or italics. I may have to tweak my script for different books, but so far it’s showing real promise for my needs.
Also note that I’m primarily focusing on scanning fiction, which has minimal formatting. When you scan pages with poems, equations, figures, tables, or multiple columns, expect the result to have more problems. I scanned some spreads from a couple of cookbooks, just to see how well CZUR Scanner works with fractions and more complicated formatting. These are raw scans, completely unedited, and I extracted the text so you can see how well the OCR performed.
The advantage of having the OCR text to work with is that I can convert it to PDF myself, choosing my own font and text size and layout. The result is far more readable than a photograph of a printed page. And that 86 MB book is now just 1.2 MB!
Another huge benefit of plain text is you can convert it to other formats, such as EPUB and Mobi. You’ll want to do that if you prefer reading on an iPhone (EPUB) or Kindle (Mobi), as they don’t usually well work with PDFs.
While raw text is great, creating it does require more effort, and if the OCR has trouble with a run of text, you don’t have the original scan layer in the PDF to refer back to.
Is CZUR Worth It? — Only you can answer that question. Since nearly all books are copyrighted, you can’t use the CZUR to digitize books for resale, so you can’t make money with it.
I believe digitizing books that I own for my personal use is entirely legitimate, but technically speaking, I probably shouldn’t sell the physical copies. Even giving them away is a gray area if I retain the scans, though I’m comfortable with that.
Since I’m a writer and publisher by profession, I respect copyrights, so I don’t condone the distribution of scanned books (unless their copyrights have expired, but that’s an awfully long time these days).
So we’re left with just personal use. Is it worth spending even an hour digitizing a book you may never read again? How you answer that question depends on the book, what sort of a reader you are, and how much time you have. The CZUR is easy enough to use you could scan a few books in the evening while you watch TV.
That’s what I’ll do, and I will probably bother to convert the raw text on only a few of my very favorite books, or books with horribly small printing that I want to make more readable. Most books I’ll just leave as images.
But remember, there are other benefits to a digital library: being able to search a book’s text is amazing, and it could be vital if you’re a writer or researcher. Being able to carry your home library with you while traveling could be wonderful. Having digital access to books that aren’t available digitally is also a huge win if you’re a fan of digital reading. Many businesses could also use the CZUR for digitizing old catalogs, legal proceedings, and similar tomes that have some historical value but take up a lot of space.
If you don’t already have a flatbed scanner, you can use the CZUR for other kinds of scanning, and it’s especially useful for oversized or three-dimensional items. It’s also much faster than a flatbed, though you’d need to be doing a lot of scanning for that to be a significant factor. Unfortunately, I don’t think those other uses justify the CZUR’s price. Its real purpose is scanning books, and if that doesn’t appeal to you, you aren’t in its target market.
The CZUR scanner certainly isn’t for everyone, but if you have a lot of printed books that you’d love to have in digital form, and you’re willing to put up with a learning curve and some fiddly software, it does work well.