Matt Neuburg 8 March 2004

DEVONthink Thinks, So You Don’t Have To

In case you’ve forgotten what a snippet keeper is or why you might need one, here’s a case in point. Last week, a note appeared on TidBITS Talk, containing three URLs pointing to Web pages with information I found especially valuable. (It was an explanation of how the precise DOCTYPE specification in your HTML affects whether a browser displays that page in a standards-compliant manner.) Instantly, I wanted to save this information; it was too technical to remember, but I could easily picture myself wanting it for reference later.

<https://tidbits.com/getbits.acgi?tlkmsg=20422>

<https://tidbits.com/getbits.acgi?tbser=1196>

Unfortunately I could also easily picture myself having no idea where I put this information, what form it was in, what I had called it, or even what precisely it was about. So how was I going to store it so as to be able to find it again? I could save the Web pages as URLs, HTML, PDFs, or Web archives, and keep them on my hard disk. But, you know, I can never really find documents on my hard disk when I need them. Folder and file names alone never lead me to the desired information – especially when I can’t remember what folders I have or how I arranged them in the first place. Another problem is that even if I stumble across the right document, I don’t necessarily realize this, because I can’t see inside it unless I open it. But it’s a big pain to open lots of documents or URLs while slogging through my hard disk, and besides, I can have a document open in front of me and still not realize it’s the right one!

From this example, four lessons emerge.

A hierarchy is good, because it groups related things; but it’s not enough, because you can’t anticipate what circuitous path of association your brain will be using later when you’re hunting for something. There needs to be some other way to locate the desired article based on whatever sense of its subject matter occurs to you at the time.

The storage needs to accept any kind of entity, like the Finder. It can’t be confined to a single type of entity because the information might not come in that form.

One must be able to see a document’s contents directly, without bothering to open it separately. Internet integration would be nice too, since (as in this case) information often comes in the form of Web pages.

The storage needs to be central – a single, certain place where you go any time your mind says, "I think we’ve got something about that somewhere…"

Enter DEVONthink, a program that understands the problem and proposes itself as the solution.

<http://www.devon-technologies.com/products/devonthink.php>

The View from Here — DEVONthink’s interface is extremely clean and intuitive, and calls for very little comment or explanation. The window displays a database, which is initially empty. To this database, you add entries – you can think of these entries as "documents," and originally for the most part they really are documents, which you’ll probably just drag in from the Finder. You can also create "groups," which look and behave like folders. So your database is a hierarchy, which you can arrange freely, just as in the Finder. You can clone a document, so that more than one entry appears for it; thus, the same document can be part of more than one group.

Viewing the overall structure of your database is much like looking at the Finder; the interface includes a list view, an icon view, and a column view. But you can also view the contents of an individual document directly within DEVONthink; a two-pane view lists your overall database in one pane and the contents of the currently selected document in the other, or you can double-click a document’s listing to display its contents in a separate window. If a document is HTML or a URL, DEVONthink displays it as a Web browser would. If a document is plain text or RTF, you can not only view but also edit it within DEVONthink.

Ways of Finding — DEVONthink knows you’re going to want to find a document by way of its subject matter, and its solution is to word-index your data. So, on the one hand, you’ve arranged your documents within a hierarchy of groups, but at the same time, at the level of individual words, DEVONthink cuts across this hierarchy to facilitate searching.

Thus, you can search by a word or words. Multiple words can be combined by AND or OR; you can search on a phrase; you can search in the contents of documents or in their titles, or even in a comment field. You can search globally or in one group. Matches can be exact or "fuzzy." Results appear instantly when you hit the Search button, and are ranked by a relevancy score.
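
To make the idea of a word index concrete, here is a minimal Python sketch of an index that supports AND and OR searches across documents. It is not DEVONthink's implementation, which is not documented here; the function names, the tokenizing rule, and the sample documents are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index

def search(index, words, mode="AND"):
    """Return documents matching all (AND) or any (OR) of the words."""
    sets = [index.get(w.lower(), set()) for w in words]
    if not sets:
        return set()
    if mode == "AND":
        return set.intersection(*sets)
    return set.union(*sets)

# Hypothetical sample database.
docs = {
    "doctype.html": "How the DOCTYPE affects browser standards mode",
    "notes.txt": "Browser quirks and rendering notes",
    "budget.rtf": "Quarterly budget figures",
}
index = build_index(docs)
and_hits = search(index, ["browser", "doctype"], "AND")
or_hits = search(index, ["browser", "budget"], "OR")
```

Because the index is keyed by word rather than by folder, a search ignores where a document sits in the hierarchy, which is the cross-cutting behavior described above. Restricting a search to one group would amount to intersecting the result with that group's documents.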

If your initial word search doesn’t prove helpful, you can generate a list of words similar to a search term, based on spelling. DEVONthink knows all the words in all your documents, so this list is generated based on that knowledge. For instance, in my database, "program" led to "programs", "programmer", "programming", "programmed", and "programmers" – basically, it got the right answer. You can then combine these new terms as desired to form a new search.
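
A spelling-based list of this kind can be approximated with Python's standard difflib module, which ranks candidate words by how much of their character sequence they share with the search term. This is only a sketch under assumptions: how DEVONthink measures spelling similarity is not stated, and the vocabulary and the 0.7 cutoff below are invented for the example.

```python
import difflib

def similar_by_spelling(term, vocabulary, limit=10, cutoff=0.7):
    """Return known words spelled like `term`, best match first."""
    candidates = sorted(w for w in set(vocabulary) if w != term)
    return difflib.get_close_matches(term, candidates, n=limit, cutoff=cutoff)

# Hypothetical vocabulary: every distinct word found in the database.
vocabulary = [
    "program", "programs", "programmer", "programming",
    "programmed", "programmers", "banana", "database",
]
matches = similar_by_spelling("program", vocabulary)
```

The important point is that the candidates come from the database's own vocabulary, so every suggestion is guaranteed to occur in at least one stored document.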

You can also generate a list of similar words based on context. This apparently comprises words used many times in documents where your original word was used many times, and the results can be really bizarre. For example, starting with "program", my first context-similar word was "clrc", because this (an abbreviation for California Law Revision Commission) happens to occur 28 times in a document where "program" occurs 24 times. In fact, all my words contextually similar to "program" were from this one document; removing it from the database resulted in a much greater (but still bizarre) diversity. The algorithm behind this feature could use some tweaking, I think (though I’m told it gets better as the database gets larger).
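
The way a single document can dominate such a list is easy to reproduce. The Python sketch below scores each word by multiplying its count with the search term's count in every document where both occur; this scoring rule is a guess at the general idea, not DEVONthink's actual algorithm. The counts for "program" (24) and "clrc" (28) come from the example above, and everything else is made up.

```python
from collections import Counter

def context_similar(doc_counts, query, top=3):
    """Rank words by how heavily they co-occur with `query`."""
    scores = Counter()
    for counts in doc_counts.values():
        q = counts.get(query, 0)
        if q == 0:
            continue  # the query does not appear in this document
        for word, n in counts.items():
            if word != query:
                scores[word] += q * n
    return [word for word, _ in scores.most_common(top)]

# Word counts per document; only the 24 and 28 are from the review.
doc_counts = {
    "law-report": Counter({"program": 24, "clrc": 28, "statute": 15}),
    "howto": Counter({"program": 3, "compile": 4, "debug": 2}),
    "recipe": Counter({"flour": 5, "butter": 3}),
}
similar = context_similar(doc_counts, "program")
```

With raw counts like these, one long document that repeats the search term swamps everything else, which matches the behavior described. Normalizing by document length or by how common a word is across the database would be one plausible way to tame it.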

You can also do some powerful searches starting with an individual document. First, you can get a list of all the words in that document; you can sort this list by frequency, length, or "weight" (apparently an expression of combined length and rarity), and, of course, you can search instantly on any of those words. However, if your intention is to find documents related to this one, you are more likely to consult the list of this document’s "keywords"; these are the highest-weighted words for this document that are also found in other documents, and again you can instantly search on one of them.
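
Taking the description of "weight" at face value, a keyword list could be computed roughly as in the Python sketch below. The formula (word length multiplied by a logarithmic rarity factor) is an assumption made for illustration, since the review only guesses that weight combines length and rarity; the sample documents are hypothetical.

```python
import math

def word_weight(word, doc_freq, total_docs):
    """Assumed weight: longer and rarer words score higher."""
    return len(word) * math.log(1 + total_docs / doc_freq[word])

def keywords(doc_name, docs, top=3):
    """Highest-weighted words of one document that also occur elsewhere."""
    doc_freq = {}
    for words in docs.values():
        for w in set(words):
            doc_freq[w] = doc_freq.get(w, 0) + 1
    # Keep only words shared with at least one other document.
    shared = [w for w in set(docs[doc_name]) if doc_freq[w] > 1]
    shared.sort(key=lambda w: (-word_weight(w, doc_freq, len(docs)), w))
    return shared[:top]

# Hypothetical documents, already split into words.
docs = {
    "a": ["doctype", "browser", "standards", "the"],
    "b": ["browser", "quirks", "the"],
    "c": ["standards", "committee", "the"],
    "d": ["budget", "the"],
}
top_keywords = keywords("a", docs, top=2)
```

Here "doctype" is dropped because no other document contains it, so searching on it could not lead anywhere, while "the" ranks last because it is short and appears everywhere.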

Alternatively, you can ask for a list of documents that DEVONthink itself considers most similar to the current document. I don’t know how DEVONthink draws its conclusions in this matter, and the results are often surprising, but they do typically include at least some documents that are genuinely related.

By the same token, you can ask DEVONthink to "classify" a document: that is, to list the groups whose documents it considers most similar to this document. If you really trust DEVONthink’s ranking here, you can even "auto-classify" a document, causing it to be moved directly into the most similar group; in fact, a preference lets you tell DEVONthink to do this automatically upon import of a document. The manual advises that comparison and classification are improved if you spend some time early on arranging documents into meaningful groups.
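
Classification can be pictured as comparing a document's words with the pooled words of each group and ranking the groups by overlap. The Python sketch below uses Jaccard similarity for that comparison, which is a stand-in technique and not a claim about how DEVONthink ranks groups; the group names and word sets are hypothetical.

```python
def jaccard(a, b):
    """Share of words common to both sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(doc_words, groups):
    """Rank group names by word overlap with the document, best first."""
    scored = []
    for name, members in groups.items():
        group_words = set().union(*members)  # pool all member documents
        scored.append((jaccard(set(doc_words), group_words), name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

# Hypothetical groups, each a list of documents given as word sets.
groups = {
    "Web": [{"html", "doctype", "browser"}, {"css", "browser", "layout"}],
    "Finance": [{"budget", "tax"}, {"invoice", "tax"}],
}
ranking = classify({"doctype", "browser", "standards"}, groups)
best_group = ranking[0]  # where an "auto-classify" step would file it
```

The sketch also suggests why the manual's advice makes sense: a group's vocabulary is only as distinctive as the documents already filed in it, so well-organized groups give the comparison more to work with.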

What Goes In — For DEVONthink to search on a document’s contents, it must be in a format from which words can be extracted. Such formats include plain text, RTF, HTML, PDF (which DEVONthink parses using pdftotext, or TextLightning if you have it), and even Microsoft Word files (now that Panther natively converts these to RTF).

<http://www.metaobject.com/Products.html#TextLightning>

But you can also use DEVONthink to work with a document that’s not in one of these formats. Any file can be handed to DEVONthink, which, if it can’t parse the document as text, simply maintains a link to the original on disk. DEVONthink can display images and movies, and play MP3s; but even if it can’t display a document’s contents directly, it can reveal or open the original in the Finder.

Why would you want your database to include links to documents that DEVONthink can’t index or display? Well, for one thing, you might want to take advantage of DEVONthink’s hierarchical file groupings; for example, if you have some text files and an Excel spreadsheet that somehow relates to them, you might want to be reminded about the spreadsheet when you’re looking at the group where the text files live. But also, when a document is just a link to a file on disk, you are free to create text for that document’s entry within DEVONthink; that text, which might describe the contents of the real document, is indexed and can be searched on.

What It Goes Into — DEVONthink uses just one database. This is a pity; I much prefer an architecture with different databases for different purposes, rather than having to lump together completely disparate material that I would never need to search simultaneously.

Another thing I don’t like about the database is that it does not consist solely of a word-index: if DEVONthink can index a document, it imports the whole document. There are two problems with this: size and security. A DEVONthink database, at least in my tests, proves to be about twice the size of the text files that constitute it. This means that if I don’t jettison the originals after importing them into DEVONthink, I’m using three times the disk space. But if I do jettison the originals, my data exists only in a proprietary binary format from which it cannot be recovered if DEVONthink some day goes on the fritz.

DEVONthink does let you export an imported document, and this seems to work (for example, file type and creator, as well as modification and creation dates, are maintained); so for extra security you could periodically export the whole database, thus regenerating Finder copies of the original documents. Nonetheless, I find the single-database architecture combined with the large database size and its proprietary format to be a significant deterrent to the use of DEVONthink; perhaps we’ll see a future version that will address these issues.

Shortcomings — On the whole, DEVONthink seems extremely well written; I have not seen it crash or otherwise seriously misbehave in such a way as to undermine confidence. Nevertheless, during testing I rapidly encountered a number of limitations that seemed to me unnecessary and easily fixed.

In a multi-word search, complex boolean expressions are not possible: either all the words are related by AND or they are all related by OR; similarly, you can search by content or by title but not both at once. Most database views are hierarchical, but there are no hierarchical navigation shortcuts; for instance there’s no command to move the selection hierarchically upwards. DEVONthink acts as a Web browser, but there are no buttons or shortcuts for Back and Forward (there are contextual menu items, but that’s a non-standard, inconvenient approach). An image document can be displayed within DEVONthink or can have editable text, but not both. There is no convenient way to launch URLs in plain text documents. When auto-classifying, there is no way to learn where DEVONthink put the document (it just vanishes and you don’t know where it went). There is no way to locate all the clones of a document.

DEVONthink is also riddled, quite unnecessarily, with jargon. Menu item commands are sometimes incomprehensible, and you have to resort to the manual to learn what they mean. What do you make of "Delete" vs. "Destroy"? What do you suppose "Touch" does? How does a document’s "Path" differ from its "URL"? (Hint: it has to do with the difference between "Opening" a document and "Launching" it.) Sometimes terminology is downright incorrect: "Toggle Outline" doesn’t change anything about outlining (it shows or hides a checkbox); "Replicate" and "Replicant" are used instead of "Clone" or "Alias"; "Concordance" doesn’t display a concordance (it displays a word list, which is a very different thing – a concordance involves context).

<http://www.rjcw.freeserve.co.uk/ss1.gif>

The manual is a PDF without bookmarks; the online help is exactly the same content in an almost useless format (a main table of contents page and a mass of subpages containing no links whatever). The documentation is, in places, inaccurate, outdated, or incomplete, and it is often not quite English.

Conclusions — DEVONthink is a program I’d love to love. I don’t, quite; the database architecture vexes me, and the shortcomings listed above, while in many ways minor, are the sort of oversights that surprise me in a program that’s a couple of years old and is now at version 1.8. Still, there’s no doubt DEVONthink is on the right track. And I’m told that there are already plans to address most of these issues in future versions of the program – some as soon as 1.8.1, which could emerge any day now.

Perhaps you remember my review of Boswell, and my complaints about it: it stores text only, it doesn’t store aliases, you can’t delete or edit a stored snippet, the interface is clumsy, it’s too expensive. DEVONthink answers all of those objections and more: it’s what I wanted to see in Boswell. DEVONthink is inexpensive, flexible, easy, intuitive; it features straightforward arranging and fast, powerful searching; it lets you edit snippets; it stores links to files on disk. In the interests of space, I haven’t done justice to all DEVONthink’s capabilities, so for the full story, you’ll just have to download it and see for yourself.

<https://tidbits.com/getbits.acgi?tbart=06441>

DEVONthink requires Mac OS X 10.2 Jaguar or higher and costs $35. A demo is available as a 3 MB download.

<http://www.devon-technologies.com/download.php>

PayBITS: Did Matt’s review shine the light of searchable clarity into the murky corners of your hard disk? Send him a few bucks!

<http://www.paypal.com/xclick/business=matt%40tidbits.com>

Read more about PayBITS: <http://www.tidbits.com/paybits/>
