Series: Greps of Wrath
Text Machine brings search-and-replace pattern matching to scriptable applications
Article 1 of 1 in series
by Matt Neuburg
They walk among us - the greppers. One might be sitting next to you at this very moment. In fact, you might be one yourself. Yes, you! You may never have grepped before; you may not even know what grepping is; yet chances are good that within you too, inchoate and amorphous, has stirred a secret need to grep.
Now that I have your attention, what on earth am I talking about? The fact that GREP originates as a Unix acronym for "global regular expression and print" need neither detain nor deter us. Let's just let "grep" mean a certain kind of powerful text search or search-and-replace without which life as we know it would be impossible, or at least meaningless. Lots of everyday text-manipulation tasks turn out to be a snap with grep.
In the past, I have routinely amazed friends (and astounded enemies) with feats of greppy legerdemain, solving seemingly baffling text processing problems. For instance, someone once approached me with a text file that he wanted to import into an existing database. The file consisted of thousands of tab-delimited lines:
John [tab] Doe [tab] 473 [tab] yes
Dick [tab] Smith [tab] 2471 [tab] no
Jane [tab] Brown [tab] 587 [tab] yes
The problem? The database expected the last name first, first name second, and could not easily switch them. The names had to be switched before the import took place. My friend was stunned as I opened his file with Nisus Writer, asked it to find all instances of:
and replace them with:
and calmly returned his file to him. This friend was a perfect example of someone who needed to grep, but didn't know anything about it. Why not?
Perhaps it's because the only way to grep is to own a word processor (or text processor) with a built-in grep facility, and then to learn to construct expressions which, unless you're a computer, are opaque to the point of illegibility. And as if that weren't bad enough, different programs implement grep in different ways (my Nisus Writer example above would fail in BBEdit). But PreFab Software is changing all that, with the release of Text Machine 1.0.
The Text Machine concept is that of a universal grep utility. You learn just one grep, Text Machine's; then you call upon Text Machine from another application. And Text Machine wants you to be able to grep successfully, so not only does it provide a powerful variety of grep, but also the commands that you give it are English-like, and much simpler to work with than grep's normal backslashy mess. Text Machine is scriptable - that's how other applications talk to it. So you can command it remotely, and you can share grepping capabilities with less technically inclined colleagues who also own Text Machine, just by giving them a script.
It's a wonderful idea, but the best is yet to come. Ideally, what you'd like is for Text Machine to put up a find-and-replace dialog, complete with pop-down menus so you barely have to learn any grep at all, and make changes directly in your application. And indeed, that seems to have been PreFab's original plan - Text Machine was to be an OpenDoc part. But then things were derailed by Apple's abandonment of OpenDoc, and the dialog interface has been postponed to version 1.1. I've seen an alpha of this, and had a blast doing dialog-based grep find-and-replace within Eudora. But right now, Text Machine lacks an interface; the only way to communicate with it is via scripting (typically, in AppleScript or Frontier's UserTalk). To many of us, that's a pleasure, not a problem; but if you don't feel up to it, perhaps you'll want to wait for 1.1.
Grep School -- Though I dearly love Nisus Writer's grep, I do admit that a certain portion of its satisfaction lies in the hocus-pocus factor. How can gibberish be so powerful? If I had used Text Machine when I helped my friend prepare his file for import, I might have lost my honorary warlock status, because I would simply have handed him an AppleScript script that does the same thing (when executed, for instance, in Apple's own Script Editor):
tell application "Text Machine" to replace in alias "HD:yourFile" all "[(textstart or paragraphdelimiter)1]" & "[(column)2, tab]" & "[(column)3, tab]" with "[group1, group3, tab, group2, tab]"
This is still code, to be sure, but a more English-like code, quite comprehensible once you know a few facts. The things in parentheses with numbers after them are groups. The term "column" means the whole text within a column, everything between one tab or return and the next. So, the first group is whatever precedes a paragraph (the start of the text or the preceding return); the second group is everything in a paragraph up to the first tab; and the third group is everything between the first tab and the second tab. The replacement keeps the first group in place and simply swaps the second and third.
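For comparison, here is the same column swap written as a conventional backslash-style regular expression. This is a sketch using Python's `re` module, not Text Machine; the sample data mirrors the file described above. Because `^` (with `re.MULTILINE`) marks a line start without consuming a character, no group is needed to hold the preceding return.

```python
import re

# Sample data shaped like the tab-delimited file described above.
data = "John\tDoe\t473\tyes\nDick\tSmith\t2471\tno\nJane\tBrown\t587\tyes"

# ^ anchors at the start of every line (re.MULTILINE).
# Group 1 is the first column, group 2 the second;
# [^\t\n]* means "any run of characters except tab or return".
pattern = re.compile(r"^([^\t\n]*)\t([^\t\n]*)\t", re.MULTILINE)

# Swap the two captured columns, putting the tabs back.
swapped = pattern.sub(r"\2\t\1\t", data)
print(swapped)
```

The backslashes are exactly the "backslashy mess" that Text Machine's English-like phrasing is meant to replace.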
Here's another example of Text Machine's grep. Yesterday (truly!), an acquaintance wanted to extract the title of any HTML document (that is, what's between the <title> tags). This would work in most cases:
tell application "Text Machine" to extract in alias "HD:my Web site:default.html" first "['<title>']" & "[(shortest oneOrMore char)1]" & "[htmlTag]" transform with "[group1]"
Single-quotes denote stretches of literal text. The addition "transform with" performs a replacement on the text returned to us, not in the original; so we end up with group 1, which is precisely everything between the <title> tag and the next tag (which we may assume is </title>).
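The "shortest" keyword matters here. A minimal Python `re` sketch of the same idea follows; it assumes `<[^>]*>` as a rough stand-in for Text Machine's `[htmlTag]`, and the HTML string is invented for illustration. The lazy `.+?` corresponds to "shortest oneOrMore char"; the greedy `.+` shows what happens without it.

```python
import re

html = "<html><head><title>My Home Page</title></head><body>...</body></html>"

# Shortest match: .+? stops at the first following tag (here </title>).
shortest = re.search(r"<title>(.+?)<[^>]*>", html)
print(shortest.group(1))

# Longest (greedy) match: .+ runs on to the last tag in the document.
longest = re.search(r"<title>(.+)<[^>]*>", html)
print(longest.group(1))
```

The first search yields just the title; the second swallows everything up to the final `</html>`.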
A more general solution, one that also catches <TITLE> and other mixed-case variants, requires us to supply alternatives for every letter, because Text Machine is case-sensitive:
tell application "Text Machine" to extract in alias "HD:my Web site:default.html" first "['<']" & "[('t' or 'T'), ('i' or 'I'), ('t' or 'T')]" & "[('l' or 'L'), ('e' or 'E')]" & "['>']" & "[(shortest oneOrMore char)1]" & "[htmlTag]" transform with "[group1]"
Luckily, this is due to change with version 1.1, which is slated to add a case-insensitive matching option.
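To see why a case-insensitive option is so welcome, compare the two approaches in a traditional regex engine. This is an illustration with Python's `re` module (not Text Machine), using an invented all-caps HTML string: the per-letter character classes mimic the 1.0 workaround, while `re.IGNORECASE` does the same job with one flag.

```python
import re

html = "<HTML><HEAD><TITLE>Shouting Page</TITLE></HEAD></HTML>"

# Per-letter alternatives, like the ('t' or 'T') workaround above.
by_hand = re.search(r"<[Tt][Ii][Tt][Ll][Ee]>(.+?)<[^>]*>", html)

# A single case-insensitive flag does the same job.
by_flag = re.search(r"<title>(.+?)<[^>]*>", html, re.IGNORECASE)

print(by_hand.group(1), by_flag.group(1))
```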
Greptitude Test -- PreFab Software has shown tremendous ingenuity in implementing grep patterns as English-like phrases rather than the traditional Unix-style codes. This feature is all but unique: Nisus Writer has something similar, but it's clumsy to use and has limited features. PreFab Software has also given Text Machine a stunningly well thought out repertoire.
Consider the range of entities on which Text Machine can operate. It can do search-and-replace on a literal string handed to it as one parameter of a command. But, as we have seen, it can also search and make replacements in files on disk.
Even more useful, Text Machine can open a document and leave it open. It doesn't display the document, for it has no windows (except in a special debugging mode). But it maintains the text of a file in memory, so any changes made through search-and-replace are not written out to disk unless you explicitly save via one of your script's commands. Also, an "insertion point" is maintained so that successive calls to a verb such as "match next" will cycle through the text. What's more, Text Machine can create a new document in memory; it can maintain multiple documents in memory; it can assign text to a document in memory. Thus one can work with large texts without passing them repeatedly to Text Machine; this reduces overhead and increases speed.
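The in-memory document model can be sketched as a small class. Everything here is hypothetical: `MemoryDocument`, `match_next`, `replace_all`, and `save` are names invented for this Python illustration, not Text Machine's scripting vocabulary. The sketch shows the three ideas described above: text held in memory, an insertion point that advances with each search, and changes reaching disk only on an explicit save.

```python
import re
from typing import Optional


class MemoryDocument:
    """Hypothetical stand-in for a document held in memory (not Text Machine's API)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.insertion_point = 0  # where the next search begins
        self.dirty = False        # True once the text has unsaved changes

    def match_next(self, pattern: str) -> Optional[re.Match]:
        """Find the next match at or after the insertion point, or None."""
        m = re.compile(pattern).search(self.text, self.insertion_point)
        if m is None:
            return None
        # Advance past the match; step one extra character on an empty
        # match so repeated calls cannot get stuck in place.
        self.insertion_point = m.end() if m.end() > m.start() else m.end() + 1
        return m

    def replace_all(self, pattern: str, replacement: str) -> int:
        """Edit the in-memory text only; nothing is written to disk."""
        self.text, count = re.subn(pattern, replacement, self.text)
        self.dirty = self.dirty or count > 0
        self.insertion_point = 0  # a choice made for this sketch
        return count

    def save(self, path: str) -> None:
        """Changes reach disk only when explicitly saved."""
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.text)
        self.dirty = False


doc = MemoryDocument("cat hat bat")
words = []
while (m := doc.match_next(r"\w+at")) is not None:
    words.append(m.group())
print(words)
```

Successive `match_next` calls step through "cat", "hat", and "bat", then return `None`. The article doesn't say whether Text Machine's "match next" wraps around at the end of the text, so this sketch simply stops. (Requires Python 3.8 or later for `:=`.)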
Then there is the ample syntax of Text Machine's four verbs. "Replace" alters the original text, returning the altered result or the count of replacements. "Extract" searches and returns the found text, optionally performing a replace on it; "extract all" returns a list or a delimited string. "Locate" and "match" report such things as the contents, position and length of the found text or texts, the results of replacing, and the contents of any groups; such information helps other scriptable applications to operate on what Text Machine has found.
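As a rough analogy only, the four verbs map onto familiar operations in Python's `re` module. Text Machine's actual parameters and return formats aren't reproduced here, and the sample text is invented. `re.subn` plays "replace" (new text plus a count), `finditer` plays "extract all" (optionally transformed with `Match.expand`), and the match objects' positions and groups stand in for what "locate" and "match" report.

```python
import re

text = "Date: 1 Oct\nSubject: grep\nDate: 2 Oct\n"
pattern = r"Date: (\d+) Oct"

# "replace": alter the text, reporting the result and the count.
new_text, count = re.subn(pattern, r"Oct \1", text)

# "extract all": return the found texts as a list...
found = [m.group() for m in re.finditer(pattern, text)]
# ...or as one delimited string...
found_joined = "\n".join(found)
# ...optionally transformed, without touching the original text.
transformed = [m.expand(r"\1") for m in re.finditer(pattern, text)]

# "locate"/"match": contents, position, length, and groups of each hit.
info = [(m.group(), m.start(), m.end() - m.start(), m.groups())
        for m in re.finditer(pattern, text)]
print(count, found, transformed, info)
```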
Here's a nice touch. A common need is to perform a series of "replace all" commands to alter a document in some wholesale manner. For this, Text Machine provides a notational shortcut: the texts to search for and the texts to replace them with are concatenated into two lists; you give one "replace all" command, and Text Machine loops through both lists. An included utility script shows how to implement a further convenience: you create the search-and-replace pairs as a tab-delimited text file, then have Text Machine parse it to generate the lists.
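The paired-list shortcut and the tab-delimited pairs file can be sketched in a few lines of Python. The function names `replace_all_pairs` and `parse_pairs` are hypothetical, invented for this illustration rather than taken from Text Machine or its utility script. Note that the pairs are applied in order, so an earlier replacement can affect what a later pattern sees.

```python
import re


def replace_all_pairs(text, finds, replacements):
    """Apply each find/replace pair in order, like one looped 'replace all'."""
    if len(finds) != len(replacements):
        raise ValueError("the two lists must be the same length")
    total = 0
    for find, repl in zip(finds, replacements):
        text, n = re.subn(find, repl, text)
        total += n
    return text, total


def parse_pairs(table):
    """Split tab-delimited 'find<TAB>replace' lines into two parallel lists."""
    finds, replacements = [], []
    for line in table.splitlines():
        if line.strip():  # skip blank lines
            find, repl = line.split("\t", 1)
            finds.append(find)
            replacements.append(repl)
    return finds, replacements


# Third pair collapses runs of two or more spaces into one.
table = "colour\tcolor\nfavourite\tfavorite\n {2,}\t "
new, total = replace_all_pairs("My favourite  colour", *parse_pairs(table))
print(new, total)
```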
Grep Tide -- Text Machine's grep is almost a superset of grep implementations in such programs as BBEdit, Nisus Writer, and Microsoft Word. For example, it lets you specify longest or shortest match, which BBEdit and Word do not; it lets you specify a quantity or range of quantities - for example, 60 to 80 successive non-returns - which is very difficult in BBEdit and Nisus.
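Both features are easy to see in a conventional regex dialect. This sketch uses Python's `re` (not Text Machine) with invented sample text. `{60,80}` expresses "60 to 80 successive non-returns" (anchored here to whole lines, so it finds lines 60 to 80 characters long), and `.*` versus `.*?` shows longest versus shortest match.

```python
import re

# Quantity range: a whole line of 60 to 80 characters that are not returns.
line_60_to_80 = re.compile(r"^[^\n]{60,80}$", re.MULTILINE)
text = "short line\n" + "x" * 70 + "\n" + "y" * 90
lengths = [len(m.group()) for m in line_60_to_80.finditer(text)]

# Longest versus shortest match of "a, then anything, then b".
s = "a1b2b"
greedy = re.search(r"a.*b", s).group()
lazy = re.search(r"a.*?b", s).group()

print(lengths, greedy, lazy)
```

Only the 70-character line qualifies. Without the `^` and `$` anchors, the 90-character line would also yield an 80-character match.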
On the other hand, it lacks some traditional grep constructs. You can't speak of a letter range, such as a letter alphabetically between "a" and "g"; you must specify the range's contents explicitly, like this: "[<abcdefg>]". (The justification might be that such specification is no hardship, and anyhow the most commonly needed sets are predefined, like "[lowercaseLetter]" or "[controlChar]".) The inability to do a case-insensitive match (in version 1.0) is inconvenient. Further, there are almost no positional keywords: thus, in my first example, I had to speak of the beginning of a line (paragraph) as "[textstart or paragraphdelimiter]" and then work around the fact that the matched text includes an extra character.
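For contrast, here is how a traditional regex dialect handles the two constructs just mentioned, sketched in Python's `re` with made-up strings. A range like `[a-g]` is shorthand for the explicit set. A zero-width `^` marks a line start without adding a character to the match, whereas treating the delimiter as text sweeps the return into the result.

```python
import re

# A letter range versus the equivalent explicit set.
in_range = re.findall(r"[a-g]", "grep")
explicit = re.findall(r"[abcdefg]", "grep")

text = "alpha\nbeta"
# Zero-width anchor: each match contains only the first letter of a line.
anchored = re.findall(r"^\w", text, re.MULTILINE)
# "Text start or delimiter" workaround: the return joins the match.
swept = re.findall(r"(?:\A|\n)\w", text)

print(in_range, explicit, anchored, swept)
```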
Still, these shortcomings can be worked around, and so all in all Text Machine's grep is as powerful as they come, along with being easy to express using English-like phraseology.
For example, I have a HyperCard stack that archives messages from mailing lists. I receive the mailing lists in digest form (many messages in one email message); the stack parses the digest, storing each message's date field, subject field, sender field, and content on a separate card. Originally I had a devil of a time coding this in HyperTalk; but HyperCard speaks AppleScript, so Text Machine's grep is available to HyperCard, at which point the same task becomes trivial (and much faster) to code. Microsoft Word, too, can benefit from calling Text Machine: how often, writing a WordBasic macro, I have wished Word's grep were more like Nisus Writer's! Now with Text Machine on hand, it can be.
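To give a flavor of the digest-parsing task, here is a Python sketch using named groups. The digest layout (Date, From, and Subject headers, a blank line, the body, and a line of dashes between messages) and the sample messages are assumptions made up for this illustration; real mailing-list digests vary, and the stack itself does this through Text Machine and HyperTalk rather than Python.

```python
import re

# Hypothetical, simplified digest layout (an assumption for this sketch).
digest = """Date: Mon, 6 Oct 1997
From: alice@example.com
Subject: grep question

How do I swap two columns?
------------------------------
Date: Tue, 7 Oct 1997
From: bob@example.com
Subject: Re: grep question

Use groups.
"""

message_re = re.compile(
    r"^Date: (?P<date>[^\n]*)\n"         # header values stop at end of line
    r"From: (?P<sender>[^\n]*)\n"
    r"Subject: (?P<subject>[^\n]*)\n"
    r"\n"                                # blank line before the body
    r"(?P<body>.*?)"                     # shortest body...
    r"(?=^-{10,}$|\Z)",                  # ...up to a separator or the end
    re.MULTILINE | re.DOTALL,
)

messages = [{k: v.strip() for k, v in m.groupdict().items()}
            for m in message_re.finditer(digest)]
for msg in messages:
    print(msg["date"], "|", msg["sender"], "|", msg["subject"], "|", msg["body"])
```

Each resulting dictionary holds the four fields that the stack stores on one card.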
Grep extensions to AppleScript (such as Late Night Software's "Regular Expressions") and UserTalk (the "regex" UCMD) do exist. But Text Machine is much easier to use and more powerful than these. Also, Text Machine benefits from being a true application, which can modify text files or memorize large texts for extensive interaction.
Get a Grep -- If you own Nisus Writer, feel comfortable with its grep, and you don't need to do any scripting beyond Nisus macros, you probably don't need Text Machine. Also, Nisus Writer's grep has a special feature: it works on styled text. Thus, there are some tasks for which Nisus is uniquely suited, and Text Machine isn't trying to compete.
On the other hand, if you use or are willing to use AppleScript or UserTalk, and if you have any program which is scriptable or which can execute an OSA script (HyperCard, Microsoft Word, FileMaker Pro, and so on), you can incorporate Text Machine's grepping functionality and perhaps find an answer to your text-processing prayers.
Undoubtedly Text Machine's learning curve beats that of traditional greps, hands down. It would be wrong, though, to pretend there is no learning curve. In this version, Text Machine is not, I think, grep for the user-in-the-street. Let's face it: Text Machine is geeky. You have to be willing to script, and even though Text Machine's phraseology is easy to learn, experimentation and ingenuity may still be required to get the right results.
However, in my vocabulary, "geeky" is a term of praise. Perhaps you have to be weird in just the way that I am in order to appreciate it, but I think that Text Machine is, well, beautiful. It offers a single functionality, superbly realized, placed at the service of other applications. I've waited months for Text Machine to come to fruition, and now it has a firm place in my bag of tools. PreFab has a free 30-day demo to let you decide if you feel the same way.
Text Machine costs $75, until 12-Nov-97 when the price increases to $95. Those who purchase 1.0 get a free upgrade to version 1.1.