Love it or hate it, Mac OS X ships with Unix under its hood. As a user, I worry the Mac experience could degrade into editing brittle text configuration files and typing obscure and unforgiving commands. As a programmer, I'm overjoyed because we Mac users now have access to certain industrial-strength software. This is the type of software that drives Fortune 500 companies, calculates extremely complex chemical reactions, and generates the movies we watch. Since I don't make movies and I'm not a scientist, I'm most interested in the business side of this software. In particular, I'm interested in relational databases.
On the classic Mac OS, FileMaker Pro and 4D dominate the database scene. I'm partial to the newcomer Valentina, while other folks swear by Helix RADE or Frontier [for context, see Matt Neuburg's articles on these last two. -Adam]. Unfortunately, none of these databases qualify as "industrial strength." Don't get me wrong: they do their jobs well, but they lack the qualities that many database professionals crave: SQL and ACID. But before we dive into those two acronyms, let me introduce you to the relational database model. In the next installment of this article, we'll look at some of the relational databases that become available to Macintosh users under Mac OS X.
Relational Databases -- Although there are many different types of databases (free form, hierarchical, network and object relational to name a few), the relational database model is the favorite of businesses.
Introduced by mathematician Dr. E. F. Codd in the early 1970s, the model is simple (though most books like to obscure it behind mathematical jargon). Imagine a spreadsheet where you keep a list of your customers:
CUSTOMER Table

CUSTOMER_ID  NAME              EMAIL
1            Steve Jobs        firstname.lastname@example.org
2            John Sculley      email@example.com
3            Michael Spindler  firstname.lastname@example.org
4            Gil Amelio        email@example.com
Notice that you have three columns of information, with each column dedicated to holding a certain nugget of information. You have four customers, each represented by a distinct row.
The relational model calls this data layout a "table"; a relational database contains one or more tables. Although similar in concept to a spreadsheet, a table is different in that each column can hold only one type of data. For example, it would be illegal to put text into the "CUSTOMER_ID" column: it can hold only numbers. Also, unlike a spreadsheet, the relational model doesn't allow cells to hold formulas (each cell must stand alone and can't refer to another cell).
If you're used to thinking of databases as a bunch of index cards (as in FileMaker), here's a helpful guide: a table is analogous to a stack of cards, a row is analogous to a single card (a record), and a column represents a single field on a card.
Now, let's say you want to keep track of your customers' purchases. You whip up another table:
PURCHASE Table

PURCHASE_ID  CUSTOMER_ID  DESCRIPTION
1            1            Black turtleneck shirt
2            2            Book: "How to Sell Sugar Water"
3            1            Faded blue jeans
4            3            Golden parachute
5            1            12-pack, bottled water
6            4            Book: "The Second Coming of Steve Jobs"
You can add rows to this table as customers make purchases. Each purchase has a "CUSTOMER_ID" column, which can be used to relate a purchase with a customer. For instance, in this table we know that Purchase #1 was made by Customer #1.
Let's explore how these relationships can work. Given a PURCHASE_ID, it's easy for us to retrieve the purchaser's email address. Suppose we're interested in the fourth purchase; its CUSTOMER_ID field is set to 3. By scanning our customer list for customers with an ID set to 3, we discover a Michael Spindler, email address <firstname.lastname@example.org>.
Relationships can also work the other way: given a CUSTOMER_ID, we can work backwards to compile a list of purchases made. Let's start off with Steve Jobs, who has a CUSTOMER_ID of 1. Now we scan our purchase list, where we discover three rows with matching CUSTOMER_ID fields: purchases 1, 3, and 5.
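Both lookups can be sketched in a few lines of code. This is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in database (it is not one of the products discussed in this article), the helper names build_db, purchaser_name, and purchase_ids_for are made up for illustration, and the EMAIL column is left out to keep it short. It also peeks ahead at SQL, which is introduced later in the article.

```python
import sqlite3

# Sample rows copied from the CUSTOMER and PURCHASE tables above
# (the EMAIL column is omitted to keep the sketch short).
CUSTOMERS = [
    (1, "Steve Jobs"),
    (2, "John Sculley"),
    (3, "Michael Spindler"),
    (4, "Gil Amelio"),
]
PURCHASES = [
    (1, 1, "Black turtleneck shirt"),
    (2, 2, 'Book: "How to Sell Sugar Water"'),
    (3, 1, "Faded blue jeans"),
    (4, 3, "Golden parachute"),
    (5, 1, "12-pack, bottled water"),
    (6, 4, 'Book: "The Second Coming of Steve Jobs"'),
]


def build_db():
    """Create an in-memory database holding the two sample tables."""
    db = sqlite3.connect(":memory:")
    db.execute(
        'CREATE TABLE "CUSTOMER" ("CUSTOMER_ID" INTEGER, "NAME" CHAR(100))'
    )
    db.execute(
        'CREATE TABLE "PURCHASE" ("PURCHASE_ID" INTEGER, '
        '"CUSTOMER_ID" INTEGER, "DESCRIPTION" CHAR(200))'
    )
    db.executemany('INSERT INTO "CUSTOMER" VALUES (?, ?)', CUSTOMERS)
    db.executemany('INSERT INTO "PURCHASE" VALUES (?, ?, ?)', PURCHASES)
    return db


def purchaser_name(db, purchase_id):
    """Purchase -> customer: follow CUSTOMER_ID from PURCHASE into CUSTOMER."""
    row = db.execute(
        'SELECT c."NAME" FROM "PURCHASE" p '
        'JOIN "CUSTOMER" c ON c."CUSTOMER_ID" = p."CUSTOMER_ID" '
        'WHERE p."PURCHASE_ID" = ?',
        (purchase_id,),
    ).fetchone()
    return row[0] if row else None


def purchase_ids_for(db, customer_id):
    """Customer -> purchases: collect every row with a matching CUSTOMER_ID."""
    rows = db.execute(
        'SELECT "PURCHASE_ID" FROM "PURCHASE" '
        'WHERE "CUSTOMER_ID" = ? ORDER BY "PURCHASE_ID"',
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling purchaser_name(build_db(), 4) follows the fourth purchase to Michael Spindler, and purchase_ids_for(build_db(), 1) gathers Steve Jobs's purchases, mirroring the two walk-throughs above.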
If you follow good design rules when setting up your tables, your database will have little or no duplicate data and will accept only valid data. Another perk is that nothing in your database is tied to a specific program: if you outgrow your current database program, you can move to another without much effort.
Finally, relational databases are very scalable. You can start off on a $400 PC running Linux and migrate the same database to $400,000 IBM big iron. The only differences are speed and reliability. You can see why businesses like relational databases.
Now that you know the general idea about relational databases, we can decode the SQL and ACID acronyms I mentioned earlier.
SQL -- SQL stands for Structured Query Language, and is correctly pronounced by spelling out its letters ("ess cue el"). Some folks pronounce it "sequel," but this is incorrect: SEQUEL was the name of a separate language that was SQL's forerunner. A minority pronounce SQL as "squeal," which never truly caught on, probably for the same reason SCSI was never pronounced "sexy": it sounded silly in the boardroom. ("We'll need to attach a sexy drive to our squeal server." Sure you're going to say that to the big boss.)
SQL is the standard language used to communicate with relational databases. Because it's actually a full language, users, developers, and software programs can use it to create, alter, and delete tables and the rows of information they contain. The use of a standard language opens relational databases up to a wide variety of interfaces and access methods that would have to be written from scratch individually for other types of databases. That accounts for one of the limitations of traditional Macintosh databases.
Like HTML, SQL is a declarative language. It contains no variables or loops, and is easy to learn even for the non-programmer. With a non-declarative language, you must spell out the steps necessary to complete a task. A declarative language, on the other hand, simply allows you to state the desired end-result. SQL is an older language, and although it is case insensitive, convention capitalizes almost everything. Here's a valid SQL statement to create the customer table discussed above:
CREATE TABLE "CUSTOMER"(
    "CUSTOMER_ID" INTEGER,
    "NAME" CHAR(100),
    "EMAIL" CHAR(200)
);
This command creates a table named CUSTOMER with three columns: CUSTOMER_ID, NAME, and EMAIL. The CUSTOMER_ID column is defined to hold an integer, while the NAME and EMAIL columns are defined to hold 100 and 200 characters, respectively.
It's easy to enter information into a table using the INSERT verb:
INSERT INTO "CUSTOMER"( "CUSTOMER_ID", "NAME", "EMAIL" )
VALUES ( 1, 'Steve Jobs', 'firstname.lastname@example.org' );
Space prohibits me from detailing the syntax for altering and deleting rows and tables, but it's just as easy as creating tables and inserting rows.
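For the curious, here is a rough sketch of what those statements look like. UPDATE, DELETE, ALTER TABLE, and DROP TABLE are standard SQL verbs; everything else here is an assumption for illustration: the statements are run through Python's built-in sqlite3 module, the "PHONE" column and the changed name are invented, and the function name is hypothetical.

```python
import sqlite3


def alter_and_delete_demo():
    """Run one statement of each kind against a throwaway CUSTOMER table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        'CREATE TABLE "CUSTOMER" ('
        '"CUSTOMER_ID" INTEGER, "NAME" CHAR(100), "EMAIL" CHAR(200))'
    )
    db.executemany(
        'INSERT INTO "CUSTOMER" ("CUSTOMER_ID", "NAME") VALUES (?, ?)',
        [(1, "Steve Jobs"), (2, "John Sculley")],
    )

    # Alter a row: change one column of the rows that match the WHERE clause.
    db.execute(
        """UPDATE "CUSTOMER" SET "NAME" = 'Steven P. Jobs'
           WHERE "CUSTOMER_ID" = 1"""
    )

    # Delete a row: remove every row that matches the WHERE clause.
    db.execute('DELETE FROM "CUSTOMER" WHERE "CUSTOMER_ID" = 2')

    # Alter a table: add a new (invented) column; existing rows get NULL.
    db.execute('ALTER TABLE "CUSTOMER" ADD COLUMN "PHONE" CHAR(20)')

    rows = db.execute(
        'SELECT "CUSTOMER_ID", "NAME", "PHONE" FROM "CUSTOMER"'
    ).fetchall()

    # Delete a table: the table and all of its rows disappear.
    db.execute('DROP TABLE "CUSTOMER"')
    remaining_tables = db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()

    db.close()
    return rows, remaining_tables
```

The rows returned show the renamed customer with an empty PHONE value, and the list of remaining tables is empty once the table has been dropped.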
The key SQL verb is SELECT, which allows you to access and filter information from the database. For example, we can look up a customer's email address like so:
SELECT "EMAIL" FROM "CUSTOMER" WHERE "NAME" = 'Gil Amelio';
Here's the result you get back:

EMAIL
email@example.com
The result takes the form of a table. Granted, in this case it's a table with only one column, but it's a table nonetheless.
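To make the declarative idea concrete, here is a small sketch that answers the same question twice: once by spelling out every step in a loop, and once by handing the database a SELECT that only states the desired result. It assumes Python's built-in sqlite3 module as the database; the function names and the email addresses are placeholders I made up for the example.

```python
import sqlite3

# Placeholder data: these addresses are invented for the example.
CUSTOMERS = [
    (1, "Steve Jobs", "steve@example.invalid"),
    (2, "John Sculley", "john@example.invalid"),
    (3, "Michael Spindler", "michael@example.invalid"),
    (4, "Gil Amelio", "gil@example.invalid"),
]


def email_imperative(rows, name):
    """Non-declarative: we say HOW (visit each row, compare, stop on a match)."""
    for _customer_id, row_name, email in rows:
        if row_name == name:
            return email
    return None


def email_declarative(rows, name):
    """Declarative: we say WHAT we want; the database decides how to find it."""
    db = sqlite3.connect(":memory:")
    db.execute(
        'CREATE TABLE "CUSTOMER" ('
        '"CUSTOMER_ID" INTEGER, "NAME" CHAR(100), "EMAIL" CHAR(200))'
    )
    db.executemany('INSERT INTO "CUSTOMER" VALUES (?, ?, ?)', rows)
    found = db.execute(
        'SELECT "EMAIL" FROM "CUSTOMER" WHERE "NAME" = ?', (name,)
    ).fetchone()
    db.close()
    return found[0] if found else None
```

Both functions give the same answer for 'Gil Amelio'. The difference is that the SELECT never mentions scanning or comparing; the database is free to use an index or any other strategy it likes.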
As a final example, given a name, the following query displays all of a customer's purchases. It's okay if you don't understand it; I just wanted to show off a little of what you can do with SQL.
SELECT "PURCHASE_ID", "DESCRIPTION"
FROM "PURCHASE"
WHERE "CUSTOMER_ID" = (
    SELECT "CUSTOMER_ID" FROM "CUSTOMER" WHERE "NAME" = 'Steve Jobs'
);
Here's the result:
PURCHASE_ID  DESCRIPTION
1            Black turtleneck shirt
3            Faded blue jeans
5            12-pack, bottled water
The ACID Test -- ACID stands for "Atomicity, Consistency, Isolation, and Durability." These are the features that separate the pros (Oracle, PostgreSQL) from the minor leaguers (FileMaker Pro, 4D). When your business rides on the quality of your information, ACID is the feature set that helps you sleep at night.
Atomicity (pronounced "atom-ih-sit-ee") comes from the word atom and its original meaning: that which is indivisible. In a database, that means that multiple operations are all bundled up into one indivisible transaction. Either all of the transaction's operations take place, or none of them do. This helps to ensure the database is in a valid state at all times.
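Here is a minimal sketch of all-or-nothing behavior, assuming Python's built-in sqlite3 module as the database. The helper names (make_db, record_order, purchase_count) and the NOT NULL rule on DESCRIPTION are my own additions for illustration; the point is only that when one operation in a transaction fails, the operations before it are undone too.

```python
import sqlite3


def make_db():
    """In-memory PURCHASE table; DESCRIPTION may not be empty (NOT NULL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        'CREATE TABLE "PURCHASE" ('
        '"PURCHASE_ID" INTEGER PRIMARY KEY, '
        '"CUSTOMER_ID" INTEGER NOT NULL, '
        '"DESCRIPTION" CHAR(200) NOT NULL)'
    )
    return db


def record_order(db, customer_id, descriptions):
    """Insert every purchase in an order as one transaction, or none at all."""
    try:
        # "with db" commits if the block finishes and rolls back if it raises.
        with db:
            for description in descriptions:
                db.execute(
                    'INSERT INTO "PURCHASE" ("CUSTOMER_ID", "DESCRIPTION") '
                    "VALUES (?, ?)",
                    (customer_id, description),
                )
        return True
    except sqlite3.IntegrityError:
        return False


def purchase_count(db):
    return db.execute('SELECT COUNT(*) FROM "PURCHASE"').fetchone()[0]
```

If an order contains one valid purchase followed by one with a missing description, record_order returns False and the valid purchase is rolled back along with the bad one, so the table never holds half an order.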
Consistency is the principle that only operations that meet all the database's validity constraints are allowed. The end effect of this is that illegal operations aren't allowed, whether they are external (perhaps users enter invalid data) or internal (perhaps a disk fills up and a required row can't be added).
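Validity constraints are usually declared as part of the table definition. The sketch below assumes Python's built-in sqlite3 module and two constraints I chose for illustration: a description is required, and every purchase must refer to an existing customer. The helper names are hypothetical, and the PRAGMA line is specific to SQLite, which leaves foreign-key checking off unless asked.

```python
import sqlite3


def make_db():
    """Two tables where PURCHASE rows must point at a real CUSTOMER row."""
    db = sqlite3.connect(":memory:")
    # SQLite-specific: foreign-key enforcement is off by default.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute(
        'CREATE TABLE "CUSTOMER" ('
        '"CUSTOMER_ID" INTEGER PRIMARY KEY, "NAME" CHAR(100) NOT NULL)'
    )
    db.execute(
        'CREATE TABLE "PURCHASE" ('
        '"PURCHASE_ID" INTEGER PRIMARY KEY, '
        '"CUSTOMER_ID" INTEGER NOT NULL '
        'REFERENCES "CUSTOMER" ("CUSTOMER_ID"), '
        '"DESCRIPTION" CHAR(200) NOT NULL)'
    )
    db.execute("""INSERT INTO "CUSTOMER" VALUES (1, 'Steve Jobs')""")
    db.commit()
    return db


def try_purchase(db, customer_id, description):
    """Attempt an insert and report whether the database allowed it."""
    try:
        with db:  # commit on success, roll back on failure
            db.execute(
                'INSERT INTO "PURCHASE" ("CUSTOMER_ID", "DESCRIPTION") '
                "VALUES (?, ?)",
                (customer_id, description),
            )
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
```

A purchase for customer 1 is accepted, while a purchase for a nonexistent customer 99, or one with no description, is rejected before it can reach the table.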
In this wild Web world, databases have to deal with multiple concurrent modifications. But what happens when Alice's transaction is modifying the table that Bob's transaction is reading? Isolation ensures that Bob's transaction sees the table as it existed before Alice's transaction started or after it completed, but never the intermediate state.
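The Alice-and-Bob scenario can be sketched with two connections to the same database file. This assumes Python's built-in sqlite3 module with its default transaction handling and SQLite's default locking; other databases achieve isolation by different means. The file name and function name are made up for the example.

```python
import os
import sqlite3
import tempfile


def isolation_demo():
    """Return how many rows Bob sees mid-transaction and after the commit."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "shop.db")  # hypothetical file name
        alice = sqlite3.connect(path)
        bob = sqlite3.connect(path)
        try:
            alice.execute(
                'CREATE TABLE "PURCHASE" ('
                '"PURCHASE_ID" INTEGER, "DESCRIPTION" CHAR(200))'
            )
            alice.commit()

            # Alice's transaction starts with her first INSERT and stays
            # open (uncommitted) until she calls commit().
            alice.execute(
                """INSERT INTO "PURCHASE" VALUES (1, 'Black turtleneck shirt')"""
            )

            # Bob reads while Alice is halfway through her transaction.
            seen_during = bob.execute(
                'SELECT COUNT(*) FROM "PURCHASE"'
            ).fetchall()[0][0]

            alice.execute(
                """INSERT INTO "PURCHASE" VALUES (3, 'Faded blue jeans')"""
            )
            alice.commit()

            # Bob reads again after Alice's transaction has completed.
            seen_after = bob.execute(
                'SELECT COUNT(*) FROM "PURCHASE"'
            ).fetchall()[0][0]
        finally:
            alice.close()
            bob.close()
    return seen_during, seen_after
```

Bob sees no purchases while Alice's transaction is in flight and both purchases once it completes. He never sees the intermediate state with just one row.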
Finally, Durability is the principle that once a transaction is completed, a mere system crash won't wipe it out. In the real world, this means that transactions aren't considered completed until all the information has been written to a disk.
What's Available? -- In the next installment of this article, I'll cover the merits of a handful of database applications that can be run under Mac OS X, such as MySQL and FrontBase, along with speculation about Oracle's possible entry into the Mac field.
[Jonathan "Wolf" Rentzsch is the embodiment of Red Shed Software and runs a monthly Mac programmer get-together in Northwest Illinois.]