Series: Can You Relate?
Mac OS X's Unix underpinnings make a new world of relational databases available to the Mac
Article 1 of 2 in series
Love it or hate it, Mac OS X ships with Unix under its hood. As a user, I worry the Mac experience could degrade into editing brittle text configuration files and typing obscure and unforgiving commands. As a programmer, I'm overjoyed because we Mac users now have access to certain industrial-strength software. This is the type of software that drives Fortune 500 companies, calculates extremely complex chemical reactions, and generates the movies we watch. Since I don't make movies and I'm not a scientist, I'm most interested in the business side of this software. In particular, I'm interested in relational databases.
On the classic Mac OS, FileMaker Pro and 4D dominate the database scene. I'm partial to the newcomer Valentina, while other folks swear by Helix RADE or Frontier [for context, see Matt Neuburg's articles on these last two. -Adam]. Unfortunately, none of these databases qualify as "industrial strength." Don't get me wrong: they do their jobs well, but they lack the qualities that many database professionals crave: SQL and ACID. But before we dive into those two acronyms, let me introduce you to the relational database model. In the next installment of this article, we'll look at some of the relational databases that become available to Macintosh users under Mac OS X.
Relational Databases -- Although there are many different types of databases (free form, hierarchical, network and object relational to name a few), the relational database model is the favorite of businesses.
Introduced by mathematician Dr. E. F. Codd in the early 1970s, the model is simple (though most books like to obscure it behind mathematical jargon). Imagine a spreadsheet where you keep a list of your customers:
CUSTOMER Table

CUSTOMER_ID  NAME              EMAIL
1            Steve Jobs        firstname.lastname@example.org
2            John Sculley      email@example.com
3            Michael Spindler  firstname.lastname@example.org
4            Gil Amelio        email@example.com
Notice that you have three columns of information, with each column dedicated to holding a certain nugget of information. You have four customers, each represented by a distinct row.
The relational model calls this data layout a "table;" a relational database contains one or more tables. Although similar in concept to a spreadsheet, a table is different in that each column can hold only one type of data. For example, it would be illegal to put text into the "CUSTOMER_ID" column - it can hold only numbers. Also, unlike a spreadsheet, the relational model doesn't allow cells to hold formulas (each cell must stand alone and can't refer to another cell).
If you're used to thinking of databases as a bunch of index cards (as in FileMaker), here's a helpful guide: a table is analogous to a stack of cards, a row is analogous to a single card (a record), and a column represents a single field on a card.
Now, let's say you want to keep track of your customers' purchases. You whip up another table:
PURCHASE Table

PURCHASE_ID  CUSTOMER_ID  DESCRIPTION
1            1            Black turtleneck shirt
2            2            Book: "How to Sell Sugar Water"
3            1            Faded blue jeans
4            3            Golden parachute
5            1            12-pack, bottled water
6            4            Book: "The Second Coming of Steve Jobs"
You can add rows to this table as customers make purchases. Each purchase has a "CUSTOMER_ID" column, which can be used to relate a purchase with a customer. For instance, in this table we know that Purchase #1 was made by Customer #1.
Let's explore how these relationships can work. Given a PURCHASE_ID, it's easy for us to retrieve the purchaser's email address. Suppose we're interested in the fourth purchase; its CUSTOMER_ID field is set to 3. By scanning our customer list for customers with an ID set to 3, we discover a Michael Spindler, email address <firstname.lastname@example.org>.
Relationships can also work the other way: given a CUSTOMER_ID, we can work backwards to compile a list of purchases made. Let's start off with Steve Jobs, who has a CUSTOMER_ID of 1. Now we scan our purchase list, where we discover three rows with matching CUSTOMER_ID fields: purchases 1, 3, and 5.
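The two lookups described above can be sketched in a few lines of Python. This is only an illustration of the scanning idea, not how a database engine works internally; the list-of-dictionaries layout and the function names `purchaser_of` and `purchases_by` are my own assumptions, and the EMAIL column is left out because only the IDs matter for the relationship.

```python
# Illustrative in-memory copies of the CUSTOMER and PURCHASE tables.
customers = [
    {"customer_id": 1, "name": "Steve Jobs"},
    {"customer_id": 2, "name": "John Sculley"},
    {"customer_id": 3, "name": "Michael Spindler"},
    {"customer_id": 4, "name": "Gil Amelio"},
]
purchases = [
    {"purchase_id": 1, "customer_id": 1, "description": "Black turtleneck shirt"},
    {"purchase_id": 2, "customer_id": 2, "description": 'Book: "How to Sell Sugar Water"'},
    {"purchase_id": 3, "customer_id": 1, "description": "Faded blue jeans"},
    {"purchase_id": 4, "customer_id": 3, "description": "Golden parachute"},
    {"purchase_id": 5, "customer_id": 1, "description": "12-pack, bottled water"},
    {"purchase_id": 6, "customer_id": 4, "description": 'Book: "The Second Coming of Steve Jobs"'},
]


def purchaser_of(purchase_id):
    """Purchase -> customer: find the purchase, then scan for its CUSTOMER_ID."""
    purchase = next(p for p in purchases if p["purchase_id"] == purchase_id)
    return next(c for c in customers if c["customer_id"] == purchase["customer_id"])


def purchases_by(customer_id):
    """Customer -> purchases: collect every purchase carrying this CUSTOMER_ID."""
    return [p["purchase_id"] for p in purchases if p["customer_id"] == customer_id]


print(purchaser_of(4)["name"])  # the fourth purchase belongs to customer 3
print(purchases_by(1))          # every purchase made by customer 1
```

Both functions walk a whole list on each call, which is fine for six rows; real databases avoid that work with indexes.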
By following good design rules when setting up your tables, your database will have little or no duplicate data and will accept only valid data. Another perk is that nothing in your database is tied to a specific program - if you outgrow your current database program, you can move to another without much effort.
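Here is a minimal sketch of what "accepting only valid data" can look like in practice. It uses SQLite through Python's standard sqlite3 module, which is my choice for a self-contained demonstration and not one of the databases discussed in this article. The PRIMARY KEY, NOT NULL, and REFERENCES clauses are assumptions I've added to the tables above, and `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- no two customers may share an ID
        name        TEXT NOT NULL,
        email       TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        description TEXT NOT NULL
    )""")
conn.execute("INSERT INTO customer VALUES (1, 'Steve Jobs', 'email@example.com')")


def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    # Valid: customer 1 exists.
    try_insert("INSERT INTO purchase VALUES (1, 1, 'Black turtleneck shirt')"),
    # Invalid: there is no customer 99 for this purchase to point at.
    try_insert("INSERT INTO purchase VALUES (2, 99, 'Golden parachute')"),
    # Invalid: customer ID 1 is already taken.
    try_insert("INSERT INTO customer VALUES (1, 'John Sculley', 'email@example.com')"),
]
for line in results:
    print(line)
```

The customer's name and email live in exactly one row, and each purchase refers to them by ID, which is how the design avoids duplicate data.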
Finally, relational databases are very scalable. You can start off on a $400 PC running Linux and migrate the same database to $400,000 IBM big iron. The only difference is speed and reliability. You can see why businesses like relational databases.
Now that you know the general idea about relational databases, we can decode the SQL and ACID acronyms I mentioned earlier.
SQL -- SQL stands for Structured Query Language, and is correctly pronounced by spelling out its letters ("ess cue el"). Some folks pronounce it "sequel," but this is incorrect: SEQUEL was the name of SQL's forerunner, a different language. A minority pronounce SQL as "squeal," which never truly caught on, probably for the same reason SCSI was never pronounced "sexy" - it sounded silly in the boardroom. ("We'll need to attach a sexy drive to our squeal server." Sure you're going to say that to the big boss.)
SQL is the standard language used to communicate with relational databases. Because it's actually a full language, users, developers, and software programs can use it to create, alter, and delete tables and the rows of information they contain. The use of a standard language opens relational databases up to a wide variety of interfaces and access methods that would have to be written from scratch individually for other types of databases. That accounts for one of the limitations of traditional Macintosh databases.
Like HTML, SQL is a declarative language. It contains no variables or loops, and is easy to learn even for the non-programmer. With a non-declarative language, you must spell out the steps necessary to complete a task. A declarative language, on the other hand, simply allows you to state the desired end-result. SQL is an older language, and although it is case insensitive, convention capitalizes almost everything. Here's a valid SQL statement to create the customer table discussed above:
CREATE TABLE "CUSTOMER"( "CUSTOMER_ID" INTEGER, "NAME" CHAR(100), "EMAIL" CHAR(200) );
This command creates a table named CUSTOMER with three columns: CUSTOMER_ID, NAME and EMAIL. The CUSTOMER_ID column is defined to hold a number, while the NAME and EMAIL columns are respectively defined to contain 100 and 200 characters.
It's easy to enter information into a table using the INSERT verb:
INSERT INTO "CUSTOMER"( "CUSTOMER_ID", "NAME", "EMAIL" ) VALUES ( 1, 'Steve Jobs', 'email@example.com' );
Space prohibits me from detailing the syntax for altering and deleting rows and tables, but it's just as easy as creating tables and inserting rows.
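For the curious, here is a rough sketch of those other verbs (UPDATE, DELETE, ALTER TABLE, and DROP TABLE), run against SQLite through Python's standard sqlite3 module. SQLite is my stand-in for a quick experiment; the PHONE column and the replacement email address are hypothetical, and other databases may differ in the details of ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "CUSTOMER"( "CUSTOMER_ID" INTEGER, "NAME" CHAR(100), "EMAIL" CHAR(200) )')
conn.executemany(
    'INSERT INTO "CUSTOMER"( "CUSTOMER_ID", "NAME", "EMAIL" ) VALUES ( ?, ?, ? )',
    [(1, "Steve Jobs", "email@example.com"), (2, "John Sculley", "email@example.com")],
)

# Alter a row: change one customer's email address (hypothetical value).
conn.execute('UPDATE "CUSTOMER" SET "EMAIL" = ? WHERE "CUSTOMER_ID" = ?',
             ("sjobs@example.com", 1))

# Delete a row: remove customer 2.
conn.execute('DELETE FROM "CUSTOMER" WHERE "CUSTOMER_ID" = 2')

# Alter a table: add a (hypothetical) PHONE column; existing rows get NULL.
conn.execute('ALTER TABLE "CUSTOMER" ADD COLUMN "PHONE" CHAR(20)')

rows = conn.execute('SELECT * FROM "CUSTOMER"').fetchall()
print(rows)

# Delete a table: the table and all its rows are gone.
conn.execute('DROP TABLE "CUSTOMER"')
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```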
The key SQL verb is SELECT, which allows you to access and filter information from the database. For example, we can look up a customer's email address like so:
SELECT "EMAIL" FROM "CUSTOMER" WHERE "NAME" = 'Gil Amelio';
Here's the result you get back:

EMAIL
email@example.com
The result takes the form of a table. Granted, in this case it's a table with only one column, but it's a table nonetheless.
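If you'd like to try the statements above without installing a database server, here is a minimal sketch that runs them in SQLite through Python's standard sqlite3 module. SQLite is my assumption for convenience, not one of the databases covered in this series, and the email addresses are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway database held in memory
conn.execute('CREATE TABLE "CUSTOMER"( "CUSTOMER_ID" INTEGER, "NAME" CHAR(100), "EMAIL" CHAR(200) )')
conn.executemany(
    'INSERT INTO "CUSTOMER"( "CUSTOMER_ID", "NAME", "EMAIL" ) VALUES ( ?, ?, ? )',
    [
        (1, "Steve Jobs", "steve@example.com"),        # placeholder addresses
        (2, "John Sculley", "john@example.com"),
        (3, "Michael Spindler", "michael@example.com"),
        (4, "Gil Amelio", "gil@example.com"),
    ],
)

cursor = conn.execute('SELECT "EMAIL" FROM "CUSTOMER" WHERE "NAME" = ?', ("Gil Amelio",))
columns = [column[0] for column in cursor.description]  # the result's column names
rows = cursor.fetchall()                                 # the result's rows

print(columns)
print(rows)
```

The result comes back as column names plus a list of rows: a table, even though it has just one column and one row.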
As a final example, given a name, the following query displays all of a customer's purchases. It's okay if you don't understand it; I just wanted to show off a little of what you can do with SQL.
SELECT "PURCHASE_ID","DESCRIPTION" FROM "PURCHASE" WHERE "CUSTOMER_ID" = ( SELECT "CUSTOMER_ID" FROM "CUSTOMER" WHERE "NAME" = 'Steve Jobs' );
Here's the result:
PURCHASE_ID  DESCRIPTION
1            Black turtleneck shirt
3            Faded blue jeans
5            12-pack, bottled water
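The nested query can also be written as a JOIN, which is the more common way to follow a relationship between two tables. The sketch below runs both forms in SQLite through Python's standard sqlite3 module and checks that they agree. SQLite, the CHAR(200) type for DESCRIPTION, the added ORDER BY, and the table aliases `p` and `c` are my assumptions; the EMAIL column is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "CUSTOMER" ("CUSTOMER_ID" INTEGER, "NAME" CHAR(100));
    CREATE TABLE "PURCHASE" ("PURCHASE_ID" INTEGER, "CUSTOMER_ID" INTEGER,
                             "DESCRIPTION" CHAR(200));
""")
conn.executemany('INSERT INTO "CUSTOMER" VALUES (?, ?)', [
    (1, "Steve Jobs"), (2, "John Sculley"), (3, "Michael Spindler"), (4, "Gil Amelio"),
])
conn.executemany('INSERT INTO "PURCHASE" VALUES (?, ?, ?)', [
    (1, 1, "Black turtleneck shirt"),
    (2, 2, 'Book: "How to Sell Sugar Water"'),
    (3, 1, "Faded blue jeans"),
    (4, 3, "Golden parachute"),
    (5, 1, "12-pack, bottled water"),
    (6, 4, 'Book: "The Second Coming of Steve Jobs"'),
])

# The article's query: look up the customer's ID first, then filter purchases.
subquery = """
    SELECT "PURCHASE_ID", "DESCRIPTION" FROM "PURCHASE"
    WHERE "CUSTOMER_ID" = ( SELECT "CUSTOMER_ID" FROM "CUSTOMER" WHERE "NAME" = ? )
    ORDER BY "PURCHASE_ID"
"""
# The same question as a JOIN: pair each purchase with its customer, then filter.
join = """
    SELECT p."PURCHASE_ID", p."DESCRIPTION"
    FROM "PURCHASE" AS p
    JOIN "CUSTOMER" AS c ON c."CUSTOMER_ID" = p."CUSTOMER_ID"
    WHERE c."NAME" = ?
    ORDER BY p."PURCHASE_ID"
"""
via_subquery = conn.execute(subquery, ("Steve Jobs",)).fetchall()
via_join = conn.execute(join, ("Steve Jobs",)).fetchall()

for row in via_join:
    print(row)
```

One caveat: the `=` form of the subquery assumes exactly one customer has that name, whereas the JOIN copes with several matching customers.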
The ACID Test -- ACID stands for "Atomicity, Consistency, Isolation, and Durability." These are the features that separate the pros (Oracle, PostgreSQL) from the minor leaguers (FileMaker Pro, 4D). When your business rides on the quality of your information, ACID is the feature set that helps you sleep at night.
Atomicity (pronounced "atom-ih-sit-ee") comes from the word atom and its original meaning: that which is indivisible. In a database, that means that multiple operations are all bundled up into one indivisible transaction. Either all of the transaction's operations take place, or none of them do. This helps to ensure the database is in a valid state at all times.
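A small sketch of atomicity, using SQLite through Python's standard sqlite3 module (my choice for a self-contained demonstration; the databases discussed in this series have their own transaction syntax). Two inserts are bundled into one transaction, the second one fails, and the first is undone along with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        description TEXT
    )""")
conn.execute("INSERT INTO purchase VALUES (1, 1, 'Black turtleneck shirt')")
conn.commit()

try:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO purchase VALUES (2, 1, 'Faded blue jeans')")
        # This ID is already taken, so the insert fails...
        conn.execute("INSERT INTO purchase VALUES (1, 1, 'Duplicate purchase ID')")
except sqlite3.IntegrityError:
    pass  # ...and the whole transaction, including purchase 2, is undone.

ids = [row[0] for row in conn.execute("SELECT purchase_id FROM purchase ORDER BY purchase_id")]
print(ids)
```

Only the purchase committed before the failed transaction remains; the database never exposes the half-finished state in which purchase 2 existed.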
Consistency is the principle that only operations that meet all the database's validity constraints are allowed. The end effect of this is that illegal operations aren't allowed, whether they are external (perhaps users enter invalid data) or internal (perhaps a disk fills up and a required row can't be added).
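As a hedged illustration of a validity constraint, the sketch below attaches a CHECK rule to the EMAIL column and shows the database refusing rows that break it. It uses SQLite via Python's standard sqlite3 module; the particular rule (an address must contain an "@" with something on each side) and the `accepted` helper are my own assumptions, chosen only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL CHECK (email LIKE '%_@_%')
    )""")


def accepted(customer_id, name, email):
    """Return True if the row satisfies every constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (customer_id, name, email))
        return True
    except sqlite3.IntegrityError:
        return False


valid_row = accepted(1, "Steve Jobs", "email@example.com")
bad_email = accepted(2, "John Sculley", "not-an-address")   # fails the CHECK rule
missing_name = accepted(3, None, "email@example.com")       # fails NOT NULL

print(valid_row, bad_email, missing_name)
```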
In this wild Web world, databases have to deal with multiple concurrent modifications. But what happens when Alice's transaction is modifying the table that Bob's transaction is reading? Isolation ensures that Bob's transaction sees the table as it existed before Alice's transaction started or after it completed, but never the intermediate state.
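Here is a rough sketch of Alice and Bob, using two connections to the same SQLite file through Python's standard sqlite3 module. SQLite is my stand-in and its locking is much coarser than that of the server databases discussed in this series, so treat this as a demonstration of the principle only. The file name `shop.db` and the `count` helper are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shop.db")
    alice = sqlite3.connect(path)
    bob = sqlite3.connect(path)

    alice.execute("CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY, description TEXT)")
    alice.execute("INSERT INTO purchase VALUES (1, 'Black turtleneck shirt')")
    alice.commit()

    def count(conn):
        """Number of purchases visible to this connection right now."""
        return conn.execute("SELECT COUNT(*) FROM purchase").fetchall()[0][0]

    # Alice starts a transaction (sqlite3 begins one implicitly) but has not committed.
    alice.execute("INSERT INTO purchase VALUES (2, 'Faded blue jeans')")

    seen_during = count(bob)   # Bob still sees the table as it was before Alice started
    alice.commit()
    seen_after = count(bob)    # now Bob sees the completed transaction

    alice.close()
    bob.close()

print(seen_during, seen_after)
```

Bob's first count reflects the state before Alice's transaction and his second the state after it; at no point does he see her uncommitted row.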
Finally, Durability is the principle that once a transaction is completed, a mere system crash won't wipe it out. In the real world, this means that transactions aren't considered completed until all the information has been written to a disk.
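I can't crash your machine from inside an article, so the sketch below only approximates the idea: committed work is still in the file when the database is reopened, while work that was never committed is not. It uses SQLite via Python's standard sqlite3 module (my assumption for a self-contained example), and the file name `shop.db` is hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shop.db")

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY, description TEXT)")
    conn.execute("INSERT INTO purchase VALUES (1, 'Black turtleneck shirt')")
    conn.commit()   # transaction completed: the row has been written to the file

    conn.execute("INSERT INTO purchase VALUES (2, 'Faded blue jeans')")
    conn.close()    # closed without a commit, so this insert is discarded

    # Reopen the file as a fresh process would after a restart.
    reopened = sqlite3.connect(path)
    surviving = [row[0] for row in
                 reopened.execute("SELECT purchase_id FROM purchase ORDER BY purchase_id")]
    reopened.close()

print(surviving)
```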
What's Available? In the next installment of this article, I'll cover the merits of a handful of database applications that can be run under Mac OS X, such as MySQL, FrontBase, and speculation about Oracle's possible entry into the Mac field.
[Jonathan "Wolf" Rentzsch is the embodiment of Red Shed Software and runs a monthly Mac programmer get-together in Northwest Illinois.]
Article 2 of 2 in series
As Mac users confront the geeky realities associated with Unix as the core of Mac OS X, they may not be aware of their newly acquired capability to run powerful relational database software. In part one of this article, I discussed the basics of how relational databases work (see TidBITS-580). This week, I want to cover some commercial and open-source databases currently available for Mac OS X.
As with last week's release of FileMaker 5.5 with support for Mac OS X, most of the databases that run under the Classic Mac OS will be ported to Mac OS X. However, we're also seeing an influx of newly available database programs. All of these databases have been around for years on different platforms; it's only with Mac OS X that Mac users can finally run them.
It's worth noting that, for the first time, the Mac OS finally pulls even with, if not ahead of, Windows in terms of database power. Although the Macintosh world still lacks a friendly low-end SQL database like Microsoft Access, the quantity and quality of databases available for Mac OS X is incomparable, especially if Oracle climbs aboard.
Also keep in mind that none of these databases are meant to be used directly for day-to-day data entry and queries like FileMaker or 4D. MySQL and PostgreSQL are command-line driven databases, while FrontBase and OpenBase provide only rudimentary data input and retrieval interfaces. Instead, these back-end databases work behind the scenes and are meant to be coupled with some sort of front-end interface, be it a Web page or a desktop application.
MySQL -- MySQL is the most popular open source database, and unlike many databases, MySQL will handle large bodies of text, making it suitable for Web publishing and messaging systems such as those found on Web forums. On the down side, MySQL doesn't embrace ACID (Atomicity, Consistency, Isolation, and Durability, as we learned in the first part of this article). Transaction support was added only recently, and it is rather bolted-on (MySQL transactions lock entire tables). ACID needs to be built in from the ground up. The lack of transaction support used to give MySQL a speed advantage, but PostgreSQL has been proven comparable to MySQL in many tasks.
Finally, although MySQL supports online backups, it locks the database from updates (though not read-only accesses) while performing the copy. Online backups enable you to back up your database without having to shut it down entirely.
Bottom Line: MySQL is free and well suited for content-oriented systems, but for traditional business uses I'd go with PostgreSQL or FrontBase.
PostgreSQL -- PostgreSQL is probably the best open source database. It supports transactions, which makes it suitable for serious business use. It offers online backups, and unlike MySQL, will continue to process database updates during backups. PostgreSQL's previous weakness of an 8K row-size limitation has mercifully gone away in version 7.1.
PostgreSQL still suffers from the need to "VACUUM" the database routinely. VACUUM is a PostgreSQL-only, non-standard SQL command which generally cleans up a database. The VACUUM command can be time consuming (15 minutes is not uncommon), and locks out any use of the database while running. You also don't have the option of simply letting the database get dirty - PostgreSQL will start failing mysteriously if you don't VACUUM regularly. Different situations call for different VACUUM frequencies, but some folks perform the operation once a week, while others do it every hour.
Bottom Line: PostgreSQL is free and is the best open source database for running businesses on Mac OS X.
FrontBase -- I really like FrontBase. Like PostgreSQL, it supports SQL92 (the latest version, circa 1992, of the international SQL standard). Each database resides in one file, making database file identification and transport a breeze. It supports online backups, clustering (the capability to have two or more machines share a database and hand off connections to one another, increasing reliability and speed), and offers raw disk support (bypassing file system overhead).
It strongly embraces ACID, requiring you to commit almost every change to the database. It sports a graphical administration tool on Mac OS X and X Server, while the engine itself runs on many other operating systems (Windows NT/2000, Linux, LinuxPPC, etc.). It offers a Web-based administration tool as well. On Mac OS X, it uses the standard system Installer, which is graphical and friendly.
Unlike MySQL and PostgreSQL, FrontBase is not open source. However, there are two free FrontBase licenses. The first, the developer license, enables all of FrontBase's features for six months (renewable), but doesn't give you deployment rights, so you can't let anyone else use your database. The second free license allows deployment but doesn't allow hot backups, clustering, or external connections to the database (defined as remote connections over a network; CGI or WebObjects connections from the same machine are fine). There's a $999 license that allows external connections and hot backups, and a $3,499 license that adds clustering.
FrontBase makes it easy to import your data, including instructions and tools to convert existing databases from FileMaker Pro, MySQL, OpenBase and Sybase. I can personally vouch that the FileMaker Pro tool works as advertised. With the converter, FrontBase plus WebObjects makes an attractive path for FileMaker Pro developers who need more power.
There are two drawbacks to FrontBase. The first is that you must enter a license number after installing the software. This isn't bad in itself, but the license is tied to your machine's IP address and won't work with DHCP (which may provide your Mac with a different IP address on every restart). That can make it tricky to run FrontBase on a travelling PowerBook or a Mac on a DSL or cable modem connection that requires you to use DHCP.
A FrontBase representative informed me this is because Mac OS X Server doesn't allow software to retrieve the computer's Ethernet MAC address without running under the root account. Since FrontBase didn't want to force their users to run FrontBase under root (a bad security idea), they went with what they could access: the IP address. The desktop version of Mac OS X changes that, and FrontBase will tie the license to the MAC address in the future.
FrontBase's second drawback is that the company doesn't offer on-site support. Although FrontBase requires little administration and their email support is quick and competent, this could be a deal breaker for some companies.
Bottom Line: FrontBase is the least expensive commercial high-end database for Mac OS X, but you can't get on-site support.
OpenBase -- OpenBase wins the user interface contest hands down. Its interface is elegant and beautiful, and it contains a reasonable modeling tool which graphically depicts how your database is structured. For example, it represents tables as rectangles and draws lines between them to illustrate their relationships.
OpenBase's engine seems fast, modern, and powerful. Like FrontBase, OpenBase offers a free developer license and supports online backups. A $295 migration tool called ClickConvert helps move data from existing FileMaker Pro databases.
OpenBase's beauty comes from Mac OS X's Cocoa environment, which limits OpenBase's platform support. It supports Mac OS X and X Server out of the box; however, to run OpenBase under Solaris or Windows 2000, you first must purchase and install WebObjects (which brings the Cocoa frameworks along with it). Granted, WebObjects has fallen greatly in price recently, but it still adds $700 to OpenBase's $2,000 price. OpenBase's graphical interface is being rewritten in Java, so its future platform support should increase. Alas, technical support is only offered via email.
OpenBase is pricey, but there's new hope for those with tight budgets. While we were preparing this article for publication, OpenBase introduced a new $499 license specifically for use with PHP, a popular tool for linking databases with Web sites. This lower price comes with two restrictions: no external connections (like FrontBase's free deployment license) and no support for WebObjects (unlike FrontBase). However, it does allow online backups, a feature which starts at $999 for FrontBase.
Bottom Line: Ironically, the database with the highest starting price is the best for high-end relational database newbies. If you're a highly paid consultant or need to get a database up quickly, OpenBase's easy user interface may justify its high price.
Oracle? There have been consistent rumors that Oracle will be ported to Mac OS X. Technically, I don't see any reason they couldn't do it. Oracle already runs on a couple flavors of Unix, and Larry Ellison sits on Apple's board of directors.
Many people feel Oracle on Mac OS X will legitimize the platform, and there's logic to that argument. Oracle is known for its power, flexibility and support. However, it is also extremely expensive and complicated, to the point where many people devote their careers to nothing but administering Oracle databases.
Making the Choice -- For the budget minded, FrontBase's free development and deployment licenses are tough to beat. If you can't spare a dime, but require online backups, then PostgreSQL is your best choice (despite its less-friendly user interface). If elegance, ease of use or speed is important to you, I'd definitely recommend checking out OpenBase. Assuming it ends up being ported to Mac OS X, Oracle would make sense only if you're developing a truly large, complex, or fast database with other people's money.
Now that we've looked at what makes a relational database and some of the primary contenders, be sure to look for my upcoming article on a program that brings relational databases back into the forefront of computing: Apple's powerful WebObjects.
[Jonathan "Wolf" Rentzsch is the embodiment of Red Shed Software, and runs a monthly Mac programmer get-together in Northwest Illinois.]