In the classic science-fiction movie “Forbidden Planet,” an invisible, rampaging beast has killed all but two members of a colony and attacks a visiting spaceship and its personnel as well. In the end – spoiler alert! – it turns out that the monster was created from the id of the remaining scientist. He had used an ancient, extinct people’s technology that freed their minds of physical instrumentality, giving them untapped power managed directly by their minds. The scientist dies, nobly, and sets the destruction of the planet in motion.
This movie was brought to mind by Amazon.com’s latest non-retail product launch: the beta test version of Amazon Elastic Compute Cloud (dubbed Amazon EC2), an on-demand service that runs virtual machines you configure, charging by the hour they’re in use. This approach eliminates the need to own the physical equipment on which to run a virtual machine – such as a disk image launched inside Parallels Desktop to run Windows XP on an Intel-based Mac – or an actual operating system.
Existing products already enable high-end server computers to run multiple simultaneous virtual machines, and to group server computers together to scale virtual machines to a massive level. VMware and XenSource make such tools, which are frequently used in large corporate information technology operations. In fact, Amazon.com is using XenSource’s software to run EC2.
As we’ve written in TidBITS in recent months (see “WinOnMac Smackdown: Dual-Boot versus Virtualization,” 10-Apr-06), virtual machines let you take advantage of all the benefits of a given operating system while using as much CPU power as that operating system can exploit. Plus, if one virtual machine is idle, others on the same server computer can take advantage of the free cycles for their own tasks. From the provider’s standpoint, virtual machines eliminate the need to manage individual computers for individual customers; a large grid of computers becomes a pool of processor resources rather than a collection of individually managed operating systems with their own quirks.
Having virtual machines available at your beck and call means you could cope with a sudden spike in activity by distributing the load onto dozens or hundreds of virtual machines for a short period of time. If we were to create a set of Web pages backed by a database that was suddenly hit by tens of millions of queries a day, our systems would bog down and our service provider might have to ratchet up our fiber-optic bandwidth. And we’d wind up paying a small fortune for the excess.
With EC2, we could just fire up a bunch of identical servers and pay 20 cents per gigabyte of bandwidth, about a fifth of the going rate at many colocation facilities. That’s right – a terabyte of bandwidth would cost just $200, seemingly cheap if that data were also generating some revenue. More important, instead of buying a pile of servers or suffering through hours of overloaded, unresponsive machines, you’d pay only during those peak hours or days and then shut down the virtual machines once they were no longer needed.
Amazon EC2 has a fairly rigid starting point in its beta test phase, which is currently closed to new users. Each virtual machine you launch as an instance – up to 20 instances simultaneously unless you ask for more – acts as the equivalent of a 1.7 GHz Xeon CPU with 1.75 GB of RAM, 160 GB of hard disk storage, and 250 megabits per second of network bandwidth. For each hour (or fraction thereof) that each instance runs, you pay $0.10, which comes out to $72.00 per 30-day month.
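That per-hour figure is easy to check with back-of-the-envelope arithmetic, using only the rates quoted above:

```python
# Back-of-the-envelope check of EC2's beta pricing for one always-on instance.
HOURLY_RATE = 0.10          # dollars per instance-hour
HOURS_PER_MONTH = 24 * 30   # a 30-day month, as in the figure above

monthly_cost = HOURLY_RATE * HOURS_PER_MONTH
print(f"One instance, full time: ${monthly_cost:.2f}/month")  # $72.00/month
```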
To use EC2, you start with one of a set of prefabricated disk images based on the Fedora Core operating system, the community-driven successor to Red Hat Linux. You can also install other Linux distributions using instructions Amazon.com provides.
You can probably imagine why this makes me curious. The recently introduced Intel-based Mac Pro towers are based on Intel’s latest Xeon processors. Of course, Apple doesn’t allow its operating system to be virtualized or run on generic Intel hardware. Still, with a company like Amazon.com offering this sort of service, could Apple license Mac OS X for this sort of purpose? It seems unlikely, given Apple’s history, but it’s not unreasonable or impossible that Apple would make such a deal. The advantage would accrue to, say, an Xserve owner, who could replicate his or her setup to run identical instances on demand without having to manage Mac OS X on a core, “real” computer and Linux on the virtual machines.
Once you set up a system you want to employ, either with a prefab image or by creating your own, you can boot it using command-line tools or an application programming interface (API) that allows many of the steps to be automated. The resulting system can be accessed via a standard Secure Shell (SSH) connection – ssh is built into Mac OS X and accessible via Terminal – and Amazon.com assigns it a unique, routable Internet protocol address and associated host name.
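A rough sketch of that workflow, using the command names from Amazon’s EC2 command-line tools as documented during the beta (the disk image ID, key pair name, and host name below are illustrative placeholders, not real values):

```shell
# Create a key pair for SSH access (key pair name is a placeholder):
ec2-add-keypair gsg-keypair

# Launch one instance from a stored disk image (the AMI ID is a placeholder):
ec2-run-instances ami-12345678 -k gsg-keypair

# List your instances to find the public host name Amazon assigned:
ec2-describe-instances

# Connect from Terminal with the saved private key (host name is a placeholder):
ssh -i id_rsa-gsg-keypair root@ec2-67-202-0-1.compute-1.amazonaws.com
```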
Data is not persistent, however. The disk image contains everything the virtual machine has available at startup; any data written to the virtual disk is lost when the instance is shut down, force-terminated, or crashes. For that reason, Amazon.com suggests loading live information from a shared resource at startup and, for safety’s sake, writing any data that must persist out to external Internet storage.
It just so happens that Amazon.com runs a giant Internet storage system as well. EC2 complements Amazon Simple Storage Service (S3), launched in April 2006, which lets you store static objects from 1 byte to 5 GB in size. The disk images you create for EC2 must be stored in S3.
S3 and EC2 both cost $0.20 per gigabyte of bandwidth transferred, although moving data between S3 and EC2 is free. S3 also charges $0.15 per gigabyte stored each month. For perspective, if you were to store 1 TB of data at S3, use 1 TB of bandwidth, and use 5,000 hours of virtual machine time a month (the equivalent of seven full-time servers), you would pay $200 plus $150 plus $500, for a grand total of $850. Not bad. For comparison, I pay about $800 per month to house four servers with about 200 GB of monthly bandwidth included. The upside? My data is persistent.
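That hypothetical bill can be tallied in a few lines, using nothing but the rates quoted above:

```python
# Rough monthly bill for the scenario above, at the quoted beta rates.
STORAGE_RATE = 0.15    # dollars per GB stored per month (S3)
BANDWIDTH_RATE = 0.20  # dollars per GB transferred (S3 and EC2)
INSTANCE_RATE = 0.10   # dollars per instance-hour (EC2)

storage_gb = 1000      # roughly 1 TB stored
bandwidth_gb = 1000    # roughly 1 TB transferred
instance_hours = 5000  # about seven instances running full time (~720 hrs each)

total = (storage_gb * STORAGE_RATE
         + bandwidth_gb * BANDWIDTH_RATE
         + instance_hours * INSTANCE_RATE)
print(f"Monthly total: ${total:.2f}")  # Monthly total: $850.00
```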
The big missing piece for EC2 is clearly database storage and service. I currently run a dedicated database server that cost several thousand dollars and handles my isbn.nu book-price comparison service, TidBITS’s newest searchable article database, and my various blogs’ posting databases. I would gladly consider moving those databases to a super-fast, pay-per-transaction or per-CPU-cycle system that was pure information from my perspective, if the price were right.
Tie EC2 and S3 together with a database service, and Amazon.com would have eliminated the vast majority of many smaller companies’ hardware needs – and sparked the development of firms that require almost no hardware or bandwidth of their own for computation and storage. The model itself isn’t new, but this is the closest to wide-scale commoditization of computing that I’ve ever seen. It’s affordable, too.
Sure, if this plays out and I go all virtual, I’ll miss my server hardware, just like I’ll miss my body when we ultimately evolve into beings of pure energy. Let’s just hope we don’t let our processes or our ids run wild.