Google wants to index all knowledge, and it thought that scanning a few tens of millions of books might be a good addition to the compendium of billions of Web pages, PDFs, and Word documents it already offers. The only trouble? Most of the books it wanted to scan are still under copyright protection. This caused the Association of American Publishers (AAP), the Authors Guild, and other organizations to gnash their teeth - and file lawsuits.
Last week, Google and a host of these complainants agreed to a settlement that a court must still approve. Google will contribute piles of cash - $125 million - to settle outstanding issues and fund a new copyright clearinghouse that will enable authors and publishers to receive funds for online viewing of works.
The settlement also clears the way for far greater access to orphaned works: books (and other material) that remain protected by copyright but are out of print or out of production, whose rights holders are nowhere to be found, and which are largely unavailable even through lending libraries.
Unlike the outcome of many lawsuits about copyright and access, this settlement could be a big win for authors, publishers, readers, and libraries. Could such a thing be possible?
(Full disclosure: I am a member of the Authors Guild. Although I did not support the particular form of the Authors Guild lawsuit, neither did I cancel my membership as a result of the legal action.)
[Editors' disclosure: With our Take Control hats on, we've worked with Google Book Search for years, and it pains me to say that the experience has been nothing but frustrating, with literally months of delay between uploading a fully searchable PDF - no need to scan anything - and having it posted. Plus, although Google's support people responded quickly to our queries, they were universally useless at addressing any complaints, such as posting delays or the existence of guaranteed broken links to Amazon.com for our titles, given the fact that Amazon doesn't resell our ebooks. I certainly hope that the settlement will mean increased exposure for Google Book Search and our content, and additional sales. -Adam]
The Backstory -- After a couple of years of prep work, Google announced its Google Print program (later renamed Google Book Search) in 2004, along with the controversial part: its Library Project.
Google started partnering with major publishers first, followed by smaller houses - a total of 20,000 so far - to make their books available in some form online.
Google's bigger objective was to partner with major academic libraries around the world, scan books using high-speed techniques it had invented, and use optical character recognition (OCR) technology to turn the scans into searchable text.
Google Book Search made it possible for anyone to search the contents of any scanned book and, depending on the copyright status of the book and other factors, view or even download some or all pages. (Microsoft started two similar programs which avoided many copyright issues, but the company shut those projects down in May 2008.)
This behavior rankled many because Google claimed the right to scan copyright-protected books on the grounds that it wasn't, per se, distributing them, even though it had full digital copies. Google maintained - in a rough approximation - that because it was working under contract with libraries that owned physical copies of the books, making archival digital copies was perfectly legitimate, as was turning the copyrighted works into text and images that weren't revealed in whole on the Web.
The various parties aligned against Google disagreed, and filed suit in 2005.
The Variety of Works under Discussion -- Both the objections raised by publishers and the Authors Guild and the design of the settlement the parties agreed on center on separating works into three categories: public domain, in copyright/out of print, and in copyright/in print.
- Public domain works are no longer covered by copyright, and may be used in essentially any form and any fashion. Many publishers, notably Dover, reprint public-domain works in various forms and compendiums. Copyright holders can also release all rights on works they control, placing a creation in the public domain. Google Book Search makes the full text available, including for download.
- Books that are in copyright but out of print are often called orphan works when the owner of the rights can't be found or there's no clear owner, as when a writer dies without an estate; there are also plenty of books that writers and publishers can't find a way to get back into print, or wouldn't consider bringing back into print due to low sales or other complexities. This broad category covers books that are no longer stocked or available from the commercial book trade, although sometimes individual authors buy remainders from a publisher - the last in-stock copies that the publisher intended to turn into pulp - and sell them through hard effort. The copyright for out-of-print books may be owned by a living person or his or her estate, by a trust, by a publisher, or by a company; or it may be entirely unclear who (if anyone) owns the copyright, which is likely the case for many works created before the 1980s. Out-of-print works make writers cry, because their hard-wrung prose - fiction or non-fiction or reference - is unavailable even if the market desires it, because the economics of print publishing have until recently put their children in the gutter. Google Book Search makes the full text searchable, with snippets of context presented.
- Active books are in copyright and in print. Books that are actively sold by publishers through booksellers or directly, even if they're 30, 40, or 70 years old, fit in this category. Publishers often refer to their frontlist, books that are relatively new and actively promoted, and their backlist, titles that are still in stock and available, and that may even sell well, but that aren't promoted. The same searching and results are allowed as with out-of-print titles. (By the way, Amazon's special-order books program, launched at the same time as the bookseller's overall store in 1995, was the first simple way to obtain in-print books that weren't routinely stocked by either bookstores or book distributors. Before Amazon, special-order books required time and effort on the part of a bookseller, and were often regarded as a giant pain to fulfill.)
These three categories raise the question: what's covered under copyright, anyway? I'm glad you asked.
Copyright's Increasing Longevity -- Copyright law in the United States has been tweaked repeatedly since the right was granted in the Constitution, and as a result, it's quite complex. The U.S. Copyright Office has a brief explanation, as well as a more extended discussion of terms.
To boil the discussion down for published works copyrighted in the United States:
- Everything copyrighted - registered with the Copyright Office - before 1923 is in the public domain.
- Nearly everything registered as under copyright starting in 1923 was protected initially for a term of 28 years, which could be renewed through the Copyright Office in the 28th year for another 28 years (a renewal term later extended, in stages, to 67 years).
- Works copyrighted starting 01-Jan-64 had their renewal made automatic by a 1992 amendment to the law, so no renewal filing is required. There are a lot of niceties involved, but this is the general rule.
- Any work copyrighted from 01-Jan-78 on is protected from the moment it's created, for the author's life plus 70 years, or for 95 years from publication for works owned by a company - so-called "work for hire," in which a work was created by a statutorily defined employee of a firm or institution. No registration is required, but registering provides evidence of ownership and makes statutory damages and attorney's fees available if infringement is proven. (Before the Sonny Bono Copyright Term Extension Act of 1998, the duration was 50 years following death or 75 years for works for hire. The act was also pejoratively known as the Mickey Mouse Protection Act, because Mickey's appearance in Steamboat Willie would otherwise have entered the public domain in 2004.)
A lot more explanation, which I'll skip here, is necessary for other countries' copyright rules prior to general international agreement on copyright terms in the 1970s, and for U.S. rules covering anonymous, pseudonymous, and unpublished works.
If you read this carefully, you'll notice a gap. A work registered from 1923 through 1963 wound up in the public domain if no renewal was filed. It's unclear how many hundreds of thousands or millions of works may have fallen into that gap.
But you can see that there's a giant divide. Through 1922, essentially everything is in the public domain. After 1922, nothing that anyone paid attention to is.
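The post-1978 rule in the list above boils down to simple arithmetic, so here's a minimal Python sketch of it. It assumes a U.S. work created on or after 1 January 1978 and ignores anonymous, pseudonymous, and joint works. The `Work` record and the function names are hypothetical. Two details aren't spelled out above: U.S. copyright terms run through the end of the calendar year in which they expire, and a work made for hire is capped at 120 years from creation if that comes sooner than 95 years from publication.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Hypothetical record for a U.S. work created on or after 1 January 1978."""
    work_for_hire: bool
    author_death_year: Optional[int] = None  # required for individually authored works
    publication_year: Optional[int] = None   # work for hire: 95 years from publication
    creation_year: Optional[int] = None      # work for hire: 120 years from creation


def last_protected_year(work: Work) -> int:
    """Return the final calendar year of protection (terms run to year's end)."""
    if not work.work_for_hire:
        if work.author_death_year is None:
            raise ValueError("an individually authored work needs the author's death year")
        return work.author_death_year + 70  # life plus 70 years

    # Work made for hire: whichever of the two terms expires first.
    candidates = []
    if work.publication_year is not None:
        candidates.append(work.publication_year + 95)
    if work.creation_year is not None:
        candidates.append(work.creation_year + 120)
    if not candidates:
        raise ValueError("a work made for hire needs a publication or creation year")
    return min(candidates)


def is_public_domain(work: Work, year: int) -> bool:
    """True if the work is in the U.S. public domain on 1 January of `year`."""
    return year > last_protected_year(work)


if __name__ == "__main__":
    novel = Work(work_for_hire=False, author_death_year=2000)
    manual = Work(work_for_hire=True, publication_year=1990, creation_year=1985)
    print(last_protected_year(novel))     # 2070
    print(last_protected_year(manual))    # 2085
    print(is_public_domain(novel, 2008))  # False
    print(is_public_domain(novel, 2071))  # True
```

Under these assumptions, a book by an author who died in 2000 stays protected through 2070 and enters the public domain on 1 January 2071.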
Fair Use -- Copyright law contains a giant set of exemptions that are supposed to balance the U.S. Constitution's language against the public good. Article 1, Section 8, states that Congress shall have power "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."
Many arguments have been made about what "limited Times" means - Stanford law professor Larry Lessig argued against the Sonny Bono Act all the way to the Supreme Court - but the idea that copyright is intended not solely for the benefit of "authors and inventors" but for society as a whole should be undisputed. (If you've followed the actions of the movie and recording industries, and legislative efforts to support their actions, you might believe that copyright is all about ownership, not public good.)
In that spirit, Congress defined exceptions to copyright, including fair use, which have been further refined by practice and the courts. A claimed fair use is examined with a four-part test: the purpose and character of the use, including whether it's commercial; the nature of the work involved; the amount used in relation to the original as a whole; and the effect on the market for the original work. The test doesn't require every factor to be met; instead, each is weighed as part of the whole. (You can read about this in more depth at the Copyright Office.)
Google has argued that its efforts at scanning copyrighted books and making them searchable, with only snippets shown in results, pass the smell test: Google was making no specific commercial return on its book search (in fact, it was investing tens of millions of dollars in scanning with its library partners); the works were intended for public distribution; snippets were infinitesimal parts of books; and the search giant was stimulating demand for the books that turned up in its results. Google provided links to purchase the books, and could thus track sales, too.
The Authors Guild, among others, argued that the mere act of creating electronic editions that were stored and distributed required permission from copyright holders, let alone displaying results from them. With a little programming work, an interested party could extract passages or entire books, too.
Without being a lawyer specializing in this area, I believe it was and remains impossible to determine whether Google or its one-time opponents would have prevailed. A ruling would clearly have created a new sub-area of law, either affirming, denying, or making far more complicated the notion of whether creating and owning copies of copyrighted works was a de facto violation of the law.
But these one-time opponents are now at least somewhat supportive of Google's efforts. What changed? Quite a lot, and in ways from which all parties, and we readers, stand to benefit.
Out-of-Print Books and Book Rights Registry -- The settlement opens the way to vastly improved availability of in-copyright books by separating out-of-print and in-print books into their respective categories, and collecting fees for all snippet displays, page reading, and page printing.
Publishers, authors, and other copyright holders will be able to opt out of having out-of-print books included; by default, all out-of-print books will be available. For in-print books, those who own the rights must opt in. This allows all of Google's existing partners to continue what they're doing, and publishers to experiment by adding specific titles or simply adding their entire catalogs.
If I read the settlement right, publishers who do not opt in to allow Google to include their in-print titles will simply have those titles removed if they're already available, or not added in the future. (A complete set of links to resources can be found at the Authors Guild site.)
Where this agreement goes far beyond Google's current program, and why it's a win for Google, is that the company will now be able to provide not just snippet results, but entire pages or books (for viewing and printing).
Google will collect the fees and pass them on to the Book Rights Registry, which will be run by a board of authors and publishers, and be founded with $34.5 million of a $125 million settlement that Google has agreed to pay - without admitting that any of Google's claims are invalid.
Authors and publishers win by suddenly having a mechanism to disseminate electronic editions while collecting fees for snippet displays, page views, and page printing. Google has agreed to a 63-37 split in favor of the copyright holder.
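To make the 63-37 split concrete, here's a small Python sketch. It assumes amounts are tracked in whole cents and, hypothetically, rounds the rights holder's share down to a whole cent with Google keeping the remainder; nothing described here says how the settlement handles fractions, and `split_revenue` is an illustrative name.

```python
def split_revenue(gross_cents: int, holder_percent: int = 63) -> tuple[int, int]:
    """Split a payment between the rights holder and Google.

    Hypothetical rounding: the holder's share is rounded down to a whole cent,
    and Google keeps whatever remains, so the two shares always sum to the gross.
    """
    if gross_cents < 0 or not 0 <= holder_percent <= 100:
        raise ValueError("gross must be non-negative and the percentage 0-100")
    holder_cents = gross_cents * holder_percent // 100
    google_cents = gross_cents - holder_cents
    return holder_cents, google_cents


if __name__ == "__main__":
    print(split_revenue(1000))  # (630, 370): $6.30 to the rights holder, $3.70 to Google
    print(split_revenue(99))    # (62, 37)
```

Working in integer cents keeps the two shares summing exactly to the amount paid, which a floating-point split wouldn't guarantee.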
The public wins because the settlement calls for a free subscription license for "designated" computers at all U.S. public and academic libraries - a miserly one per public library building, and either one per 4,000 or one per 10,000 students at academic libraries, depending on the type of institution. Google has also agreed to pay all printing royalty fees at these qualifying locations for 5 years or up to $3 million, whichever comes first.
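To see why the free allotment looks miserly, here's a rough Python sketch of the per-student ratios. It assumes, hypothetically, that a campus's count is rounded up and that every institution gets at least one terminal; the settlement terms as summarized here specify neither, and `free_terminals` is an illustrative name.

```python
def free_terminals(students: int, students_per_terminal: int) -> int:
    """Estimate free "designated" terminals for an academic library.

    `students_per_terminal` is 4,000 or 10,000 depending on institution type.
    Hypothetical assumptions: round up, with a minimum of one terminal.
    """
    if students < 0 or students_per_terminal <= 0:
        raise ValueError("student count must be non-negative and the ratio positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-students // students_per_terminal))


if __name__ == "__main__":
    print(free_terminals(30_000, 4_000))   # 8
    print(free_terminals(30_000, 10_000))  # 3
    print(free_terminals(1_500, 10_000))   # 1
```

Even with generous rounding, a 30,000-student university would get only eight free terminals at the 1-per-4,000 ratio.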
Other institutions can pay for overarching printing and reading licenses, and public libraries can upgrade to fuller licenses, too. Without knowing what these more extensive subscriptions cost, it's hard to know whether public libraries will be able to afford them. Wade Roush of Xconomy, from whose writing I learned about the limits on free library access, is down on the whole deal, partly due to the scale of free access and partly due to the default pricing that Google will set on out-of-print, in-copyright books.
Anyone who researches a topic should benefit from the availability of out-of-print works, as they comprise many millions of titles that are rarely available in wide circulation. Ten libraries around the world might have a particular book you need, but that doesn't mean you can gain access to it.
Google has also agreed to pay legal fees and at least $45 million to copyright holders whose works were scanned before a certain date connected to the lawsuit.
Now, of course, not all publishers or copyright holders are represented by the parties involved, and some may choose to sue separately in the future. The court might also require the parties to appear in court, although courts prefer settlements.
The only fly in the ointment is that copyright holders of out-of-print but in-copyright works are being de facto opted in to having their works available by virtue of this settlement, even if they're not party to it. That should fly, because most of these creators or owners can get no value out of their works at present, and few people complain about receiving additional compensation. Further, the creation of a clearinghouse gives a kind of imprimatur, allowing a party that represents authors and publishers to make sure out-of-print works see life again.
There was the notable case in the music world of James Carter, a former convict whose voice was recorded on a chain gang in 1959 by pioneering folk music collector Alan Lomax. In 2000, the song he sang, "Po' Lazarus," was used in the opening of the movie "O Brother, Where Art Thou?" The soundtrack sold 4 million copies.
Carter, who left prison in 1967 and had led a quiet life since, was tracked down after months of research by the Lomax archives, and presented with a $20,000 check; he received $100,000 by his death in 2003.
Avoiding Collision with the Future -- I'm a writer. I make my living by sitting down and typing, as I am now. The notion of Google appropriating my words without my permission or acknowledgment always bothered me, even though I also accepted that there was a fine chance that the company was operating within the legal constraints of copyright law.
I was similarly troubled by the Authors Guild partnering with what is often its natural enemy, the AAP, in trying to stop Google's book-scanning activities, some of which seemed to benefit me and other authors, and others of which did not. (For instance, the AAP at times has suggested that public libraries should pay fees to publishers when they lend works. While this is the case in EU nations, authors generally don't believe that publishers would pass along these fees to authors; that's separate from the seemingly un-American idea that public libraries pay royalties!)
This reconciliation doesn't solve all issues, but by providing a broader marketplace, it makes it much more likely that independent authors and publishers survive and even thrive, while also making human knowledge more widely available. While the ease of access to publicly promulgated information, like Web pages, has increased, trends seemed to suggest that books would go down the path that movies are still taking and music is slowly escaping from: being available only in highly restricted ways that interfere with technological progress.
With this new agreement in place, it's possible that you could publish a book, distribute it entirely through Google Book Search, and earn some money - maybe even a lot of money if the book goes viral - and bypass publishers entirely. That was the promise of the Internet music, blog, and podcast revolutions, too. While it hasn't come true for everyone, it's certain that many more voices are being heard by many more people around the world. And that's a good thing.
[Note: This article was edited to clarify the difference between in-copyright, out-of-print works which are orphaned - no copyright owner is known or can be found - and those that are not.]