We all use apps. We know they capture information about us. But exactly how much information? I’ve worked as a software engineer at Apple and at a mid-sized tech company. I’ve seen the good and the bad. And my experience at Apple makes me far more comfortable with the system Apple and Google have proposed for COVID-19 exposure notification. Here’s why.
Apple Respects User Privacy
When I worked on the Apple Watch, one of my assignments was to record how many times the Weather and Stocks apps were launched and report that back to Apple. Recording how many times each app is launched is simple. But reporting that data back to Apple is much more complex.
Apple emphasizes that its programmers should keep customer security and privacy in mind at all times. There are a few basic rules, the two most relevant of which are:
- Collect information only for a legitimate business purpose
- Don’t collect more information than you need for that purpose
That second one could use a little expansion. If you’re gathering general usage data (how often do people check the weather?), you can’t accidentally collect something that could identify the user, like the city they’re looking up. I didn’t realize how tightly Apple enforces these rules until I was assigned to record user data.
Once I had recorded how many times the Weather and Stocks apps were launched, I set up Apple’s internal framework for reporting data back to the company. My first revelation was that the framework strongly encouraged you to transmit back numbers, not strings (words). By not reporting strings, your code can’t inadvertently record the user’s name or email address. You’re specifically warned not to record file paths, which can include the user’s name (such as
/Users/David/Documents/MySpreadsheet.numbers). You also aren’t allowed to play tricks like encoding letters as numbers to send back strings (like A=65, B=66, etc.)
Next, I learned I couldn’t check my code into Apple’s source control system until the privacy review committee had inspected and approved it. This wasn’t as daunting as it sounds. A few senior engineers wanted a written justification for the data I was recording and for the business purpose. They also reviewed my code to make sure I wasn’t accidentally recording more than intended.
Once I had been approved to use Apple’s data reporting framework, I was allowed to check my code into the source control system. If I had tried to check my code into source control without approval, the build server would have refused to build it.
When the next beta build of watchOS came out, I could see on our reporting dashboard how many times the Weather and Stocks apps were launched each day, listed by OS version. But nothing more. Mission accomplished, privacy maintained.
TechCo Largely Ignores User Privacy
I also wrote iPhone apps for a mid-size technology company that shall remain nameless. You’ve likely heard of it, though, and it has several thousand employees and several billion dollars in revenue. Call it TechCo, in part because its approach to user privacy is unfortunately all too common in the industry. It cared much less about user privacy than Apple.
The app I worked on recorded every user interaction and reported that data back to a central server. Every time you performed some action, the app captured what screen you were on and what button you tapped. There was no attempt to minimize the data being captured, nor to anonymize it. Every record sent back included the user’s IP address, username, real name, language and region, timestamp, iPhone model, and lots more.
Keep in mind that this behavior was in no way malicious. The company’s goal wasn’t to surveil their users. Instead, the marketing department just wanted to know what features were most popular and how they were used. Most important, the marketers wanted to know where people fell out of the “funnel.”
When you buy something online, the purchase process is called a funnel. First, you look at a product, say a pair of sneakers. You add the sneakers to your shopping cart and click the buy button. Then you enter your name, address, and credit card, and finally, you click Purchase.
At every stage of the process, people fall out. They decide they don’t really want to spend $100 on new sneakers, or their kids run in to show them something, or their spouse tells them that dinner is ready. Whatever the reason, they forget about the sneakers and never complete the purchase. It’s called a funnel because it narrows like a funnel, with fewer people successfully progressing through each stage to the end.
Companies spend a lot of time figuring out why people fall out at each stage in the funnel. Reducing the number of stages reduces how many opportunities there are to fall out. For instance, remembering your name and address from a previous order and auto-filling it means you don’t have to re-enter that information, which reduces the chance that you’ll fall out of the process at that point. The ultimate reduction is Amazon’s patented 1-Click ordering. Click a single button, and those sneakers are on their way to you.
TechCo’s marketing department wanted more data on why people fell out of the funnel, which they would then use to tune the funnel and sell more product. Unfortunately, they never thought about user privacy as they collected this data.
Most of the data wasn’t collected by code that we wrote ourselves, but by third-party libraries we added to our app. Google Firebase is the most popular library for collecting user data, but there are dozens of others. We had a half-dozen of these libraries in our app. Even though they provided roughly similar features, each collected some unique piece of data that marketing wanted, so we had to add it.
The data was stored in a big database that was searchable by any engineer. This was useful for verifying our code was working as intended. I could launch our app, tap through a few screens, and look at my account in the database to make sure my actions were recorded correctly. However, the database hadn’t been designed to compartmentalize access—everyone with any access could view all the information in it. I could just as easily look up the actions of any of our users. I could see their real names and IP addresses, when they logged on and off, what actions they took, and what products they paid for.
Some of the more senior engineers and I knew this was bad security, and we told TechCo management that it should be improved. Test data should be accessible to all engineers, but production user data shouldn’t be. Real names and IP addresses should be stored in a separate secure database; the general database should key off non-identifying user IDs. Data that’s not needed for a specific business purpose shouldn’t be collected at all.
But the marketers preferred the kitchen sink approach, hoovering up all available data. From a functional standpoint, they weren’t being entirely unreasonable, because that extra data allowed them to go back and answer questions about user patterns they hadn’t thought of when we wrote the app. But just because something can be done doesn’t mean it should be done. Our security complaints were ignored, and we eventually stopped complaining.
The app hadn’t been released outside the US when I worked on it. It probably isn’t legal under the European General Data Protection Regulation (also known as GDPR—see Geoff Duncan’s article, “Europe’s General Data Protection Regulation Makes Privacy Global,” 2 May 2018). I presume it will be modified before TechCo releases it in Europe. The app also doesn’t comply with the California Consumer Privacy Act (CCPA), which aims to allow California residents to know what data is being collected and control its use in certain ways. So it may be changing in a big way to accommodate GDPR and CCPA soon.
Privacy Is Baked into the COVID-19 Exposure Notification Proposal
With those two stories in mind, consider the COVID-19 exposure notification technology proposed by Apple and Google. This proposal isn’t about explicit contact tracing: it doesn’t identify you or anyone with whom you came in contact.
(My explanation below is based on published descriptions, such as Glenn Fleishman’s article, “Apple and Google Partner for Privacy-Preserving COVID-19 Contact Tracing and Notification,” 10 April 2020. Apple and Google have continued to tweak elements of the project; read that article’s comments for major updates. Glenn has also received ongoing briefing information from the Apple/Google partnership, and he vetted this retelling.)
The current draft of the proposal has a very Apple privacy-aware feel. Participation in both recording and broadcasting information is opt-in, as is your choice to report if you receive a positive COVID-19 diagnosis. Your phone doesn’t broadcast any personal information about you. Instead, it creates a Bluetooth beacon with a unique ID that can’t be tracked back to you. The ID is derived from a randomly generated diagnosis encryption key generated fresh every 24 hours and stored only on your phone. Even that ID isn’t trackable: it changes every 15 minutes, so it can’t be used by itself to identify your phone. Only the last 14 keys—14 days’ worth—are retained.
Your phone records all identifiers it picks up from other phones in your vicinity, but not the location where it recorded them. The list of Bluetooth IDs you’ve encountered is stored on your phone, not sent to a central server. (Apple and Google confirmed recently that they won’t approve any app that uses this contact-notification system and also records location.)
If you test positive for COVID-19, you then use a public health authority app that can interact with Apple and Google’s framework to report your diagnosis. You will likely have to enter a code or other information to validate the diagnosis. That prevents the apps being used for fake reporting, which would cause unnecessary trouble and undermine confidence in the system.
When the app confirms your diagnosis, it triggers your phone to upload up to the last 14 days of daily encryption keys to the Apple and Google-controlled servers. Fewer keys might be uploaded based on when exposure could have occurred.
If you have the service turned on, your phone constantly downloads any daily diagnosis keys that confirmed people’s devices have posted. Your phone then performs cryptographic operations to see if it can match derived IDs from each key against any Bluetooth identifiers captured during the same period covered by the key. If so, you were in proximity and will receive a notification. (Proximity is a complicated question, because of Bluetooth’s range and how devices far apart might measure as close together.) Even without an app installed, you will receive a message from the smartphone operating system; with an app, you receive more detailed instructions.
At no time does the server know anyone’s name or location, just a set of randomly generated encryption keys. You don’t even get the exact Bluetooth beacons, which might let someone identify you from public spaces. In fact, your phone never sends any data to the server unless you prove to the app that you tested positive for COVID-19. Even if a hacker or overzealous government agency were to take over the server, they couldn’t identify the users. Because your phone dumps all keys over 14 days old, even cracking your phone would reveal little long-term information.
In reality, there would be more than one server, and the process is more complicated. This is a broad outline that shows how Apple and Google are building privacy in from the very beginning to avoid the kinds of mistakes made by TechCo.
Apple claims to respect user privacy, and my experience indicates that’s true. I’m much more willing to trust a system developed by Apple than one created by any other company or government. It’s not that another company or government would be trying to abuse user privacy; it’s just that outside of Apple, too many organizations either lack the understanding of what it means to bake privacy in from the start or have competing interests that undermine efforts to do the right thing.