How Spotlight Suggestions Handles Privacy
Back in 2005, Apple added Spotlight to Mac OS X 10.4 for finding files. Rather than merely searching through filenames, Spotlight could also find files based on metadata, and it could even look inside common file types, like PDF. Although often a source of frustration (thanks to spotty results and performance-eating indexing), Spotlight has improved with every version of OS X to provide better results with fewer performance hits. iOS even gained a simplified version of Spotlight, capable of finding apps, contacts, events, and email messages.
New in iOS 8 and OS X 10.10 Yosemite is Spotlight Suggestions, a new feature that extends Spotlight to the Internet, enabling it to search common information sources like Wikipedia and even the Web at large, via Microsoft’s Bing search engine. Welcome as this is for providing a single search interface for finding whatever you need, it’s not new — independent search applications have offered similar features for years.
However, building such a feature into iOS and OS X opens it up to added scrutiny, and on 20 October 2014, the Washington Post reported that Spotlight Suggestions was a major privacy concern, leaking your information to Apple and potentially other sources. Unfortunately, the article contained multiple factual errors, many of which could have been avoided had the authors read Apple’s iOS 8 security documentation and Apple’s privacy page, which includes a section on Spotlight Suggestions.
Semi-Anonymous Search — To save you the time of reading those documents, and to add some details provided by other Apple sources, here’s how Spotlight Suggestions works, and how it manages your privacy.
The primary concern most people likely have is the possibility of a local file search revealing private information to Apple, but various privacy mechanisms prevent that.
When you open Spotlight and start typing a search string, Spotlight uses predictive search to start generating results. This is the same technique used by nearly every modern Web browser and search engine. Spotlight simultaneously searches its index of your local files while checking with multiple Internet sources via a connection to Apple’s own servers. The requests go to Apple first, not directly to Internet sources like Wikipedia.
To manage your session, Apple uses a one-time session ID that lasts for 15 minutes. Neither the session ID nor the search query contain your IP address or any other device identifier. Session IDs also aren’t coordinated or correlated, so there is no way for Apple to track historical usage by chaining session IDs together. In short, your query exists within a 15-minute bubble that isn’t tied to you directly. This is different from Siri, which uses a more persistent device identifier since it requires more context over time (due in large part to the overhead of voice recognition).
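To make the 15-minute bubble concrete, here is a minimal sketch of how a client could hold a short-lived, unlinkable session ID. It is an illustration only, not Apple's implementation: the class name `EphemeralSession`, the token length, and the injectable clock are all assumptions made for the example.

```python
import secrets
import time

SESSION_LIFETIME_SECONDS = 15 * 60  # the 15-minute window described above


class EphemeralSession:
    """Hypothetical client-side session holder (not Apple's actual code)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._rotate()

    def _rotate(self):
        # A fresh random token, derived from nothing: no device ID, no
        # previous token, so consecutive sessions cannot be chained together.
        self._token = secrets.token_hex(16)
        self._expires_at = self._clock() + SESSION_LIFETIME_SECONDS

    def current_id(self):
        # Replace the token as soon as the lifetime has elapsed.
        if self._clock() >= self._expires_at:
            self._rotate()
        return self._token
```

The key design point is that each new token comes straight from a random source rather than being derived from the old one, which is what would keep a server from linking one 15-minute window to the next.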
Queries do include location information, but Apple added a “fuzzing” feature to mask your exact location. The degree of fuzzing varies based on the density of the area you are in. In a city, it will likely be relatively precise, down to the block, in order to direct you to the closest coffee shop (really, what else matters?). In a suburban or rural area, it might be no more specific than the town. Fuzzing happens on your device, not Apple’s servers, so they never see your exact location.
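Apple has not published how the fuzzing works, so the following is only a sketch of one common approach: snapping coordinates to a grid whose cell size grows as population density drops. The function name `fuzz_location`, the three density labels, and the cell sizes are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical cell sizes in degrees; the real values are not public.
# One degree of latitude is roughly 111 km.
CELL_SIZE_DEGREES = {
    "urban": 0.001,    # on the order of a city block
    "suburban": 0.05,
    "rural": 0.25,     # on the order of a town or larger
}


def fuzz_location(lat, lon, density):
    """Snap a coordinate to the center of a grid cell sized by density."""
    cell = CELL_SIZE_DEGREES[density]
    snapped_lat = (math.floor(lat / cell) + 0.5) * cell
    snapped_lon = (math.floor(lon / cell) + 0.5) * cell
    return round(snapped_lat, 6), round(snapped_lon, 6)
```

Because every point inside a cell maps to the same center, two people standing in different parts of the same rural cell would report identical coordinates. Running this on the device, as the article describes, means only the snapped value ever leaves it.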
You can disable Spotlight Suggestions location tracking on your Mac in System Preferences > Security & Privacy > Location Services > System Services. Click the Details button, then deselect Spotlight Suggestions. Even if you do this, your IP address will still be used for higher-level location tracking, since Apple needs to know at least what country you are in. In iOS 8 the setting is under Settings > Privacy > Location Services > System Services (scroll all the way to the bottom to find it).
It’s important to remember that Spotlight Suggestions and Siri use different search mechanisms, with different privacy settings.
To provide Bing search engine results in the Spotlight window, Apple keeps track of “common queries.” Apple does not pass every search query to Bing, merely those identified as common. For example, I search for “US Airways” a lot due to constant work travel. That query pops up the airline’s Web site (and Wikipedia entry) since it’s relatively common. But when I search for the title of a Keynote presentation that I need to edit on my upcoming flight, I don’t get any Web results, even though I would if I searched for it directly in Bing (since I’ve used that presentation at conferences and in blog posts).
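How Apple decides that a query is "common" is not documented. As a rough sketch of the idea, a server could count normalized queries across all sessions and forward only those seen often enough. The class `CommonQueryFilter` and the threshold of 1000 are assumptions for the example, not published details.

```python
from collections import Counter

COMMON_THRESHOLD = 1000  # hypothetical cutoff, not a published figure


class CommonQueryFilter:
    """Hypothetical gate deciding which queries go on to the Web search engine."""

    def __init__(self, threshold=COMMON_THRESHOLD):
        self._counts = Counter()
        self._threshold = threshold

    @staticmethod
    def _normalize(query):
        # Fold case and collapse whitespace so trivial variants count together.
        return " ".join(query.lower().split())

    def record(self, query):
        self._counts[self._normalize(query)] += 1

    def should_forward(self, query):
        # Rare queries (such as a private document title) never pass the gate.
        return self._counts[self._normalize(query)] >= self._threshold
```

Under this scheme a popular query such as "US Airways" crosses the threshold quickly, while the title of a personal Keynote file stays well below it and is never passed along.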
When these queries are sent to Bing, they come from Apple’s servers, not your computer, so Bing can’t track them. If location is sent (e.g. you are performing a search for local movies) it is provided to Bing only at the level of the city you are in, not even the fuzzed location Apple uses. Lastly, Apple’s contract with Microsoft prevents Microsoft from retaining queries and results.
Keeping It Local — Apple doesn’t view or share your local device index, but the queries themselves will always hit their servers. Apple does track what you select in Spotlight search results. If it’s a local file, Apple tracks only the file type, not the filename or other specifics (the company may track a little more; we don’t have an exact list, but names, contents, and other personalized metadata are explicitly excluded). This is still tracked only to the session ID level.
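As an illustration of what "file type only" could mean in practice, the sketch below builds a selection report that keeps the extension and drops everything else. The function name `selection_report` and the field names are hypothetical, and as noted above, the exact list of what Apple records is not known.

```python
import os


def selection_report(session_id, selected_path):
    """Build a hypothetical report for a selected local file.

    Only the file extension survives; the name, folder, and contents
    are never copied into the report.
    """
    _, extension = os.path.splitext(selected_path)
    return {
        "session": session_id,
        "result_kind": "local_file",
        "file_type": extension.lower().lstrip(".") or "unknown",
    }
```

Selecting a file at a path like `/Users/me/Talks/Secret Roadmap.key` would yield a report whose only file-specific field is `"key"`.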
If you don’t like any of this, you can disable Spotlight Suggestions completely via System Preferences > Spotlight in Yosemite, or Settings > Spotlight Search in iOS 8. Spotlight searches will then be limited to the contents of your local device’s index.
Apple doesn’t always get security and privacy right as it continues to tighten the links between Apple devices, software, and services, as I recently highlighted in “You Are Apple’s Greatest Security Challenge” (14 October 2014). But aside from Tim Cook’s privacy message, it’s clear that Apple not only sees privacy as a competitive advantage, but is doubling down on the engineering to support it.
As a security analyst, I worry constantly about becoming biased, especially with a company like Apple whose products are so deep a part of my life. To avoid this, I spend a tremendous amount of time researching and validating my findings before publishing them. While this may be pie-in-the-sky thinking, I believe journalists and publications should make similar efforts to avoid bias, and tamp down the desire for explosive headlines that leads to inaccurate reporting, particularly when such articles increase paranoia unnecessarily.
While Apple has made, and continues to make, security and privacy mistakes worthy of criticism, the original Washington Post story in this particular case was not only factually wrong, but incorrect in ways even basic research would have revealed. On the upside, we all now have a better understanding of how Spotlight Suggestions works, and it’s certainly important to continually evaluate how — and if! — Apple is keeping its privacy promises.
Thanks for the details, Mr Mogull! How does LaunchBar work by comparison? Is the indexing totally local, or does any of the searching get sent to the developer?
Thanks for the clear explanation. I had initially turned the Suggestions and Bing Search off due to the "creep factor", but now I feel comfortable enough to give them a try.
This is why you should never turn to publications like the Washington Post for security or technical information. I initially turned off the Bing searches because of my personal bias against Bing and all other evil/inaccurate things. But I find the Wikipedia suggestions (along with the other resources presented) to be a wonderful convenience.
We should especially avoid the Washington Post for anything and everything Apple now that Jeff Bezos owns it!
Thanks for the info!
I would like to know the difference in Safari's cookie handling, between "Allow from current website only" and "Allow from websites I visit".
Also, what exactly does "Send crash data with app developers" do? And what's your stance on "Send diagnostics & usage data to Apple"?
John Siracusa mentions it in his epic (as usual) OS X Yosemite review:
"[Update: An earlier version of this review stated that Apple had softened its stance on third-party cookies in Safari 8. This is not the case. The default “Allow from websites I visit” setting in Safari 8 behaves the same as the default setting in Safari 7. It’s just differently worded. We apologize for the error.]
Safari’s new cookie-handling setting, “Allow from current website only,” will only accept cookies whose domains match the sending website. This is somewhere between the default setting (“Allow from websites I visit”) and completely blocking all cookies."
You state that location fuzzing is done on the device, and that the degree of fuzzing is based on the density of the location. So iOS8 knows how dense its surroundings are at all times?
That would be pretty easy to do based on IDs of nearby cell towers, wouldn't it?
There are a ton of geolocation services based on IP address, or they can pull it through Wi-Fi signals (like early iPhones did before GPS).
Real journalism FTW.
I would have liked to learn more about the actual fuzzing etc. And it still remains a fact that with Yosemite, Spotlight sends local and therefore sometimes very private data to Apple while Apple did not bother to ask for my opt-in. And since Apple is an American company, that is yet another option for surveillance, hacking etc. by the NSA and other secret services.
On the other hand, Spotlight lost abilities in Yosemite. Sparse bundles for example are no longer findable with Spotlight … a direct option to jump to the Spotlight option from the Spotlight window is gone too.
This is merely a reprint of Apple's marketing documentation, and takes as fact their well disproved claims that they cannot track users individually.
So, your claim should be really easy to prove, right?
So the Washington Post did some actual testing, and the people it quotes as a source are analysing the network traffic to see what is really being transferred, and transparently documenting their efforts on GitHub.
In contrast, this piece mentions a contractual agreement with Microsoft (ha) and 15-minute session window which will both protect us, with no discussion of whether that's realistic. No first-hand research at all. And this is somehow "real journalism" (comment above)?! No, this is a typical article that people with strong cognitive dissonance will share on social media.
I don't know where the Microsoft agreement came from, but the 15-minute anonymous identifier is listed in Apple's security guideline document. Read it before you post, and don't behave like a troll.
The Microsoft agreement is mentioned on the privacy site I linked to, and was also highlighted to me by Apple sources.
You wrote "Neither the session ID nor the search query contain your IP address or any other device identifier". Now that's just pure horse radish - to communicate on the internet, the other party has your IP address. It's sad that you lie to our audience. It's even more sad that Gruber linked to this article as an example of "fair and balanced" reporting.
The IP address is not part of the message body. But yes, it is always part of the header. However, Apple doesn't record this. If they did, they would be violating their public policies and be subject to FTC action, which has happened to other Internet companies.
If you've built services like this you know you get to choose what to log. I doubt Apple's lawyers would let them violate a public statement and risk legal action.