Why We Should Care about the Consumer Reports MacBook Pro Rating
Like many people, I scratched my head when Consumer Reports issued a “not recommended” evaluation of the 2016 MacBook Pro on 22 December 2016. It wasn’t that the new laptops are above reproach; rather, it was the huge inconsistency in battery life that CR saw across multiple tests of three different units. As a technology journalist, I knew what I’d do in that circumstance: assume it was my fault because of the erratic results in the tests (battery life ranging from 3.75 hours to 16 hours with the same model) and isolate my testing decisions until I found the problem, whether it was mine or Apple’s. I’ve done this many times, including finding significant bugs in the first 802.11n AirPort routers that Apple later fixed.
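To put a number on "erratic," here is a minimal sketch of a spread check. The only real figures in it are the two extremes quoted above, 3.75 and 16 hours. The function names and the 1.25 threshold are assumptions made for illustration, not anything CR or Apple uses.

```python
def spread_ratio(hours):
    """Return the longest run divided by the shortest run."""
    if len(hours) < 2:
        raise ValueError("need at least two runs to measure spread")
    if min(hours) <= 0:
        raise ValueError("battery life must be positive")
    return max(hours) / min(hours)


def needs_investigation(hours, max_ratio=1.25):
    """Flag a set of runs whose spread exceeds max_ratio.

    max_ratio is an arbitrary illustrative threshold, not an industry standard.
    """
    return spread_ratio(hours) > max_ratio


# The two extremes reported for the same model.
reported_extremes = [3.75, 16.0]
print(round(spread_ratio(reported_extremes), 2))  # 4.27
print(needs_investigation(reported_extremes))     # True
```

A longest run more than four times the shortest, on identical hardware running an identical script, is the kind of result that points at the setup or the software rather than at the battery.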
The apparent reason for CR’s test result inconsistency was revealed on 11 January 2017. After Apple reviewed the test methodology provided by CR, the firm discovered a bug that affected a setting available only after enabling developer options in Safari. Apple has already released a developer beta of Sierra that fixes the bug, and CR says it will retest the laptops with the bug fix in place. Update: CR performed its tests with the fix in place and now recommends the MacBook Pro.
I found myself aggravated by this situation, though not through a desire to defend Apple nor to denigrate Consumer Reports, despite its checkered history in leading the charge on the non-existent Antennagate issue back with the iPhone 4 (see “Apple Responds to iPhone 4 Antenna Issue,” 16 July 2010).
Instead, what worries me is the way in which CR failed to serve its readers, not how it interacted with Apple. (CR is a subscription publication that also makes some material available to the general public at no cost.) Because CR neither revealed its test methods more fully in its original report nor admitted that it should have done more work to exclude its setup as the reason for the results, I worry that its actions reduce the credibility of all technology reporting and reviewing.
Consumer Reports has more prominence on technology reviews now than at any previous time because so few remaining publications have the staff and time to perform rigorous testing. I wrote recently for Fast Company about the difficulty of finding safe and reliable USB-C products because of a lack of extensive independent testing. Because of this, more people are likely to rely on CR’s recommendations, which could lead to poor buying decisions for two reasons:
- Users who might have benefited from purchasing newer gear might have unnecessarily put it off due to CR’s report.
- Because CR found in retesting that the problems were due entirely to this bug, it sent a message to all consumers — not just those who read Consumer Reports — that negative results may just be due to testing errors.
To enable the setting in question, you must open Safari > Preferences > Advanced and select the Show Develop Menu in Menu Bar checkbox. Then, in the Develop menu, you have to choose Disable Caches. CR uses a script to pull a set of 10 Web pages over a local network repeatedly, and disables caching to simulate the effect of a user pulling down fresh pages from many sites. Without caching, the test provides a consistency that isn’t related to network or remote server performance; with caching enabled, Safari wouldn’t use the network or other system resources much at all. The bug apparently revolves around icon caching and thrashes battery life in some cases.
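The article doesn't show CR's script, so the following is only a rough sketch of the kind of loop described above: fetch a fixed list of pages, in order, over and over, with nothing served from a cache. The function names are hypothetical. Python's `urllib.request` keeps no cache of its own, so every call here goes out over the network, which is roughly what Disable Caches forces Safari to do.

```python
import urllib.request


def fetch_uncached(url, timeout=10.0):
    """Download one page, asking any intermediate cache to revalidate it."""
    request = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return len(response.read())


def run_rounds(urls, rounds, fetch=fetch_uncached):
    """Fetch every URL in order, `rounds` times over; return total bytes read."""
    total_bytes = 0
    for _ in range(rounds):
        for url in urls:
            total_bytes += fetch(url)
    return total_bytes
```

A real battery test would keep calling something like `run_rounds` with the 10 page addresses on the local server until the laptop shut down, and record the elapsed time. The `fetch` parameter exists only so the loop can be exercised without a network.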
That said, almost no one would ever engage this setting, since caches almost always improve performance. Thus, even though some new MacBook Pro owners are complaining about poor battery life relative to previous laptops and advertised performance, no regular user will encounter this particular bug. (Apple disabled the calculation of remaining battery time in macOS following the MacBook Pro release, reportedly because of problems with its predictive accuracy; see “macOS 10.12.2 Sierra Focuses on New MacBook Pros,” 13 December 2016.)
Consumer Reports did most of the things that are upheld as standards at responsible technology publications: it presented Apple with the testing results and gave Apple an opportunity to respond, but didn’t change its conclusion when Apple couldn’t explain the discrepancy. After publishing the results, CR provided additional detailed information to Apple, which analyzed it and determined where the bug in Safari lay.
Since CR retested and found that battery life is comparable to previous models and competing laptops, it’s reasonable to ask what cost came with the first report.
It may have cost Apple sales. No reporter should worry about whether their honest and well-researched test results might affect a company’s sales, but they should always be concerned with rigor and fairness. (A reporter can have qualms, especially when writing about small firms, but those need to be balanced with the readers’ best interests.)
More concerning is the effect that CR’s report could have on the trust consumers place in Consumer Reports in particular and technology publications in general. Without a trusted technology press to verify corporate marketing claims, consumers will be left only with inherently biased sources of information. It’s extremely rare that any tech company actively aims to deceive consumers about the quality of its goods (see the recent debacle Samsung caused by not immediately owning up to Galaxy Note 7 problems), but everyone tries to paint their products in the best possible light. And it’s not hard for that paint job to verge into whitewashing away unpleasant realities.
So yes, I question Consumer Reports publishing their recommendation without trying harder than they say they did to figure out why their tests provided such wildly inconsistent results. In fact, CR did test briefly with Google Chrome, in a process that wasn’t fully documented in the original article. CR obviously couldn’t recommend the MacBook Pro based on the Chrome test, but they could have used those results to help isolate the Safari problem in their test suite.
The moral of the story is that, when the results of one test are so unusual, the correct thing to do is dig more, not introduce confusion into the world. The reason isn’t to save a company’s feelings or sales, but to keep your bond of trust with your readers.
Note: This article appeared a few hours before CR issued its updated recommendations after retesting, finding consistent and high battery life.
CR lost my trust back in the '90s when they rated Apple computers poorly based on, among other things, their incompatibility with Windows software and systems.
They also slammed the first Miata because it didn't have a back seat or a large trunk! Showed they didn't know the difference between a two-seat roadster sports car and a sedan. So it doesn't surprise me they couldn't tell the difference between computer operating systems.
A good article, Glenn, but in spite of having been a subscriber for most of the last 60 years, I am more critical of Consumer Reports than you were in this text. For thirty years, they have been dumbing down their testing, and especially, their reporting of their testing. In the last ten years or so, there have been several scandals over various aspects of their procedures and ethics.
While I'm not ready to rate Consumer Reports as "not recommended", my testing reveals multiple inconsistencies and errors that are a cause for concern. I hope that someday, they return to their former rigor and care.
I do not always agree with Consumer Reports. I don't think I've ever bought a _bad_ product by following their recommendations, but I have not always thought that their top recommendations were the best choice for me, usually because what they considered most important in testing was not what was most important in my intended use. Still, I'm not sure I think they were at fault in this case.
I admit that I know nothing about standards for writing testing protocols, but I can't quite follow the logic of your criticism. If they performed exactly the same tests multiple times on the same laptops, and got widely varying results, it seems a reasonable conclusion that the problem was with the product--which in fact it was.
The fact that the tests used a setting that most users would never make doesn't really matter; any user _could_ do so.
"The bug apparently prevents caching of icons, which thrashes battery life in some cases." What does "in some cases" mean? That the same laptop downloading the same 10 pages over and over until the battery is dead may achieve a widely differing number of downloads yesterday, today, and tomorrow? I think that indicates a problem with the product, not the test.
And, how else could they have performed the test? "Without caching, the test provides a consistency that isn’t related to network or remote server performance; with caching enabled, Safari wouldn’t use the network or other system resources much at all."
"Consumer Reports did most of the things that are upheld as standards at responsible technology publications: it presented Apple with the testing results and gave Apple an opportunity to respond, but didn’t change its conclusion when Apple couldn’t explain the discrepancy." Why in the world should they have changed their conclusion? They gave Apple the results and a chance to explain them, in advance of publication. What were they supposed to do then? Say, "We asked Apple about this and they don't know what's wrong, so we guess it's okay"?
Yes, it might have been more thorough if Consumer Reports had pursued the problem themselves until they pinned it down, but was that really their responsibility? They discovered erratic behavior which could affect customers; they called it to Apple's attention; they reported on it.
Finally, as to the cost of the report: The first thing you read now in the online copy of the test report is "(Update: Consumer Reports tested the new MacBook Pros after Apple issued a fix in response to our initial battery test results. We now recommend the laptops.)" Obviously, nothing can be done about the existing printed copies, but I will be extremely surprised if this update does not also appear in the first magazine issue to which it can be added.
I always particularly look for the Glenn Fleishman byline on TidBITS articles, and I will continue to do so, even though now, as with Consumer Reports, I have to say that I don't always agree with you.
They didn’t test using typical user settings, so it’s on them to figure out why there was a variance. If you create a test environment that produces inconsistent results, and changing one variable (trying Chrome, in their case) gives you consistent results, something’s wrong. Given that no other testing they had done of Apple or other laptops produced such a result, they should have backed out their test setup to figure out if it was their fault. It wasn’t, but they also weren’t testing a typical user setup, so what was wrong wouldn’t affect any user except those who had forgotten they had changed an obscure setting used for testing.
The preferred way to test network features is to set up a server that does what you want, not modify the stock hardware. (You can configure a server to prevent a Web browser from caching very easily.)
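As a minimal sketch of that server-side approach, a test server can mark every response with `Cache-Control: no-store`, the standard HTTP directive telling a browser not to keep a copy. The browser then stays in its stock configuration. The handler, the page content, and the helper names below are all made up for illustration; this is not CR's setup.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = b"<html><body>test page</body></html>"  # placeholder content


class NoCacheHandler(BaseHTTPRequestHandler):
    """Serve one fixed page and forbid the browser from storing it."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), NoCacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def cache_control_of(server):
    """Request the page once and return its Cache-Control header."""
    host, port = server.server_address
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/", timeout=5) as response:
        return response.headers.get("Cache-Control")
```

Pointing an unmodified Safari at a server like this should make it refetch the page on every load, without touching the Develop menu.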
I think you’re misreading a few parts of the article, especially about changing conclusions.
I'm sorry, but after reading your initial article here, and your reply above, I cannot agree with your assessment of this situation.
Nobody complained about CR when they recommended Apple products with this test protocol. They stressed the machine and encountered variation (the real problem here) above what was expected from this test. Switching to Chrome reduced the variation, indicating something with Safari, the default Apple browser.
It is not CR's job to diagnose the problem, just to report their findings. They didn't recommend the MacBook Pro, and now everybody is shooting the messenger.
They shared their findings with Apple, which provided a bug fix, the variation is back to within expected results of their test, and they're back recommending the Mac.
Thank you CR for helping to find this bug. Now, how much it helps users in real-life remains to be seen, but I think they did what they should be doing - report what they see based on testing they can control.
Amen. At some point another test program without using unusual computer settings should have been run to see if CR's test results were within a reasonable range. Poor methodology for a previously trusted consumer organization. Now skeptical about relying on them.
Well stated Glenn. What's even worse is saying they now recommended it because of Apple's forthcoming fix is total blameshifting and gaslighting. Turning the cache off would never be done by end users, rarely by developers, and shouldn't be done in testing (for those reasons).
I admit right off the bat I'm clueless about CR. That said, when it comes to this MBP battery story I am quite a bit more critical of Apple, actually.
1.) CR did perform the test using a Safari feature. Sure it might be a dev feature, but it's provided by Apple and for the type of test CR was performing it did indeed make sense to not rely on caching.
2.) CR did communicate openly about what they found and how they tested. They also involved Apple early from what I can tell.
3.) Because such inconsistency in testing is indeed worrisome (possibly more so than simply somewhat limited battery life alone), I believe it did make sense to report details, involve Apple, and inform consumers. The fact that they were apparently puzzled and therefore informed and involved more parties to me indicates they were truly interested in what's going on. Openly communicating not just results, but also your open questions to the very end is something that in my book also suggests seriousness. [As a physicist in research that is the way we are usually expected to handle things, so I admit I have a personal bias here.]
4.) Apple shipped buggy software that caused this "battery issue".
5.) This type of bug is typical Apple as of these days. It's related to a more subtle feature (a dev feature, i.e. something that might be considered 'pro') rather than something flashy that marketing can make a big hoopla over. Safari dev settings are just not sexy. So why bother?
6.) Their lack of attention to detail (something they once absolutely killed at) is exactly what leads to bugs like this one making their way to production software.
7.) When you sell a computer to customers for $2400 the product has to be considered solid and ready. You don't take $2400 and then involve your paying customers in some kind of involuntary beta test where ultimately an independent consumer organization then has to do your software debugging for you. Just consider for a moment how ridiculous this is. Apple is sitting on something like $250B and employs probably thousands of skilled engineers, yet in the end they have essentially let a magazine test their hardware/software integration on their latest and greatest new product!
8.) I think CR should have held off on issuing a recommendation until the bug fix provided by Apple is rolled into a macOS version that is shipping to customers. Trying out the fix in a beta update is fine for testing purposes, but the recommendation to the public needs to be based on what's shipping, not what might ship and then might work.
9.) Everybody makes mistakes. It's a good thing this turned out to be a bug that can apparently be resolved easily and Apple appears willing to soon release a public fix. I'm happy CR found the bug and glad their tests will hopefully lead to getting a fix to customers soon.
10.) And finally, the battery life CR now reports consistently is indeed very good. I hope this translates to everyday use for most people. I know one thing I always loved about my 2013 13" MBP was its awesome battery life despite the Core i7, 16 GB RAM, and its 500 GB SSD (maxed out at the time).
4, 5, 6: No software is 100% perfect. The 100s of man hours spent investigating & fixing this obscure bug in a barely used developer option could instead have been used to actually benefit real-world users.
7 & 10: how many customers turn on the Develop menu and then disable caching?
1. When the test produced unexpected results, they should have gone back to a stock configuration and re-tested (not for recommendation but for evaluation). You can prevent caching on the server side, which is typically done to isolate browser issues.
2. They didn't reveal their methodology fully until the second post after Apple had discovered the bug. The initial report says they informed Apple before publishing, not that they did so "early."
5. The feature used is for debugging and testing, not for normal use. It's absolutely Apple's bug, but it's never something that would be enabled casually, even by a developer.
i appreciate the question whether CR lost some cred here, but when CR made its first announcement, i was already recommending people wait to buy until the already known battery life issue was resolved. (removing the battery estimate from ALL notebooks to cover a problem w/ the new ones was pretty dodgy.)
also i guess in CR's defense: i had an early iphone 4. i had it drop calls, until i put it in a case. when the "non-existent" problem was fixed, i was glad to be able to give an older relative a 4S w/o reservation. it burns me that apple PR won the antennagate fight so decisively: "you're holding it wrong" was blaming the user for a design problem.
I agree: the battery life issue remains concerning because of reports. However, the basis on which the recommendation was made and the follow-up doesn't reassure me that they would examine their own testing better in the future, thus harming the credibility of all reviewers.
The Antennagate issue was "resolved" in that nearly all phones at the time had the problem — remember that "how to hold your phone" was in the manual for competing phones — and it didn't affect people in a substantive way that led them to return the phone in greater numbers than previous models.
obviously it was a great phone—i used mine more than 3 years, it's actually still doing offline things around the house, i have no idea what might kill it—but the long fury at CR (& not at other outlets that reported same, including new york times) raises trust questions about apple's press corps. apple is a giant company, intensely secretive, intensely *manipulative*, with tens of billions to spare on PR; consumer reports is not.
Hmmm... You would have a stronger point if CR was the only place reporting battery issues.
And a methodology that has consistently awarded Apple and MBP recommendations in the past, regardless of "developer menu" blah, blah, blah, really does say the issue is with Apple, not CR. CR was consistent. MBP was not.
But you miss my point, then: it isn't that the battery life was low. It was that it was inconsistent and didn't match results in other testing from other publications. Yes, there are absolutely stories (which I've read, too) from readers seeing unacceptably poor battery life. (In fact, see a comment in this thread.)
However, CR's methodology didn't produce a consistent result, and it rests on them to understand why their testing setup produced a variance. Had they done what I and other reviewers typically do to isolate causes, they would have found their setup was not at fault.
By more or less shrugging, releasing a "not recommended" rating, then revising that without any seeming self-reflection about their process, it doesn't serve readers well in providing a rigorous, trustworthy assessment.
The point isn't that the outcome would have been different. Rather, that readers would have understood that it was clearly not the testing process that was at fault. This would have further enhanced CR's credibility and conclusions.
I don't think this one anomaly should have caused CR reflection on their methodology. I think (and you're the first person I've read report this, so good on you) that they reached out to Apple before publishing because it was such an anomaly. So they did not _just_ shrug.
So, again, the inconsistency was on Apple and MBP's part, not CR's. They were certainly resting on a methodology that has proven more than adequate for many years with no reason to think this time should be any different.
What I am curious about is if the bug is actually an effect or symptom of the actual cause of battery issues many people are having. The Safari Developer bug is just the unexpected way for it to have been exposed. It may not be that this was only CR methodology specific.
So, in a different way, I do agree that we should care about the report.
it was clearly explained the bug was draining the battery, not exposing a flaw in batteries. there is no secret here.
Don't know about the Consumer Reports testing issue. What I do know is that I returned not one but TWO MacBook Pro 2016's because I could not get more than 3 1/2 hours of battery life from either. As a person on airplanes nearly every week that was unacceptable. Instead I keep using my MacBook Pro 2015 which provides nearly triple the battery power I experienced. I've owned about 20 MacBook Pro's over the past 20+ years so I love them and am not an Apple basher. Something else must be a factor.
I am so glad you did this on TidBITS. OK I really don't care what CR says, and as you say the problem has already been fixed. But I think there is another reason why you should not buy the Touch Bar MacBook Pro or the MacBook. These units are not repairable or upgradable. Not by the user or even an authorized Mac repair shop. The 13 inch (let's start with this, but I think an actual power user may want the 15 inch or use this with an external display) with 500 GB of storage, 16 GB of RAM, and AppleCare is $2449.00. If, after AppleCare expires, something goes wrong, say the storage unit fails (not likely but it could happen), the only repair is a complete logic board replacement. In speaking to one of the techs in my local repair shop, they estimate at that time it would be between $800.00 and $1000.00. Really not fair. I think Apple should increase AppleCare coverage to 6 years for units which require a new logic board to repair.
Like most "seasoned" professionals, I reserve my purchasing till after others have gone through a bottle of advil or tylenol.
My first reaction when I read the CR not recommending was, See! Apple! You Are Fallible! Then I drank my last sip of latté and thought, well, if CR found this, there has to be something going on. Apple tends to throw legal anchors at folks.
Then when I read that Apple was going to discuss this with CR (I hate marketing speak and leveraging...aka strongarming) and even saw the AAPL had dropped (a wee bit) after CR's putting the MBP on notice, I was suddenly skeptical. My impression (sans all the facts), along with the last few years of Apple's QC (don't get me started on them discontinuing Airport routers) isn't good.
This isn't so much about CR (they make mistakes) but that it sounded like Apple wanted to backpedal/discredit them (sales are not up) and Apple's focus on iOS rather than MacOS clearly reflects "push it out the door, fix later" mentality.
Excellent points. At least 5 times, I've written to CR regarding their testing and/or recommendations. Most recently was an article comparing data-transfer "pipes" - eSATA, USB, USB3 and ethernet. To my utter astonishment, they said that gigabit ethernet was the "fastest of all."
Not in my universe. (I suspect that they had someone just read the specs who, not understanding the difference between bits and bytes, or between theoretical and practical, looked at "gigabit" and concluded that it must be fastest because the number was biggest.)
I too have subscribed for over 60 years, and they are worth the money, IMHO. But their technology "experts" are seemingly all wannabe's.
I lost confidence in CR several years ago, and finally cancelled my subscription. They seem to have a new purpose, not just reviewing products for consumers, but lots and lots of "green" stuff and other things. Just not something I will pay to read anymore.
I used to disable caches in order to prevent trackers from storing cookies there. I stopped when I found out that the disabling was not consistent, sometimes lasting days and sometimes minutes. Does that mean the problem is in my testing or in the software?
The only reason CR issued a new recommendation was because they were caught using a set up that 99.44% of users don't use. I have never seen them recommend an Apple product over a non-Apple product.
Back to the basics of research and data analysis. If the data does not tend toward the mean something is wrong.
CR failed this basic analysis factor. Someone's head should roll for not understanding this basic concept.
You must follow the data and understand what it is telling you.
Sounds like someone found anomalies and jumped up and down saying hooray we can slam Apple again.
Specific to CR, the organization. I believe that there has been a somewhat recent change in the direction of CR that may or may not affect product testing. As a former subscriber to CR, I received an email from them of a purely political nature promoting their preferences. They happened not to agree with mine. I then cancelled my subscription for the reason that I wanted only product analysis from them, not political opinion. My point is that if they are branching out into politics, their product analysis staff may be being spread a bit thin to maintain the quality of testing they once had. So, in closing I must say that this lack of quality now apparent in their testing does not exactly surprise me. I would imagine that it is across the board and not limited to Apple products.
It doesn't hurt to take CR with a bit of a grain of salt when it comes to technology recommendations. For years when OS X was very new they suggested that the Mac had an equal problem with malware compared to Windows. For part of the time they maintained this opinion, there was no malware **at all** for OS X. It was (as a whole) brand new. It took a few years for the Trojans to show up and a few more for worms. And every single OS X malware that did show up was a big news story for years.
Windows has millions of unique pieces of malware. macOS at this point has perhaps hundreds. Nothing is immune from malware, not even the Mac. But to say it has parity with Windows is silly. That's not true even now. They finally got someone in that knew something about security.
Mac user since 1984. CR overemphasized battery life. However, that does not justify or have a thing to do with the gross connectivity failures of the new MacBook Pro. The new MB Pro may be amusing to overly funded children or hobbyists, but it is a setback to PROfessional users as in MacBook PRO. Perhaps next year...