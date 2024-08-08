



I have a project for someone out there! As I have admitted numerous times over the years, my development skills are weak at best, but I’ve come up with an idea for a tool that would benefit many Apple admins and consultants and, by extension, the rest of us.

Apple maintains an extensive knowledge base of thousands of pages of support articles that document many technical aspects of the company’s operating systems, apps, and devices. At TidBITS, we regularly link to these articles, and, thanks to a suggestion from reader Jolin Warren, my cleanup macro now trims the URLs so they should load in the default Apple Support site for your country.

Some time ago, I realized that Apple Support URLs follow numeric patterns. For instance, here are the URLs for Apple’s most recent security update release notes:

iOS 16.7.9 and iPadOS 16.7.9: https://support.apple.com/en-us/HT214116

iOS 17.6 and iPadOS 17.6: https://support.apple.com/en-us/HT214117

macOS 12.7.6: https://support.apple.com/en-us/HT214118

macOS 14.6: https://support.apple.com/en-us/HT214119

macOS 13.6.8: https://support.apple.com/en-us/HT214120

Safari 17.6: https://support.apple.com/en-us/HT214121

tvOS 17.6: https://support.apple.com/en-us/HT214122

visionOS 1.3: https://support.apple.com/en-us/HT214123

watchOS 10.6: https://support.apple.com/en-us/HT214124

As you can see, the six-digit ID number after “HT” increments by one for each release, but the order in which Apple assigns those IDs to releases appears arbitrary (macOS 12.7.6, 14.6, and 13.6.8 received consecutive IDs, for instance). By creating more of these URLs by hand and guessing at some numbers, I determined that Apple has used other six-digit ranges over time. Many recent URLs start with 213 and 214, but I’ve also found URLs starting with 100–119 and 201–212. I haven’t discovered any pattern behind the ranges, and Apple skips some IDs for unknown reasons.
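For anyone tempted to take this on, those ranges translate directly into a list of candidate URLs. This Python sketch assumes that “starting with 100–119” means the first three digits of the six-digit ID, and that every range uses the same en-us root and “HT” prefix as the recent examples above (an assumption for the older ranges). Under those assumptions, the 20 + 12 + 2 = 34 prefixes yield 34,000 candidate URLs, not all of which will resolve to an active page.

```python
from typing import Iterable, Iterator, Tuple

# Three-digit prefixes mentioned in the article; assumed, not exhaustive.
PREFIX_RANGES: Tuple[Tuple[int, int], ...] = ((100, 119), (201, 212), (213, 214))
# Assumes every range uses the same root and "HT" prefix as the recent examples.
URL_ROOT = "https://support.apple.com/en-us/HT"


def candidate_ids(prefix_ranges: Iterable[Tuple[int, int]] = PREFIX_RANGES) -> Iterator[int]:
    """Yield every six-digit ID whose first three digits fall in an inclusive range."""
    for low, high in prefix_ranges:
        # Prefix 213 covers IDs 213000 through 213999.
        yield from range(low * 1000, (high + 1) * 1000)


def candidate_urls(prefix_ranges: Iterable[Tuple[int, int]] = PREFIX_RANGES) -> Iterator[str]:
    """Turn each candidate ID into a full support URL."""
    for article_id in candidate_ids(prefix_ranges):
        yield f"{URL_ROOT}{article_id}"
```

Generating the URLs lazily keeps memory use trivial; the slow, sensitive part is fetching them, which is where pacing comes in.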

Leveraging This Realization

When I first figured out what Apple was doing, I considered using Dejal’s Web monitoring app Simon to tell me if any of these pages had changed. I didn’t get very far down that path because I couldn’t see any way to feed Simon thousands of URLs programmatically, and checking them all regularly seemed likely to overload my Mac. Other Web monitoring tools had the same problem: they’re designed to watch a handful of pages, not thousands.

Next, I created a Google Sheet that had a column for the six-digit ID, a column that appended each ID to the URL root, and a column that used this formula — =Hyperlink($A2, IMPORTXML($A2,"//title")) — to look up and bring in a hyperlinked title of the resulting page. Not all IDs map to active pages, so some cells were filled with #N/A.

Success? The problem is that when I fill the rows down, Google Sheets gives up at some point because there are too many outgoing calls to Apple’s support site. Then all I see is “Loading…” even though clicking one of the URLs shows the traditional preview.
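Outside Google Sheets, the lookup that the IMPORTXML formula performs takes only a few lines. This is a minimal sketch that substitutes Python’s standard-library HTMLParser for a real XPath engine and returns None wherever the sheet would show #N/A; the class and function names and the User-Agent string are hypothetical placeholders.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element, roughly what //title selects."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # only the first <title> counts

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        text = " ".join("".join(self._parts).split())  # collapse whitespace
        return text or None


def extract_title(html: str) -> Optional[str]:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def fetch_title(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page title, or None where the spreadsheet would show #N/A."""
    # Hypothetical User-Agent; a real crawler should identify itself honestly.
    request = Request(url, headers={"User-Agent": "support-tracker-sketch/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
    except (OSError, LookupError):  # HTTP errors, network errors, timeouts, bad charset
        return None
    return extract_title(html)
```

Fetching one page per call, rather than thousands of spreadsheet cells at once, is what makes it possible to control how fast requests go out.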

Shortly after this, I got a press release from a company called Neptyne, which was releasing an add-on for Google Sheets that would enable a programmer to interact with data in Google Sheets using Python. I don’t know Python, but when I explained my goal to the Neptyne founder, Douwe Osinga, he took a swing at what I wanted. Because it was in Python, he could loop in such a way as to avoid causing Google Sheets to freak out. Plus, Douwe was able to extract the date and version from Apple’s pages, which allowed me to sort by date so I could see which pages had changed most recently. (I never figured out Apple’s version numbering scheme.)
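Douwe’s actual Neptyne code isn’t shown here, so as a hedged illustration of the general idea only, this sketch paces a loop by enforcing a minimum interval between items, so a long run of fetches never arrives in a burst. The function name and the one-second default are illustrative choices, not values Apple publishes.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], min_interval: float = 1.0,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items no faster than one per min_interval seconds.

    The interval is measured from the previous yield, so time the caller
    spends fetching a page counts toward the wait.
    """
    last: Optional[float] = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

A hypothetical crawl loop would read `for url in throttled(urls, min_interval=2.0): ...`, with the fetch inside the loop body. The clock and sleep parameters exist only so the pacing can be tested without actually waiting.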

Although this solution initially worked better, it was brittle. Adding more rows caused the whole thing to stop working, and I don’t know Python well enough to troubleshoot it, even with the aid of an AI chatbot.

But it suggested that what I wanted was possible. Imagine a world where you could learn about Apple’s technical changes as soon as they’re published, rather than having to stumble on them through a search later.

Apple Support Article Tracker

Being able to iterate through the universe of Apple support article URLs, retrieve the content, and sort the list by date was a good proof of concept. I don’t think Google Sheets is the right platform for this, and my research suggests that neither Excel nor Numbers is a contender either. Instead, I suspect we need a database with a Web-based front end for anyone to use. In my ideal world, the Web scraping would operate roughly like this:

Traverse all possible Apple support URLs regularly, perhaps once per day. (Because it would act like a spider, it would need to honor robots.txt exclusions, throttle itself, and behave nicely.)

On the first pass, load each page into a database, populating fields for title, date, version, and full text.

On subsequent passes, save a new version of any page whose content has changed since the previous save.
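The first-pass and subsequent-pass steps boil down to one rule: store a page again only if its content differs from the most recent stored copy. Here’s a minimal SQLite sketch of that rule, comparing SHA-256 hashes of the page body. The table layout and column names are hypothetical, fetched_at is assumed to be a UTC ISO 8601 timestamp (so text ordering matches time ordering), and parsing the title, date, and version out of Apple’s pages is left out because it depends on markup the tracker would have to inspect.

```python
import hashlib
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS versions (
    article_id   TEXT NOT NULL,  -- e.g. 'HT214119'
    fetched_at   TEXT NOT NULL,  -- UTC ISO 8601 timestamp of the pass
    title        TEXT,
    page_date    TEXT,           -- date shown on the page, if it can be parsed
    page_version TEXT,           -- version shown on the page, if any
    body         TEXT NOT NULL,  -- full text (or HTML) as fetched
    body_sha256  TEXT NOT NULL,
    PRIMARY KEY (article_id, fetched_at)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_if_changed(conn: sqlite3.Connection, article_id: str, fetched_at: str,
                    body: str, title: Optional[str] = None,
                    page_date: Optional[str] = None,
                    page_version: Optional[str] = None) -> bool:
    """Insert a new version unless the body matches the latest stored one."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    latest = conn.execute(
        "SELECT body_sha256 FROM versions WHERE article_id = ? "
        "ORDER BY fetched_at DESC LIMIT 1",
        (article_id,),
    ).fetchone()
    if latest is not None and latest[0] == digest:
        return False  # unchanged since the previous pass; nothing to store
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO versions VALUES (?, ?, ?, ?, ?, ?, ?)",
            (article_id, fetched_at, title, page_date, page_version, body, digest),
        )
    return True
```

Keeping every version as its own row, rather than overwriting the previous one, is what makes version history and comparisons possible on the website side.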

With the database storing the metadata, full text, and versions, the public website would need to:

Display a list of all Apple support article titles, sorted by date, with access to previous versions.

For any selected article and version, display a pane showing the rendered HTML.

Provide a view that shows the differences between any two versions of an article.

Offer alerts for new and changed articles, perhaps via RSS or email.

Allow full-text searching. (Apple’s search engine is notably weak; see “Apple Launches Documentation Site for Manuals, Specs, and Downloads,” 25 March 2024.)

For extra credit, provide a per-article discussion topic for annotations.

For double extra credit, create an AI chatbot to allow conversations with the knowledge base, with answers referencing source pages.
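Showing the differences between two versions of an article doesn’t require anything exotic; Python’s standard difflib module handles the comparison. This sketch produces a text unified diff of two stored versions. The function name and version labels are hypothetical, and a real site would likely diff extracted text rather than raw HTML to avoid noise from markup changes.

```python
import difflib


def version_diff(old_text: str, new_text: str, article_id: str,
                 old_label: str, new_label: str, context: int = 3) -> str:
    """Return a unified diff of two versions of one article ('' if identical)."""
    # lineterm="" keeps output uniform even when a version lacks a final newline.
    lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{article_id} ({old_label})",
        tofile=f"{article_id} ({new_label})",
        n=context,
        lineterm="",
    )
    return "\n".join(lines)
```

For a side-by-side view on a Web page, difflib’s HtmlDiff class and its make_table() method render the same comparison as an HTML table.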

I’d like to believe this wouldn’t be too difficult for someone with decent Web development chops, but it’s well outside my skill set. (Though he wasn’t volunteering to create it, Glenn Fleishman suggested that a Wiki site might be an easy way to store, diff, and present the data, plus provide options for community annotation.) I’m sharing the idea in the hope that someone finds it a compelling challenge and actually builds it. I’m happy to collaborate on the process and help with hosting if necessary. Are you game? Are there other features you’d like to see in such a tool?