MacWhisper 8 Improvements

MacWhisper has been updated to version 8 with some new features, including a video player. It’s one of several apps that use OpenAI’s Whisper model to perform transcription. I bought a MacWhisper license early on, and I’ve been using it a lot ever since.

[Image: the MacWhisper app icon, a close-up of a white microphone on a stand against a blue gradient, rounded-square background.]

One example: We use a Notion database to manage all the MacSparky content (this blog, the MacSparky Labs and Field Guides, etc.). With the addition of Notion AI, we’ve found value in keeping text transcripts of released content in the database. This allows us to ask questions like, “When was the last time I covered MacWhisper?”
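If you want to tinker with something similar, here’s a minimal Python sketch of the idea: it transcribes an episode with the open-source whisper package and files the text into a Notion database with the official notion-client library. This isn’t our exact setup; the token, database ID, and property names are placeholders you’d swap for your own.

```python
import os

import whisper
from notion_client import Client

# Transcribe the audio locally with an open-source Whisper model.
model = whisper.load_model("base")
result = model.transcribe("episode.mp3")
transcript = result["text"]

# File the transcript into a Notion database.
# NOTION_TOKEN and NOTION_DATABASE_ID are placeholders for your own values.
notion = Client(auth=os.environ["NOTION_TOKEN"])

# Notion caps each rich_text item at 2,000 characters, so chunk the text.
# (Very long transcripts may also need multiple requests, since a single
# pages.create call accepts at most 100 child blocks.)
chunks = [transcript[i : i + 2000] for i in range(0, len(transcript), 2000)]

notion.pages.create(
    parent={"database_id": os.environ["NOTION_DATABASE_ID"]},
    properties={
        # "Name" is a placeholder title property; match your database schema.
        "Name": {"title": [{"text": {"content": "Episode transcript"}}]},
    },
    children=[
        {
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [{"type": "text", "text": {"content": chunk}}]},
        }
        for chunk in chunks
    ],
)
```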

MacWhisper 8 adds new features:

Video Player

A new inline video player allows you to transcribe video files, and it can be popped out into its own window. Subtitles display directly on the video, and translations appear as separate subtitles, too. This will make the Notion workflow above even easier.

WhisperKit Support

You can now choose different Whisper engines, like WhisperKit, for your transcriptions. WhisperKit offers distilled models for faster transcription, and transcriptions stream in as they’re generated. WhisperKit can be enabled in Settings → Advanced.
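WhisperKit itself is a Swift framework, but if you want to experiment with the same “distilled model, streaming output” idea from a script, the Python faster-whisper library is a rough analogue. To be clear, this is not WhisperKit’s API, just a sketch of the concept:

```python
from faster_whisper import WhisperModel

# A distilled Whisper model trades a little accuracy for a lot of speed.
model = WhisperModel("distil-large-v3", device="auto", compute_type="int8")

# transcribe() returns a generator, so segments stream in as they are
# decoded, similar in spirit to WhisperKit's real-time output.
segments, info = model.transcribe("episode.mp3")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```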

There are a bunch of other improvements keeping MacWhisper at the top of my list for transcribing audio on my Mac.

I will be curious to see whether Apple brings Whisper-style transcription to the Mac at WWDC. It seems like something that should be built into the operating system, and if they added hardware support on Apple silicon, it could really scream. But it’s too early to tell exactly what Apple’s vision is for AI in macOS, and this may be a bridge too far. In the meantime, I’m very happy to have MacWhisper around.

Transcripts in Apple Podcasts

With the iOS 17.4 update, Apple’s Podcasts app can now create transcripts of episodes. This is great news. For years, people have asked me to add transcripts to Mac Power Users and my other shows, but it has always been cost-prohibitive. With the explosion of artificial intelligence over the last year or two, that is no longer the case. Better still, it’s built into the app, so we don’t even need to produce the transcripts ourselves.

[Image: an iPad in landscape showing Apple’s Podcasts app playing the Mac Power Users episode “I Got to Be the Hero,” with the artwork and play controls on the left and the new live transcript on the right, text highlighted at the top.]

A couple of nice touches: the transcript is searchable, and tapping anywhere in the transcript jumps the audio to that point.

This is a really nice update to Podcasts. Is it going to be enough to pull me away from Overcast? Probably not. But I’m at least going to take a serious look.

Apple Licensing Data for its AI Training

The New York Times reports Apple is in negotiations to license published materials for training their generative AI models. This shouldn’t be a surprise. A few years ago, when image processing was the big thing, everyone thought Apple would fall behind because they weren’t collecting all our images for data processing. Then I saw Craig Federighi explain that Apple could source pictures of mountains on its own and didn’t need mine.

Generative AI similarly requires a data set for training. Once again, Apple is looking to buy data rather than setting its AI loose on the open internet. I really wish I had a better idea of what Apple is planning to do with AI.

A Different Take on Apple and AI

William Gallagher is a pretty clever guy, and I enjoyed his take on Apple and AI over at AppleInsider. Based on Apple’s latest paper, they seem (unsurprisingly) interested in ways to run Large Language Models (LLMs) on memory-constrained local devices. In other words, AI without the cloud. We saw this approach a few years ago with image processing: Apple wants to have the tools while preserving user privacy. Just from speaking to Labs members in privacy-conscious businesses, I expect this will be very popular if it works.
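Apple’s paper is about cleverly streaming model weights from flash storage, which is well beyond a blog snippet. But as a taste of what “AI without the cloud” looks like today, here’s a minimal Python sketch that runs a quantized model entirely on-device with the llama-cpp-python library; this is not Apple’s technique, and the model file name is a placeholder for any locally downloaded GGUF model.

```python
from llama_cpp import Llama

# Load a quantized GGUF model from local disk; nothing leaves the machine.
# "model.gguf" is a placeholder for a model you've downloaded yourself.
llm = Llama(model_path="model.gguf", n_ctx=2048)

# Inference runs entirely on-device, which is the privacy appeal.
output = llm(
    "Summarize why on-device inference matters for privacy:",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```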

Sam Altman’s Return to OpenAI

It was quite the week over at the OpenAI office. I’m sure someone will write a book about it at some point. From the outside, it looked like another example of the conflicting priorities that inevitably result when a nonprofit owns a for-profit company. Regardless, those priorities got sorted out this week.

My only other comment on this is the irony that OpenAI is the company making the thing that many fear will replace their jobs. Yet, when push came to shove, OpenAI’s biggest concern was keeping their humans, not their robots.

Is AI Apple’s Siri Moonshot?

The Information has an article by Wayne Ma reporting that Apple is spending “millions of dollars a day” on artificial intelligence initiatives. The article is paywalled, but The Verge summarizes it nicely.

Apple has multiple teams working on different AI initiatives throughout the company, including Large Language Models (LLMs), image generation, and multi-modal AI, which can recognize and produce “images or video as well as text”.

The Information article reports that Apple’s Ajax GPT was trained with more than 200 billion parameters and is more potent than GPT-3.5.

I have a few points on this.

First, this should be no surprise.

I’m sure folks will start writing about how Apple is now desperately playing catch-up. However, I’ve seen no evidence that Apple got caught with its pants down on AI. They’ve been working on artificial intelligence for years: Apple’s head of AI, John Giannandrea, came over from Google back in 2018. You’d think people would know by now that just because Apple doesn’t talk about something doesn’t mean they’re not working on it.

Second, this should dovetail into Siri and Apple Automation.

If I were driving at Apple, I’d make the Siri, Shortcuts, and AI teams all share the same workspace in Apple Park. Thus far, AI has been smoke and mirrors for most people. If Apple implemented it in a way that directly impacts our lives, people would notice.

Shortcuts, with its Actions, gives Apple an easy way to pull this off. Example: you leave for work 20 minutes late. When you connect to CarPlay, Siri asks, “I see you are running late for work. Do you want me to text Tom?” That seems doable with AI and Shortcuts. The trick would be for it to self-generate: it shouldn’t require me to already have an “I’m running late” shortcut; it should build one dynamically as needed. As reported by 9to5Mac, Apple wants to incorporate language models to generate automated tasks.
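To be clear, nothing like this is shipping Apple technology. But as a toy sketch of what “self-generating” could mean, here are a few lines of Python where context (your expected arrival versus your start time) produces the prompt dynamically, with no pre-built shortcut required. Every name and number here is made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical context an assistant might already have on hand.
work_start = datetime.now().replace(hour=9, minute=0, second=0, microsecond=0)
commute = timedelta(minutes=30)
contact_to_notify = "Tom"  # made-up contact for illustration


def check_running_late(now: datetime) -> str | None:
    """Generate a suggestion on the fly instead of requiring a
    pre-built 'I'm running late' shortcut."""
    expected_arrival = now + commute
    if expected_arrival > work_start:
        minutes_late = int((expected_arrival - work_start).total_seconds() // 60)
        return (
            f"I see you are running late for work. Do you want me to text "
            f"{contact_to_notify} that you'll be about {minutes_late} "
            f"minutes late?"
        )
    return None  # on time, so no prompt is generated


suggestion = check_running_late(datetime.now())
if suggestion:
    print(suggestion)
```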

Similarly, this technology could result in a massive improvement to Siri if done right. Back in reality, however, Siri still fumbles simple requests routinely. There hasn’t been the kind of improvement that users (myself included) want. Could it be that all this behind-the-scenes AI research is Apple’s ultimate answer for improving Siri? I sure hope so.

My Transcription Workflow for the Obsidian Field Guide (MacSparky Labs)

In this video I demonstrate how I used two AI tools, MacWhisper and ChatGPT, to generate transcripts and SubRip text (SRT) files for the Obsidian Field Guide videos.…
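As a general-purpose taste of the SRT side of this (not necessarily my exact pipeline), here’s a minimal Python sketch that turns Whisper-style segments into a SubRip file; the segment data below is hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SubRip expects."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def write_srt(segments, path="captions.srt"):
    """Write (start, end, text) segments as numbered SubRip cues."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(segments, start=1):
            f.write(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n")
            f.write(f"{text.strip()}\n\n")


# Hypothetical segments, e.g. from a Whisper transcription.
write_srt([
    (0.0, 3.2, "Welcome to the Obsidian Field Guide."),
    (3.2, 7.5, "In this video, we'll set up your first vault."),
])
```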

This is a post for MacSparky Labs Level 3 (Early Access) and Level 2 (Backstage) Members only. Care to join? Or perhaps do you need to sign in?