Adam Engst 11 March 2024 11 comments

Text-to-Podcast Service Listen Later Sponsoring TidBITS

We’d like to welcome as our latest sponsor Listen Later, an Internet service that converts the text of any article on the Web into a custom podcast episode. You simply email the URL of the article you want to convert to Listen Later, and shortly after that, it shows up in your personal Listen Later feed in your favorite podcast app.

Created by indie developer Yalim Gerger, who hails from Istanbul, Listen Later is a product of the modern age, relying on AI tools to process and narrate article text. Text cleanup is essential because many articles contain ads, image descriptions, disclaimers, and other addenda that get in the way of smoothly spoken audio. For instance, Listen Later is smart enough to avoid the comments at the end of TidBITS articles. And then there’s the narration. We’ve come a long way from the speech synthesis days of Apple’s Fred voice, and voice quality is shockingly good.

Setup is simple—after you create an account, Listen Later’s main URL displays your personal podcast URL, a field for the address it should associate with your feed, the email address to send to, and your available credit. Then it’s just a matter of sending an email message—no Subject is necessary—to the address provided with one or more URLs in the body of the message.
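For readers who would rather script that step, here is a minimal sketch of composing such a message in Python. The sender and recipient addresses and the article URLs below are placeholders, not real Listen Later values; use the address shown on your own account page. The function name `build_submission` is hypothetical, and actually sending the message (for example with `smtplib`) is left out because it depends on your mail provider.

```python
from email.message import EmailMessage


def build_submission(sender: str, listen_later_address: str, urls: list[str]) -> EmailMessage:
    """Build an email whose body lists one article URL per line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = listen_later_address
    # No Subject header is set, since the article says none is necessary.
    msg.set_content("\n".join(urls))
    return msg


# Placeholder addresses and URLs, for illustration only.
message = build_submission(
    "you@example.com",
    "your-address@listenlater.example",
    [
        "https://example.com/first-article",
        "https://example.com/second-article",
    ],
)
print(message.get_content())
```

Putting one URL per line is an assumption about what reads cleanly; the article only says the URLs go in the body of the message.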

I first tested Listen Later on a 6700-word article by the author Patricia Lockwood, “When I Met the Pope.” In retrospect, this may not have been the best sample text because Lockwood’s prose flits descriptively from branch to glorious branch:

At the reception – no, excuse me, the ‘vin d’honneur in the Lapidary Gallery’ – they are serving wine from Sting’s vineyard. ‘What did you say?’ I yell at Hope, who is flying on a single glass, and she points to a brochure from which Sting stares out with a look of intense fermentation, Trudie’s arms wrapped around his chest from behind, both of them wearing the honey of great good health and presumably fresh from a seven-hour stomping. I guess the idea is that you meet the pope and then get a mouthful of Sting’s grapes. I nod to the brochure and Hope nods back conspiratorially, slipping it into her bag. That is how it is done; we are doing it. The Vermentino, yes, is called ‘Message in a Bottle’. So is the Sangiovese.

I had to go back and read chunks of the original piece to realize that Listen Later’s AI narration had gotten everything right. Or almost right. Across a 36-minute audio version of the article, I noticed a couple of errors, mostly homographs like “live” that have multiple pronunciations, although the narration also cut the word “Sangiovese” short, eliminating its final vowel sound. I also thought it biffed “Jesus,” but when I checked the text, I discovered Lockwood intentionally wrote the word as “Jaysus.” In subsequent tests, Listen Later’s phrasing continued to impress but occasionally stumbled on abbreviations and numbers, notably on prices like $2599, possibly because of our house style of omitting commas in four-digit prices for brevity.

I don’t know Patricia Lockwood, but the Listen Later voice on that first listen could have been hers, for all I knew. It wasn’t, I soon discovered, but when I queried Gerger about it, he quickly added a setting that allows you to choose from six high-quality voices. My other early criticism of the audio quality was that the voice spoke just a hair faster than I would have liked. Initially, Gerger didn’t think he could do anything about that, but a day later, he wrote back, saying that he had found and added a voice speed setting. In the meantime, I had discovered the speed problem was my fault. I listen to podcasts in Marco Arment’s Overcast, which has a Smart Speed option that shortens silences. It usually works well with people, who pause to breathe, but the AI narration was so smooth that removing silences made it sound slightly breathless.

As you’ll note in the screenshot above, Listen Later can even translate text into whatever language you prefer, although that costs extra. That’s a good segue to talk about pricing. Listen Later has to pay for the API calls to process and narrate text, along with additional translation. Those fees directly relate to how much text you ask Listen Later to process. Your average article will probably end up somewhere between $0.30 and $1.50, and Listen Later’s pricing page provides an estimator so you can see how much any article will likely cost.

Listen Later relies on a pay-as-you-go model rather than a subscription. Every new account gets $2 in free credits to trial the service, and after that, you can top up your account whenever you want. Or, if you don’t want to think about it, you can set Listen Later to auto-refill by charging your credit card $5 whenever your balance falls below $0.10. This model seems fair and transparent, and those who decry subscriptions should welcome it.
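To get a feel for how far the credit stretches, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the $2 free credit, $5 refill, and $0.10 threshold come from the description above, but the per-article costs are made up, and the rule that a refill happens right after the conversion that drops the balance below the threshold is my guess at the behavior, not something Listen Later documents here. Amounts are in cents to avoid floating-point rounding.

```python
FREE_CREDIT_CENTS = 200        # $2 in free credits for a new account
REFILL_THRESHOLD_CENTS = 10    # auto-refill when the balance falls below $0.10
REFILL_AMOUNT_CENTS = 500      # each auto-refill charges $5


def simulate(article_costs_cents, starting_balance_cents=FREE_CREDIT_CENTS):
    """Return (final balance, total charged to the card), both in cents.

    Assumption: the refill is applied immediately after any conversion
    that leaves the balance below the threshold.
    """
    balance = starting_balance_cents
    charged = 0
    for cost in article_costs_cents:
        balance -= cost
        if balance < REFILL_THRESHOLD_CENTS:
            balance += REFILL_AMOUNT_CENTS
            charged += REFILL_AMOUNT_CENTS
    return balance, charged


# Hypothetical costs: three articles at $0.90 each.
final_balance, total_charged = simulate([90, 90, 90])
print(f"balance ${final_balance / 100:.2f}, charged ${total_charged / 100:.2f}")
```

With costs in the $0.30 to $1.50 range, the free credit covers somewhere between one and six articles before the first top-up would be needed.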

One thing to remember is that Listen Later, as befitting its name, isn’t instant. The article’s length plays a role in how long conversion takes, as does the load on the OpenAI servers, which is higher during US business hours. In the case of the lengthy Patricia Lockwood article, Gerger told me that it took about 20 minutes for the conversion to complete, and there were instances where his application waited up to 5 minutes for a response from OpenAI. But this shouldn’t generally be a problem—the entire point of Listen Later is that you’re queuing an article up for listening to in the car on the way home, for instance, or during your next workout. Listen Later does include the URL to the original article in the podcast episode description for easy reference.

I discovered that you can even convert PDFs to audio—I tested that with the bonus chapter that Andy Weir recently wrote for The Martian. Plus, Gerger just mentioned that he has added a feature to Listen Later that enables it to convert the text in images in JPEG, PNG, and WebP formats to audio. I was initially quizzical—what could be the use case? It turns out that certain communities commonly post screenshots of PDFs on X/Twitter. So, if that’s of interest, just send Listen Later a URL to an image to convert it, just like a standard article.

It’s amusing that I’m writing about Listen Later, which converts text into podcasts, in the same week that Apple introduced a feature that converts podcasts into text. There are plenty of issues with generative AI, but some of the capabilities that it enables are magical.

If you enjoy listening to podcasts and lack time to read everything you’d like, I encourage you to try Listen Later with the $2 free credit every new account gets.

Comments About Text-to-Podcast Service Listen Later Sponsoring TidBITS

Notable Replies

David Weintraub

11 March 2024

I use Apple’s built-in Spoken Content Accessibility feature. I can get it to read any web article by swiping down with two fingers. I can use the Show Reader option to help skip ads or just use the controls to skip over parts I don’t want. I can also adjust the reading speed.

It’s a great option if you find an article and want to listen to it while you walk.
Mike Cohen

11 March 2024

I actually want the opposite - something that transcribes a podcast as text. I don’t like listening to podcasts and would prefer to read them.
Alan Forkosh

11 March 2024

That’s what the new version of the Apple Podcasts app, released last week on all systems, does. Unfortunately, it only works for podcasts that are available without restrictions. If the podcast is affiliated with a pay service, then no transcript will be available.

Apple Newsroom

Apple introduces transcripts for Apple Podcasts

Apple Podcasts now offers transcripts of podcast episodes, making podcasts more accessible and easier to navigate.
ssteiff

12 March 2024

I signed up and tried it. Quite nice actually!
I tried giving it a tough one: converting Hebrew text to spoken Hebrew, to see if it could be a possible solution for a friend with severe eyesight issues.
Well… The good news is that it was able to read Hebrew and speak it. The bad news is that it still has a long way to go before being useful for Hebrew speakers. The written language uses punctuation marks (no AEIOU vowels in the Hebrew alphabet), but it is mostly printed without such marks and Hebrew readers can understand the vowels from the context, so that two words that are written the same (without punctuation) but use different vowels (thus sound different) can be properly understood by the reader. Alas - ListenLater, or rather the AI engine behind it, does a poor job of capturing that context and thus the spoken version is a mish-mash that’s difficult to comprehend. Add to that a very American accent that pronounces the “R” and “L” differently than spoken Hebrew. It would not be a big issue if the pronunciation of the vowels followed the context…
So… I still have $1.46 credit on my free test account. Not sure I’ll use it as I’m not a big fan of podcasts, but let’s see.
ssteiff

12 March 2024

Not too long after sending the above comment, I was pleasantly surprised to find a message from Yalim Gerger in my mailbox where he asked if I could recommend another AI engine and even suggested one that does a significantly better job of handling Hebrew text-to-speech. Based on my feedback Yalim now intends to use that engine instead.
Hats off to Yalim for responsiveness and attention to users’ feedback!
David C.

12 March 2024

ssteiff:

Add to that a very American accent that pronounces the “R” and “L” differently than spoken Hebrew

I assume you mean it is pronounced differently from an Israeli accent.

Hebrew, like many languages, has lots of regional variations in pronunciation. In addition to the big Ashkenazi vs. Sephardi differences, there are lots of smaller regional dialects, including those spoken by Jews from Yemen, from New York and from Eastern Europe.

Although academics may believe there is one true pronunciation, the reality is that there really isn’t. Just like there’s no one true pronunciation for English (e.g. American vs. British vs. Australian, and regional dialects within each).

But that having been said, I would agree that a Hebrew text-to-speech engine should include support for Israeli pronunciation, in addition to a few other mainstream dialects.
Micklestein

12 March 2024

I have been trying this since the latest TidBITS and it’s been great! It is reasonable for it to have a 30-day storage limit; however, I would like to keep the files (for personal use, not to share).

I’ve been messing around with Shortcuts to get that to happen, but I can only get it to pull the text from the RSS feed.

Any ideas on how to automate a download of the files into my iCloud Drive (or Google Drive)?
Yalim Gerger

12 March 2024

Hello,

Founder of Listen Later here. Thank you for using the service. The podcast app you are using may have an export audio feature. I use Overcast and it has a button to export audio files. It is on top of the episode information page and shows up when clicking the share icon. Hope this helps.

Warm regards,
Yalim
Yalim Gerger

12 March 2024

Thank you for your kind words. I’ll be traveling this week but this feature is at the top of my todo list for next week.
ssteiff

12 March 2024

David C.:

Hebrew, like many languages, has lots of regional variations in pronunciation.

While English is spoken in many countries as a main language (primary or second) Hebrew is a spoken language in one place only: Israel. It is one of the two official languages of the land (together with Arabic) and there is no other country where Hebrew is defined as an official language. The Ashkenazi and Sephardic dialects (that were used for prayers) have merged into one spoken dialect long ago and the exceptions are… well… exceptions.
Thus I’d expect a text-to-speech tool to speak the Israeli spoken dialect as a must. Other, esoteric, accents are nice to have.
When I say the tool pronounces the “R” and “L” differently than spoken Hebrew that means it sounds like a native English speaker breaking their teeth trying to speak a foreign language. Pretty much like I sound when I speak English. Some recognize my accent as Israeli right away. Others assume it to be Central or East European. Not much I can do about it. An accent is acquired during childhood. If you started acquiring a foreign language later you’d have to work very hard to acquire other accents.
I’m used to hearing different accents and am usually even able to pick up where it comes from. It wouldn’t bother me if the accent was the only thing off with the speech previously used by this tool. The bigger issue was not the accent, it was picking up the wrong word based on context. For example, the word ברוך without punctuation can be pronounced both “Baruch” (blessed) and “Beroch” (softly). And there’s also the Yiddish “Broch” (a major mishap). The Ashkenazi prayer pronunciation of “blessed” would be “Boruch”.
Robert Allen

12 March 2024

I should have been a bit more patient, but when I saw the leader for ListenLater, I immediately went to find it. I ended up on listenlater.fm, which appears to be similar. Since I haven’t, yet, used either, I’m curious to know how the two — https://www.listenlater.net/ and https://www.listenlater.fm — differ.
