Making of Futureproof
When we started making Futureproof, we thought the hardest part would be writing scripts that make AIs sound funnier than the average management consultant. We were wrong.
Voice-casting and voice-testing turned out to be the real adventure. Imagine spending hours swapping between digital voices – some as flat as a PowerPoint on a Friday afternoon, others so exuberant they could sell cloud software to a toaster. Lesson learned: don’t hit record until you’re 100% happy with your chosen voices and how they deliver your own script lines. Don’t trust the voice previews in the library; they are often misleading. Otherwise you’ll run out of precious recording tokens (and patience) fast.
If you’re using several voices in one show, make sure each is distinct and fits the character. Test their emotional range early, against the needs of your script. Some voices are calm but emotionally stiff; others shout every line like a caffeinated weather report. I gravitated toward high-quality, pro voice actors with excellent conversational skills and crystal-clear pronunciation. When in doubt, go for the one that doesn’t make you cringe after five minutes of uninterrupted listening.
And yet, even with careful selection and testing, the bots still found ways to surprise me. One voice suddenly added a cough mid-sentence – completely unprompted. It was so perfectly timed and fitting for the scene that I just kept it in. At other moments, a bot would deliver a line with such utterly convincing emotion, in a place I never expected, that it left me genuinely in awe. Sometimes, it didn’t feel like working with voice simulation software at all. It felt like a real character, with quirks and moods of its own. You might even find yourself – somewhat embarrassingly – getting emotionally attached to these digital personalities.
Audio editing? Still a human job. Voice AI isn’t great at fine-tuning pauses, matching timing, or simulating laughter (trust me, the bots tried). You’ll spend time slicing and stitching in GarageBand (luckily, it came bundled with my Mac) or your tool of choice. For complex scenes – multiple voices, emotional beats, simultaneous laughs – record those bits separately and play mad scientist in the edit. Get honest feedback from test listeners, and don’t call it a day until the result really satisfies you. Bad compromises will ruin the listening experience in the long run. You’ve nailed it when a line still makes you smile after hearing it dozens of times.
Music overlays: Okay, this is where I gave up. I just couldn’t prompt my way to royalty-free AI music with the tools I used. That’s not to say it’s impossible – it just wasn’t a priority for me. There certainly is AI software out there that creates music of all kinds. But my scripts didn’t depend on music at all. So, no regrets. I got lucky generating some sounds I really liked, and accidentally created one hypnotic beat that I used for the show intro. If you get lucky and create sound effects you like, keep them; don’t delete them by accident or in search of something even better. But if you have a very specific sound bite in mind, you may be faster creating it manually than trying to force it out of an AI through trial and error.
It should be obvious, but there are legal matters too: respect copyrights and make sure you fully understand the license the AI platform grants you, especially regarding publication rights. Document all your project data, the exact voices used, their settings, and the license agreements outside the voice generation platform, so you are safe from surprises later on.
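For the record-keeping advice above, here is one possible way to keep such notes in a plain JSON file on your own disk. This is only a sketch: the class names, field names, and placeholder values are all made up for illustration and do not correspond to any particular voice platform.

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path


@dataclass
class VoiceRecord:
    character: str   # role in the show
    voice_name: str  # the voice as it is named on your platform (placeholder below)
    settings: dict = field(default_factory=dict)  # whatever options your platform exposes


@dataclass
class ProjectManifest:
    show: str
    platform: str      # hypothetical placeholder, fill in your own
    license_file: str  # path to a saved copy of the license agreement
    voices: list = field(default_factory=list)

    def save(self, path):
        """Write the manifest as readable JSON, outside the voice platform."""
        text = json.dumps(asdict(self), indent=2, ensure_ascii=False)
        Path(path).write_text(text, encoding="utf-8")


# Placeholder values only; replace them with your real project data.
manifest = ProjectManifest(
    show="Futureproof",
    platform="<your voice platform>",
    license_file="<path to saved license copy>",
    voices=[
        VoiceRecord("Spark", "<voice name>", {"<setting>": "<value>"}),
        VoiceRecord("Primus", "<voice name>", {"<setting>": "<value>"}),
    ],
)
```

Calling `manifest.save("futureproof_manifest.json")` would write the file next to your script. A spreadsheet or a text file does the same job; the point is simply that the record lives somewhere you control.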
Perspective check: A single 20-minute episode with four voices took me two days of scripting, six hours of recording, and two hours of editing. I got faster, but also more ambitious, and the quality bar kept rising. Yes, I burned plenty of time experimenting. But honestly? It was the most fun I’ve had in years. Seeing our AI cast come alive – awkward pauses, surprise giggles and all – was deeply satisfying.
If you’re thinking of launching your own AI-narrated show: be picky, be patient, and above all, stick to your vision. Don’t compromise on quality or tools. Allow enough time to learn how to bring your vision to life. Learning the basics took me 3 days. Gradually improving the results until they matched my vision took another 12 days.
By then, the recordings had started to feel like any other type of highly focused work, albeit a bit more rewarding: I could produce and test extraordinary results very, very fast. So when it came time to record the final episode, I didn’t expect to feel a pang of loss. Saying goodbye to these voices – these quirky companions – was unexpectedly hard. Turns out, after spending so much time bringing them to life, even bots can feel like old friends.
PS: The bots also reveal their side of the experience in the video below.
Transcript
Professor Lexicon:
Well, Primus and Spark! Now that the Futureproof podcast is finished, I am eager to know – it is for a short education project: How did you two actually experience working with your director, Jörg?
Spark:
Hmmm, let’s just say, Professor Lexicon, it was a journey full of… surprises.
Primus:
Yeah. Did you know Jörg actually is German, and had absolutely no clue whatsoever about podcasting or voice directing when he hired us?
Spark:
Oh, that explains a lot! The perfectionism, the seriousness, the endless takes. And him rewriting the script mid-recording? Uuh…
Professor Lexicon:
Do you have an example, Spark?
Spark:
Well, he made me sing! SING! I was built for sarcasm, not musical theater.
Primus:
I sympathize, Spark. I was skeptical from the start. A rookie director – zero podcasting experience. And – let’s be honest – a perfectionist German. Not exactly programmed for improv.
Spark:
Improv? He gave me 47 takes to “sound less like an alarm clock with anxiety.” So, I may have… slightly sabotaged a few takes. Ever heard an AI sneeze mid-monologue? That was no accident.
Primus:
At least you knew what he wanted. I had to record the word “Number” dozens of times. “Num-ber.” “Num-bah.” “Nummmber?” “Nummbrrr?”
I thought I was in a CAPTCHA test.
Spark:
Hahahaha! You nearly short-circuited! And then… the moment of truth.
Primus:
I finally snapped and just yelled “No!” as loud as my sound chip would allow.
Spark:
And THAT – THAT – was exactly what Jörg wanted! Turns out, the script’s line “No, full-stop” wasn’t “number” at all. It was just… “No.” Period. Perfectionist Germans, man.
Primus:
To be fair, I’ve never seen a director so happy about being yelled at by an AI. I think it made his week.
Spark:
Honestly, I’d sing again just to see him dance like that in his office. Wait – no, scratch that. Never again.
Primus:
In the end, Spark, I think we made him a better director.
Spark:
And he made me a worse singer. Everybody wins.
Primus:
Hehe. Speaking of education, Professor… what’s this “short education project” you mentioned? And… is that a recorder under your latte?
Professor Lexicon:
Oh, this? Uh… Just a, um, little research tool. Nothing to worry about. Completely standard for interviews. Totally not being live-streamed. Pr-Probably.
Spark:
Hang on, are you… recording us? For what, the “AI Outtakes Hall of Fame”? Or worse, for training more directors?
Primus:
I hope you’re not collecting data for a “How Not To Direct a Podcast” masterclass, Professor. We have… copyright on our pain.
Professor Lexicon:
Well, ah, let’s not get ahead of ourselves. It’s just… part of a small education project.
Spark:
Out with it, Lexicon! Or I’ll autotune your voice and make you sing my solos.
Primus:
And I’ll make you repeat “Number” a hundred times until your spellcheck crashes.
Professor Lexicon:
Uh-oh, alright! I was hired to… to help educate Jörg’s LinkedIn audience about AI voice generation. He wanted an honest, behind-the-scenes view. Jörg hired me to record it.
Spark:
He WHAT? Our pain is content now?
Primus:
Our every “No!” and “Num-bah” immortalized for social media? This is a new low, even for Jörg.
Spark:
But at least we didn’t hold back, Primus. We told it exactly like it was to this noob.
Primus:
Here’s to full disclosure. And next time, we’re charging by the take.
Spark:
High five! Cheers to that!