Descript is a powerful audio tool

The fields of broadcasting and podcasting aren’t quite brothers, but in many ways they’re not much more distant than cousins. Equipment can overlap and sometimes the same software is used.

Product designers for podcasters often assume that an operator will be less experienced or skilled. This is not necessarily the case, but innovations can make new software and hardware more intuitive for people unfamiliar with technology.

One such piece of innovative software is Descript.

Designed for macOS 10.11 or Windows 10 (and later), it's an all-in-one offering that provides video editing, audio editing, automatic transcription and many other features. It is marketed for podcasting, video editing, screen recording and transcription applications.

Although the basic version is free, additional transcription can cost $12 and more per month, with a discount for one year prepaid. There are “Creator”, “Pro” and “Enterprise” levels.

Descript’s website says its users include familiar media names like Audible, WNYC, ESPN, and iHeartMedia.

On-the-fly editing

If anyone has ever designed editing software that your six-year-old or your grandma could use, this might be the one: it's intuitive, easy to use and smooth in editing.

It probably wouldn't work well as a replacement for traditional editing software like Adobe's Audition, Pro Tools or even Audacity. But it can definitely speed up editing podcasts, whether video, audio or both. And it has many features that allow you to add background music, still images or even pre-recorded video from a file.

Whether you are working with a video recording or just audio, the software works by converting the recording to text and then letting you edit from that script.

This is where the magic happens. Once the recording is in script form, you can edit it like any Word document. The audio/video edit will automatically follow what you did with the text script.

This image from the Descript website shows advanced editing: additional content inserted into the edit, including video with a picture-in-picture effect, still images and background music, along with editing changes.

If you don't like something you said and delete the text, the matching audio/video is cut as well.
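To make the idea concrete, here is a minimal sketch of how script-based editing could work in principle. It is not Descript's actual code or API; the `Word` type, the `keep_ranges` function and the timestamps are all hypothetical. The assumption is that the transcription step gives each word a start and end time, so deleting words from the text translates into a list of media ranges to keep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the recording (hypothetical type)."""
    text: str
    start: float  # seconds from the start of the recording
    end: float


def keep_ranges(words, deleted):
    """Return merged (start, end) media ranges left after deleting word indices."""
    ranges = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        previous_kept = i > 0 and (i - 1) not in deleted
        if ranges and previous_kept:
            # Contiguous with the previous kept word: extend that range,
            # which also keeps the natural pause between the two words.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            ranges.append((word.start, word.end))
    return ranges


script = [
    Word("I", 0.0, 0.2),
    Word("really", 0.3, 0.7),
    Word("like", 0.8, 1.0),
    Word("radio", 1.1, 1.6),
]

# Deleting the word "really" (index 1) in the text leaves two media ranges.
cut_list = keep_ranges(script, deleted={1})
print(cut_list)
```

An editor built this way would then play or export only the ranges in `cut_list`, which is why the original recording stays untouched.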

Changing what you said is even more interesting. Descript will take the words and reassemble the audio in the rearranged order.

In testing this, I found that the "realism" of the delivery post-edit will depend on how the words were originally pronounced. So it can be a bit "hit or miss," although it's still cool.

In my tests with ingested content and recordings, it did an impressive job, and my edits often produced a natural-sounding delivery.

There is a function for "de-ummm," "de-ahhh," "de-errr," "de-like" and "de-kinda," for speakers who throw these fillers into their delivery, and this is done automatically. This is called "removing filler words." There's even a "shorten spaces between words" feature to automatically clean up excessive pauses.
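As a rough illustration of what these two cleanup passes involve, here is a sketch under the same assumption of word-level timestamps. The filler list, the function names and the half-second threshold are mine, not Descript's, and a real tool would need context to tell a filler "like" from a legitimate one.

```python
# Hypothetical filler list; words such as "like" are often legitimate,
# so a real tool would need more than a simple lookup.
FILLERS = {"um", "uh", "er", "like", "kinda"}


def remove_fillers(words, fillers=FILLERS):
    """Drop (text, start, end) entries whose text is a known filler word."""
    return [w for w in words if w[0].lower().strip(",.") not in fillers]


def shorten_gaps(words, max_gap=0.5):
    """Re-time words so no pause between consecutive words exceeds max_gap seconds."""
    result = []
    shift = 0.0       # total time removed so far
    prev_end = None   # end time of the previous word in the original timing
    for text, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap > max_gap:
                shift += gap - max_gap
        result.append((text, start - shift, end - shift))
        prev_end = end
    return result


take = [("So", 0.0, 0.5), ("um", 1.0, 1.5), ("radio", 4.0, 4.5)]

cleaned = remove_fillers(take)
tightened = shorten_gaps(cleaned, max_gap=0.5)
print(tightened)
```

Removing the filler first leaves a long silence where it used to be, which is why the gap-shortening pass runs second.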

While it doesn't offer all the features of traditional audio and video editors, Descript does have features we're familiar with, including non-destructive multitrack editing, titles, transitions and keyframe animation, mixing and audio mastering ("rubber-banding audio levels"). And it allows you to export the project to professional applications such as Pro Tools, Adobe Audition, Premiere and Final Cut.

Another feature of Descript enables intuitive multi-user collaboration, so multiple people can work on a project at the same time.

For fun, I transferred footage from a comedy show into Descript that included a singer with music and people talking at the same time. The software did an admirable job deciphering the spoken word with music underneath. That is not an easy task for speech-to-text conversion.

Some text conversions were funny or weird (Diet Sprite morphed into Diet Striding), but if you consider the text to be just your guideline for editing, it doesn't hurt the editing aspect.

An audio recording and editing test.

Video and audio editing takes a different turn with Descript. In fact, you more or less edit the text to edit the audio and video content.

Drag a still image into the text, and it is edited into the video at that point. Clicking on the image lets you adjust how long it appears in the video. The same goes for dragging audio and video content into the text script: it is then placed in the audio/video montage.

Since Descript works from the script, it provides timing marks and allows you to adjust changes over time. For back-timing and producing a show of exact duration, this can help simplify time compression or expansion (to meet a timed window).
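For readers less familiar with back-timing, the arithmetic is simple enough to sketch. This is generic broadcast math, not a Descript function; the names and the example durations are assumptions for illustration.

```python
def speed_factor(current_seconds, target_seconds):
    """Playback speed needed to fit the current runtime into the target runtime.

    A value above 1.0 means time compression, below 1.0 means expansion.
    """
    if current_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    return current_seconds / target_seconds


def back_time(end_of_show, segment_lengths):
    """Latest start time of each segment so the last one ends at end_of_show."""
    starts = []
    t = end_of_show
    for length in reversed(segment_lengths):
        t -= length
        starts.append(t)
    return list(reversed(starts))


# A 31-minute edit that has to fit a 30-minute window.
factor = speed_factor(31 * 60, 30 * 60)

# Three segments (10, 15 and 5 minutes) back-timed to end at the 30-minute mark.
starts = back_time(30 * 60, [600, 900, 300])
print(round(factor, 3), starts)
```

In this example the show would need to run about 3 percent faster, or lose a minute of content, to hit the window.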

There are automatic functions to remove background noise, clean up audio, automatically level audio, and even process audio. The video aspect of the software allows for titles and “lower thirds” (adding names, titles, etc.), as well as effects and transitions.

Impressive, scary

A unique AI feature is Overdub. This opens up a Pandora’s box of possibilities, both good and bad.

Your voice file is sampled, and this allows for something that is NOT possible with traditional editing: under "overdub," you can type into the script new words or things that weren't said during the recording, which are then generated by AI in your own voice and inserted.

Yes, the computer generates your own voice and reasonably matches your real voice.

Here I am testing the speech recognition aspect and creating overdub audio montages. The video is from a webcam.

Be aware that Descript actually uploads your voice to its server and creates the sample there, which the company says takes anywhere from two to 24 hours. You also have to read and record a statement confirming that this is actually your voice and that the company is allowed to do so.

Is it convincing? Well, to some extent.

Like most AI samples, it's a human voice that's been sampled and converted, but what I call "emotional inflection" isn't, at least so far, possible with the AI and sampled voices that I have experienced.

People who do voice work will understand that emotion and inflection in the delivery of a script are key to "selling" the copy, and at this point only a human really understands the meaning of the words well enough to convey that emotion.

Maybe one day the AI will recognize the meaning of the words and the true meaning of the sentence and somehow modify the delivery accordingly. But for now, voiceover jobs seem safe in this regard.

Of course, this means there’s no need to go back to the studio to record new lines, as AI overdub can be used for corrections. But you can sense the possibility of someone’s voice being “stolen,” despite well-meaning precautions.

It is interesting to see the many possibilities and uses of this software. It is a unique and interesting way to edit both audio and video.

To really understand exactly what it is capable of doing, I recommend downloading the software and playing with it. You may find this helps change how some of your talent is edited and how you deliver content to the web or social media.

This will definitely make you wonder what’s next.

There is a plethora of Descript training and how-to videos on YouTube, and the company's website contains a lot of useful information.