Audio-First #5: Milton, Media, And Going Multi-Modal

Hello audio nerds, two big updates for you today.

First, Audio-First is now distributed on your favorite podcast apps. At long last, you can enjoy my voice on-the-go, and not just from your inbox. You’ll find the audio versions (past and future) uploaded here:

Depending on the podcast app (e.g. Pocket Casts), the Notes section will contain the fully-linked text that goes out with the newsletter.

I implore you to take the 2 seconds to subscribe because… (announcement #2) a number of upcoming episodes of Audio-First will be interviews with audio thinkers, iOS developers, and other tech insiders. Contain your excitement. These will be longer than the fortnightly-ish posts I’ve been sending out, so I imagine you’ll want to listen as you would with a traditional podcast.

Tomorrow, our first interview drops with the great Drew Austin, aka Kneeling Bus. Drew regularly produces some of the most thought-provoking essays about tech and urban planning, and he makes one of my favorite weekly newsletters.

Some audio-related ideas of his to check out:

Anyway, Drew’s ideas have been very influential for my writing on Audio-First. So sign up on your podcast app, and hear us dive into all things audio, AirPods, and urban living. (It’s about 40 minutes.)

Audio-visual

Now that Audio-First is officially straddling both podcast and newsletter distribution, you might be asking yourself: How do I, a faithful reader-listener, consume this content as intended? 

More than ever, there’s a chance you’ll gravitate toward the audio over the text, or the text over the audio. To me, it makes no difference. I hope it comes down to your own convenience.

Perhaps this is grandiose, but I hold a small hope that a few of you do both simultaneously. Realistically, most of you don’t, but I love the possibility.

Mostly because in college I was forced to read Paradise Lost by John Milton. (If you don’t know, Paradise Lost was written in 17th-century vernacular, in poetic verse.) And I found the best way to absorb it was to speed up the audiobook track to a comfortable reading speed. With the audio and visual experience synced up, I was powering through Milton’s wackadoo language with ease. Every word from this genius was flooding my senses. As a result, the book hit way harder.

I’d certainly be thrilled if my writing had that kind of sense-flooding effect.

In fact, if it weren’t so weird, I’d send these posts out with a Spritz reader. Or I’d hire a video maker to do a whiteboard illustration. Really, I’d try anything up to Clockwork Orange-ing if there were demand. This is all to say, media makers want the most engaging tools possible to reach their audience.

Sure, Milton was writing one of the greatest works in the English language, and I’m just some techie with a Substack. But we all have our aspirations as media makers. And what is media but a momentary hijacking of the mind? It’s just a matter of degree.

What’s been disappointing to realize with Audio-First is that there’s not much media mixing. Right now, the majority of you are reading this (without audio). A sizable 39% of you will turn on the audio portion. But there’s not much combining. There’s no technology to make this hit harder or differently. The maximum has been reached. For now.

Multi-Modal

On a similar note, Pace Capital’s Jordan Cooper penned a new post this week on this exact subject. Cooper argues that there’s a good chance that, with the advent of AR, the mixing of audio and visual information will increase. He writes:

I think the insight that we will use computer vision to augment the way we process our physical surroundings is more or less a given. Cars are perhaps further along than people in this regard. It seems implausible that this assistive capability will not follow us into all realms of our mobility (i.e. when we get out of our car and walk). What I don’t think is a given is a) that the camera we use to capture our surroundings will be on our phone, or b) that the response to a camera based query will be displayed visually.

Most read/write situations don’t traverse disparate medium. If you capture visual information, it tends to be displayed visually. If you capture audio information, it tends to be displayed acoustically. Even if you capture tactile information, it tends to be displayed/processed tactilely.

But in the case of AR, I see the capture/write function and the read function decoupling as it relates to media type. I think [we] will use a wearable, voice activated camera to capture and query, and I think we’ll listen to the response or results that come from that query.

I agree with Jordan that we’re very limited by our media types. Media combinations haven’t evolved. There’s little mixing between these lanes.

In a way, Audio-First was born out of this idea of allowing multiple media types. But switching or combining is a whole new ballgame, and I feel like a lot of tech innovations (voice, cameras, and later AR) will soon enable really novel interactions.

(I encourage you to read the full post. Cooper’s earlier writing on AirPods was a big inspiration.)

Liner notes

Shakira, Shakira (feat. Bad Bunny, J Balvin, and J. Lo).

Don’t doubt ur vibe, apparently on SoundCloud’s top 10.

Stay tuned and keep it locked,

Nick

@NpappaG