Summary

  • OpenAI announced recently that ChatGPT will be able to interact with users via voice.
  • Voice dictation technology has flaws that have hindered its widespread adoption. Why deal with the lengthy responses of voice assistants when you can quickly find answers yourself?
  • Privacy is also a major concern with voice assistants due to the lack of security and the risk of always-listening microphones.
  • Exclusively voice-operated devices are unlikely to become a reality, both for practical reasons and because of user preferences. Voice assistants can be helpful additions, but the technology to understand users has already reached a satisfactory level of accuracy.

Amidst turmoil at OpenAI, the company announced that ChatGPT would soon be able to interact with users via their voices on Android and iOS. Not only can a user speak to ChatGPT, but they'll now receive an audible response, too. While that's cool on the surface, there's a reason voice dictation, a technology that's been mostly mature for many years now, hasn't really taken off. Sure, almost every major ecosystem has its own version, from Amazon Alexa to Siri, but the tech has so many flaws that not even ChatGPT can make it interesting.

Nobody wants to deal with the preamble

Just give me the answer!


One of my biggest annoyances with voice assistants comes from dealing with the preamble of both initiating the conversation and getting the answer. I can often look it up quicker myself, and in times when my hands are full, the best use I find for these kinds of assistants is for setting timers, not responding to messages or googling questions. OpenAI recently shared an example of a conversation you could have with ChatGPT.

While technically impressive, the demonstration is a bit ridiculous. First off, the question, about how many 16-inch pizzas to order, is absurd. I understand that it's there to demonstrate ChatGPT's ability to deal with complex conversations, but both the answer and its delivery are needlessly complex. If I'm asking an AI a mathematical question, I just want the answer. Tell me the number first, and then explain it. If I don't care about the explanation, I can just cancel the playback.

Switching that up isn't enough, though, because that's something AI can already do. Maybe the contextual nature of the number of slices of pizza and the number of people requires the AI to "research," but at some point, I'm sure features like that will come to all other AI voice assistants, too. Once they do, we're back to square one, where even the best Amazon Echo devices can do what OpenAI has been moving towards at a breakneck pace.

If I'm using my smartphone, it's easy for me to quickly type and search for something. I can do that anywhere, without being heard, and I can then read through the answers at my leisure. If I ask a voice assistant to find something for me, chances are I'll search for it myself after the fact to see what other options there are. Voice assistants are too wordy, and they always will be.

Privacy is a concern, too, on two fronts

Nobody wants to hear how stupid my questions are

Amazon Echo and Echo Dot on a table

What is the end goal of a voice assistant? They're never going to replace smartphones (as much as companies like Humane want them to) for several key reasons, the most important being privacy. Logging in to services, sending private messages, or even googling those silly, dumb questions you use incognito mode for isn't really possible to do privately with a voice-based device.

As a result, outside very niche, private-use contexts, voice assistants can never replace a smartphone or privately-used device, and I don't see that ever changing. Without a fundamental shift in how people view their own privacy and what they're willing to say out loud, it's hard to convince people that they want to use their voice to operate their devices all the time.

Imagine a world where, instead of everybody using their phones on a packed subway, they use a voice-powered device. Imagine how hectic that would get, not to mention loud. Your own devices would have trouble discerning voices, and a packed subway would theoretically be a cacophony of noise. The subway is bad enough. It doesn't need the same news report being read out in 15 different places or one person repeatedly asking about how many 16-inch pizzas they need for 778 people.

It's also hard enough as it is to convince people that their devices aren't listening to them 24/7, and people are already antsy about having always-listening microphones near them. With devices that can only be voice-operated, it will be hard not to feel listened to at all times.

Voice-only devices are a dream that will never become a reality

And I'm OK with that

A person holding the Humane AI Pin.
Source: Humane

I'm a technology enthusiast, but I think it's for the best that devices aren't going to be exclusively voice-operated for a long time. It's nigh-on impossible for that to be the case for the reasons outlined here. While companies like Humane are pushing the envelope, they'll ultimately fail to capture any reasonable market with a device that relies on voice as the main way to operate it.

Voice assistants will forever be a helpful addition to the devices we use daily, but the technology to understand us has been good enough for a long time now, so accuracy isn't what's holding them back.