A Short History of Voice Assistants — from Toy to Tool
No question about it, the modern age of conversational AI started on October 4th, 2011. The official unveiling of Siri, Apple’s revolutionary voice assistant, marked the moment when millions of people all over the world started to use voice as a natural interface to interact with a machine: their iPhone.
Sure, conversational AI had been a field of scientific research and technical development for decades. IBM introduced Shoebox back in 1961, a machine that recognized 16 spoken words, including the digits 0 to 9. But now, all at once, the robot could speak, simply and entertainingly.
At the time, the in-vehicle navigation system was the dominant experience most people had with voice interfaces. And honestly, for many of us it was not a pleasant one.
Siri was the hit at every party
Siri, by contrast, was fun. Siri was the hit at every party. Questions like “Hey Siri, who is your creator?” (A: As it says on the box, I was created by Apple in California) or “Hey Siri, do you follow the three laws of robotics?” (A: I forget the first three, but there’s a fourth: “A smart machine shall first consider which is more worth its while: To perform the given task, or, instead, to figure some way out of it.”) turned every get-together into a burst of loud laughter, and not only for tech early adopters.
Siri even became a sitcom star. In the 14th episode of the 5th season of “The Big Bang Theory”, Raj fell in love with Siri.
Soon there were siblings: Google Now (today known as Google Assistant), Microsoft’s Cortana and Amazon’s Alexa, all at home in living rooms around the world thanks to the massive sales push behind smart speakers and other connected devices such as the entry-priced Amazon Echo.
Infographic: A short history of the Voice Revolution.
At about the same time another revolution started. Automated voice interfaces had supported call-based customer service for years. However, it was a one-sided experience: the only feedback channel was a number the user had to press (“If you have questions about your contract, please press 2”). Siri and Alexa could reach mass-market success because of a general leap in conversational AI. Cloud computing and new AI capabilities in particular catapulted the quality of voice assistants far beyond that of poor navigation systems. A more or less natural conversation became possible.
Voice was the Champions League, because it is still so difficult for a computer to understand spoken words and turn them into text, but chatbots benefited from the same development. Their big advantage was that the user types text into an input field, which is much easier for the machine to read than spoken language is to understand.
A technical revolution
In fact, a number of developments happened in parallel: significantly stronger performance in conversational AI, new excitement among users for the technology, and systems reliable enough to meet expectations. Together they finally made smart, non-annoying chatbots possible. A technical revolution took its course. Large companies were built, and tech giants like IBM, Salesforce, etc. developed their own offerings.
As a matter of fact, this evolution had one very clear vertical use case: customer phone support. Whether for order management or complaint handling, the chatbot is there, always reachable, and if it doesn’t have an answer, it can hand the conversation over to a human agent or simply send an email.
The work that chatbots do existed before. It was done by human agents in large call centers all over the world. For airlines, banks, mobile carriers, etc., online or phone-based customer support is a huge cost component. In a world of infinite reach through social media, brands and corporations that are hard to reach are simply at an extreme disadvantage.
The chatbot was the solution. Not only could it work 24/7, it also allowed companies to reduce their workforce considerably. Call centers, once a very profitable business, have recently started to struggle.
Chatbots killed the call centers
With the success of chatbots and of the companies building and deploying them, conversational AI found its first killer use case. While video killed the radio star, chatbots killed the call centers. The commercial case is simple: there was a lot of money to be earned with chatbots by saving a lot of money for the companies that deployed the bots on their platforms. Besides customer support, related applications such as intranets also started to use them regularly.
But the truth is that, apart from this application, conversational AI has had very limited commercial success to date. The other really large field for voice assistants, automotive, still cannot say how to earn money with them. While every car maker has to have its own assistant today, and the companies delivering those assistants earn good money, the car makers themselves do not earn money with this feature.
Lately there has been a lot of discussion about voice commerce in vehicles and through connected speakers. There are some good niche cases like transcription software, but customer support remains the only strong business case for conversational AI.
The next big thing: Mobile workers
At German Autolabs we believe that the next big thing in conversational AI is smart assistants for mobile workers. People at work, often traveling in a vehicle or not tied to a desk, are confronted with complex workflows and a lot of dynamic information, and they need a touch-free interface to their digital devices.
The world of logistics, in all its nuances, is full of these mobile workers: truckers, drivers, couriers, delivery people and warehouse workers in all kinds of industries, from transport logistics to parcel services, food delivery or newspaper delivery. But that’s not all. Voice assistants will also help sales and service reps and blue-collar workers on the road or in the field to interact with data.
Fixed workflows and digital processes
They all have fixed workflows and process steps that today require regular interaction with handheld scanners, mobile devices and other digital tools. Voice assistants are made for these use cases. They can proactively read out information depending on the context. They can act as a touch-free interface.
The levers are big: onboarding can be accelerated, mental load reduced and efficiency increased. Today 60–80% of errors in the work environment are attributed to human error. Voice assistants can help to reduce this significantly.
The next wave in conversational AI will take place in the field of mobile workers and logistics. There is a clear need, there are clear use cases, and there is a commercial model whose returns will quickly exceed the investment costs. These are the ingredients for the next wave of technical revolution in existing markets.