
How do voice technologies work?

Kane Simms from VUX World talks us through how voice technology works and Nicky Birch from the BBC explains some of the language used in the industry.
22.4
So how it works, then, is a user says something, and that’s usually called an utterance. If I said, “Alexa, book me a taxi”, that’s an utterance. Within that utterance, first of all, there’s a wake word. A wake word is “Alexa”, “hey Google”, “hey Siri”, “hey Cortana”, if anyone uses Cortana. Then you have a launch request, which is “ask so and so”, “open so and so”, “launch xyz”, and then you have the invocation phrase, which might be “my fantastic taxi company”. “Hey Alexa, open my fantastic taxi company.” So all of that together is made up of a wake word, a launch request, and an invocation, but all of it wrapped up together is called an utterance.
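The three parts of an utterance described above can be pictured with a minimal Python sketch. The word lists and the names `ParsedUtterance` and `parse_utterance` are hypothetical, chosen only for illustration; real platforms detect wake words on the device and do not work by simple string matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical word lists for illustration; each platform defines its own.
WAKE_WORDS = ("hey google", "hey siri", "hey cortana", "hey alexa", "alexa")
LAUNCH_WORDS = ("ask", "open", "launch")


@dataclass
class ParsedUtterance:
    wake_word: Optional[str]
    launch_request: Optional[str]
    # The invocation phrase when a launch word is present,
    # otherwise simply the rest of what was said.
    remainder: Optional[str]


def parse_utterance(utterance: str) -> ParsedUtterance:
    """Split an utterance into wake word, launch request and remainder."""
    text = utterance.lower().strip().rstrip(".!?")

    wake_word = None
    # Try longer wake words first so "hey alexa" wins over "alexa".
    for candidate in sorted(WAKE_WORDS, key=len, reverse=True):
        if text.startswith(candidate):
            wake_word = candidate
            text = text[len(candidate):].lstrip(" ,")
            break

    launch_request = None
    first_word, _, rest = text.partition(" ")
    if first_word in LAUNCH_WORDS:
        launch_request = first_word
        text = rest

    return ParsedUtterance(wake_word, launch_request, text or None)
```

For example, `parse_utterance("Hey Alexa, open my fantastic taxi company")` separates the wake word `hey alexa`, the launch request `open`, and the invocation phrase `my fantastic taxi company`.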
68.3
Anything someone says to any conversational system is an utterance. What then happens is the audio that you have just spoken is recorded by Alexa and sent to the cloud. What then happens is a process called ASR, which is automatic speech recognition. That goes through the audio sample and turns the audio into text, and then the text is fed through NLP, which is natural language processing. The natural language processing engine cleans up that text so that it is in a legible format. If I say, “Can I have a taxi for six. No, I mean seven. Actually, not actually, half past six.”
109
I’ve said a lot of stuff there, but all I really care about is I need a taxi for half past six. Part of the natural language processing is to figure out all of those mistakes that I’ve made and extract from that sentence an intent - it’s the thing that the user is trying to do. That intent is then sent to the application, the third party application if it’s your taxi service, and then you need to respond to that intent. So you take whatever’s in that intent, and intents in Alexa are created and made up of what’s called slots.
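As a rough illustration of resolving those spoken corrections, the toy Python sketch below keeps only the last time mentioned. This is an assumed heuristic for illustration, not how a production natural language processing engine works, and the name `last_time_mentioned` and the small set of recognised time phrases are hypothetical.

```python
import re
from typing import Optional

# Toy vocabulary: spoken hours, optionally preceded by "half past".
_HOUR = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)"
_TIME = re.compile(rf"\b(half past {_HOUR}|{_HOUR})\b")


def last_time_mentioned(text: str) -> Optional[str]:
    """Return the final time phrase in the text, assuming later words correct earlier ones."""
    matches = _TIME.findall(text.lower())
    return matches[-1] if matches else None
```

Given the sentence from the example, “Can I have a taxi for six. No, I mean seven. Actually, not actually, half past six.”, the function returns `half past six`, which is the value the speaker actually cares about.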
140
So if I want to book a taxi, I’ll have a book a taxi intent, but then there are certain values that I need - the system needs - in order to be able to actually book that taxi. So I’ll need to know what time do you need it for. I’ll need to know what address are you leaving from. Those are what’s called slots, and those are values that give me enough information to be able to fulfill something. I then handle that, the logic, in my code base.
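The relationship between an intent and its slots can be sketched in Python as follows. The intent name `BookTaxiIntent`, the slot names, and the helper functions are all hypothetical; on a real platform, intents and slots are defined in that platform's own configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical slot names that a taxi booking needs before it can be fulfilled.
REQUIRED_SLOTS = {"BookTaxiIntent": ["pickup_time", "pickup_address"]}

# Questions to ask the user when a slot has not been filled yet.
PROMPTS = {
    "pickup_time": "What time do you need it for?",
    "pickup_address": "What address are you leaving from?",
}


@dataclass
class Intent:
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)


def missing_slots(intent: Intent) -> List[str]:
    """List the required slots that have no value yet."""
    required = REQUIRED_SLOTS.get(intent.name, [])
    return [slot for slot in required if not intent.slots.get(slot)]


def next_prompt(intent: Intent) -> Optional[str]:
    """Return the next question to ask, or None when every slot is filled."""
    missing = missing_slots(intent)
    return PROMPTS[missing[0]] if missing else None
```

If the user has only given a time, `next_prompt` asks for the address; once both slots are filled it returns `None`, and the application has enough information to make the booking.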
165.4
I’ll make the booking, create the response, send the response back as text to the Alexa cloud, and then text to speech - which is another type of technology - will take that text, translate it into speech, and create an audio file that is then read back out through Alexa. So that’s an overview of how the vast majority of these platforms work. There might be some differences and nuances here and there, but broadly speaking, that’s the process and the stack that’s involved. A skill is Amazon’s word for a voice application. Google call them actions, and other people may call them experiences. It’s - just think of them as an app. That’s probably the best way.
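The whole round trip described in the video (audio in, speech recognition, language processing, the application's own logic, then text to speech) can be summarised as a chain of four steps. The Python sketch below uses stand-in functions for every stage; none of them is a real platform API, and all names are assumptions made for illustration.

```python
from typing import Callable, Dict

IntentData = Dict[str, str]


def run_voice_pipeline(
    audio: bytes,
    asr: Callable[[bytes], str],
    nlp: Callable[[str], IntentData],
    handle_intent: Callable[[IntentData], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one utterance through the stack and return the spoken reply."""
    text = asr(audio)                   # automatic speech recognition: audio to text
    intent = nlp(text)                  # natural language processing: text to intent
    reply_text = handle_intent(intent)  # the third-party application's own logic
    return tts(reply_text)              # text to speech: reply text to audio


# Stand-ins so the sketch can run without any real speech services.
def fake_asr(audio: bytes) -> str:
    # Pretends the audio bytes already contain the spoken words.
    return audio.decode("utf-8")


def fake_nlp(text: str) -> IntentData:
    # Keyword match standing in for real intent extraction.
    name = "BookTaxiIntent" if "taxi" in text.lower() else "Unknown"
    return {"name": name}


def taxi_handler(intent: IntentData) -> str:
    if intent["name"] == "BookTaxiIntent":
        return "Your taxi is booked."
    return "Sorry, I did not understand that."


def fake_tts(text: str) -> bytes:
    # Pretends the encoded text is an audio file.
    return text.encode("utf-8")
```

Passing the stages in as functions mirrors the point made in the video: the platform supplies speech recognition, language processing and text to speech, while the skill or action only supplies the handler in the middle.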
206.8
As a user, you won’t necessarily know that the skills or apps are created by individual companies or people or organisations. You don’t hear that credited necessarily. You just ask your utterance, and you receive it. It’s not like where you have different branding. Like, you might have - because with images and with apps on an app store, you may see, oh, this is made by this company or this is made by that company. You don’t see any of that. So it’s all kind of hidden. It all appears like it’s all from Alexa or all from the Google system.
There are a lot of terms and abbreviations used in technology, and the field of conversational interfaces is no different.
In this video, Kane Simms of VUX World takes you through the principles involved in voice technology, from ASR to NLP. Now it’s time to find out what those things mean. You’ll be an expert in no time.
This article is from the free online course Introduction to Conversational Interfaces, created by FutureLearn.
