
Voice interaction and Intelligent Personal Assistants

This step introduces intelligent personal assistants and voice-controlled interfaces.
[Image: a man using mobile voice activation]

In the previous article we introduced intelligent personal assistants (IPAs), which are controlled by the human voice. This article looks at these types of conversational interfaces in more detail.

One of the earliest examples of a voice-controlled interface was Shoebox, a tool built by IBM in 1962. Roughly the size of a shoebox, it could perform simple arithmetic and recognize sixteen spoken words, including the digits 0-9.

In 1971, scientists at Carnegie Mellon University, with the support of the US government, built Harpy. It could recognize 1,011 words, which is about the vocabulary of a three-year-old. In a massive improvement over the Shoebox, Harpy could actually process full sentences rather than only single words.

Once interfaces like Harpy could recognize word sequences, companies began to build applications for the technology. In the 1980s, IBM produced a typewriter called Tangora with a 20,000-word vocabulary. The Julie doll from the Worlds of Wonder toy company, released in 1987, could recognize a child’s voice and respond to it. Throughout the 1990s, consumers gained broader access to both personal computers and speech recognition technology. Apple began building speech recognition features into its Macintosh computers with PlainTalk in 1993.

Dragon’s NaturallySpeaking software, released in 1997, could recognize and transcribe continuous natural speech. Crucially, this meant that users no longer had to pause between words, and could dictate into a digital document at a rate of about 100 words per minute. The program cost $695, which made it “affordable” relative to earlier speech recognition devices. A version of NaturallySpeaking is still available for download.

Over the last decade, technology companies have worked to create increasingly sophisticated technology that will automate more processes and tasks we do throughout the day.

When Watson, a computer capable of answering questions using natural language processing (NLP), beat Jeopardy! grand champion Ken Jennings in a trivia grudge match in 2011, it was seen as a major step forward for conversational interfaces.

Watson is geared towards industrial applications, where it parses massive datasets and returns actionable information, while IPAs such as the Google Assistant, Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa focus on consumer applications.

These assistants are powered by cloud-based AI algorithms, which means that they are connected over the internet to a server which has access to massive amounts of data. This enables them to “learn” new words and tasks the more they are used. These developments set the stage for the massive proliferation of smart home products that are controlled by voice.
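The round trip described above, capturing speech, sending it to a remote server for recognition and language processing, and returning a response, can be sketched in miniature. The sketch below simulates both "cloud" steps locally with hypothetical hard-coded phrases; the function names and canned responses are illustrative assumptions, not any real assistant's API.

```python
# Toy sketch of the request cycle behind a cloud-based voice assistant.
# Real assistants stream recorded audio to remote servers for speech
# recognition and NLP; here both steps are simulated locally.

def transcribe(audio: str) -> str:
    """Stand-in for cloud speech-to-text: we pretend the 'audio'
    is already a transcript and just normalize it."""
    return audio.lower().strip()

def resolve_intent(transcript: str) -> str:
    """Stand-in for cloud NLP: match the transcript against known
    phrases. A real service generalizes far beyond exact keywords,
    improving as it sees more usage data."""
    intents = {
        "what time is it": "It is 10:30.",
        "turn on the lights": "Turning on the living room lights.",
    }
    for phrase, response in intents.items():
        if phrase in transcript:
            return response
    return "Sorry, I didn't understand that."

def assistant(audio: str) -> str:
    """One full request cycle: recognize speech, then act on it."""
    return resolve_intent(transcribe(audio))

print(assistant("Turn on the lights"))  # Turning on the living room lights.
```

The key design point is the split: the device only captures audio and plays back the answer, while the heavy recognition and language models run server-side, which is what lets them keep improving without updating the device.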

Have your say

Which do you prefer, text-based bots or voice bots? And why?
Share your answers in the comments section below.
This article is from the free online course Introduction to Conversational Interfaces, created by FutureLearn.
