This content is taken from the UAL Creative Computing Institute & Institute of Coding's online course, Introduction to Conversational Interfaces. Join the course to learn more.
Voice interaction and Intelligent Personal Assistants

In the previous article we introduced intelligent personal assistants (IPAs), which are controlled by the human voice. This article looks at these voice-driven conversational interfaces in more detail.

One of the earliest voice-controlled interfaces was Shoebox, a tool built by IBM in 1962. Roughly the size of a shoebox, it could perform simple arithmetic and recognize sixteen spoken words, including the digits 0 to 9.

In 1971, scientists at Carnegie Mellon University, with the support of the US government, built Harpy. It could recognize 1,011 words, roughly the vocabulary of a three-year-old. A major improvement over Shoebox, Harpy could process full sentences rather than only single words.

Once interfaces like Harpy could recognize word sequences, companies began building applications for the technology. In the 1980s, IBM produced a typewriter called Tangora with a 20,000-word vocabulary. The Julie doll, released by the Worlds of Wonder toy company in 1987, could recognize a child’s voice and respond to it. Throughout the 1990s, consumers gained broader access to both personal computers and speech recognition technology, and Apple began building speech recognition features into its Macintosh computers with PlainTalk in 1993.

Dragon’s NaturallySpeaking software, released in 1997, could recognize and transcribe continuous natural speech. Crucially, this meant users no longer had to pause between words, and they could dictate into a digital document at around 100 words per minute. The program cost $695, which made it “affordable” relative to earlier speech recognition devices. A version of NaturallySpeaking is still available for download.

Over the last decade, technology companies have worked to create increasingly sophisticated technology that will automate more processes and tasks we do throughout the day.

When Watson, a computer capable of answering questions posed in natural language, beat Jeopardy! grand champion Ken Jennings in a trivia grudge match in 2011, it was seen as a major step forward for conversational interfaces.

Watson is geared towards industrial applications, where it parses massive datasets and returns actionable information, while IPAs such as the Google Assistant, Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa focus on consumer applications.

These assistants are powered by cloud-based AI algorithms, which means that they are connected over the internet to a server which has access to massive amounts of data. This enables them to “learn” new words and tasks the more they are used. These developments set the stage for the massive proliferation of smart home products that are controlled by voice.

Have your say

Which do you prefer, text-based bots or voice bots, and why?

Share your answers in the comments section below.
