
Speaker recognition

What is speaker recognition, and how is it classified? In this article, Dr Ming Yan discusses his recent research.

Concept

Speaker recognition, also known as "voiceprint recognition", determines who is speaking from one or more segments of speech. It extracts the characteristics that are specific to each speaker's voice and emphasizes the differences between one person and another.

Compared with other biometric technologies, speaker recognition confirms identity from speech signals, which are cheap and convenient to collect, easy to store, and natural for users to provide. A voiceprint can be regarded as a second ID card, and recognition can also be carried out remotely over the telephone or a network.

Voiceprint recognition can be applied in almost every corner of daily life, for example in information services, banking and securities, public security and justice, the military and national defense, and security and document anti-counterfeiting.

Classification

According to the number of candidate speakers involved, speaker recognition can be divided into two categories: Speaker Identification and Speaker Verification.

  • Speaker Identification: a “multiple choice” problem that determines which of several enrolled people produced a given speech sample.
  • Speaker Verification: confirms whether a speech sample was produced by a designated (claimed) person, which is a “one-to-one” accept-or-reject decision.
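To make the “multiple choice” versus “one-to-one” distinction concrete, here is a minimal sketch. It assumes each speaker has already been reduced to a fixed-length vector and that cosine similarity is the matching score; the function names `identify` and `verify`, the toy vectors, and the 0.8 threshold are hypothetical choices for illustration, not values from this article.

```python
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe: np.ndarray, enrolled: dict) -> str:
    """Speaker identification: pick the best match among all enrolled speakers."""
    scores = {name: cosine_score(probe, model) for name, model in enrolled.items()}
    return max(scores, key=scores.get)

def verify(probe: np.ndarray, claimed_model: np.ndarray, threshold: float = 0.8) -> bool:
    """Speaker verification: accept or reject one claimed identity (threshold is hypothetical)."""
    return cosine_score(probe, claimed_model) >= threshold

# Toy enrolled "voiceprints" (hypothetical 3-dimensional vectors).
enrolled = {"alice": np.array([1.0, 0.0, 0.2]), "bob": np.array([0.0, 1.0, 0.1])}
probe = np.array([0.9, 0.1, 0.2])

print(identify(probe, enrolled))         # alice
print(verify(probe, enrolled["alice"]))  # True  (score ≈ 0.99)
print(verify(probe, enrolled["bob"]))    # False (score ≈ 0.13)
```

Identification has to compare the probe with every enrolled model, so its cost grows with the number of speakers; verification makes a single comparison, and its behaviour depends on where the decision threshold is set.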

According to whether the spoken content is constrained, speaker recognition can also be classified as text-dependent or text-independent.

  • Text-Dependent: Users must speak specified content during enrollment, so each person’s voiceprint model can be built accurately. At recognition time the user must say the same specified content, which gives better recognition performance. However, the system depends on the user’s cooperation: if the user does not say the specified content, they cannot be identified correctly.
  • Text-Independent: Because the speaker’s content is not constrained, building the model is relatively difficult, but the system is convenient for users and can be applied widely. Depending on the specific task and application, each of the two approaches has its own areas of use.

Principle

The first stage is training. Features are extracted from the speech used to build each speaker’s model, the model is trained on those features, and the result is stored in the voiceprint model library.

The second stage is recognition. Features are extracted from the target speaker’s speech and matched against each model in the voiceprint model library to produce a score, and the speaker whose model gives the highest score is identified.
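The two stages can be sketched end to end as follows. This is a toy under stated assumptions: the “features” are time-averaged log band energies from a short-time Fourier transform (a simplified stand-in for the MFCCs or neural embeddings used in real systems), each speaker’s “model” is simply the mean feature vector of their enrollment speech, and cosine similarity is the score. The synthetic tones stand in for recordings, and all function names and parameters are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sampling rate in Hz
FRAME_LEN = 400       # 25 ms analysis frames at 16 kHz
HOP = 160             # 10 ms step between frames
N_BANDS = 20          # number of coarse frequency bands (toy choice)

def extract_features(signal: np.ndarray) -> np.ndarray:
    """Toy voiceprint feature: time-averaged log band energies, mean-centered."""
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    window = np.hanning(FRAME_LEN)
    frames = np.stack([signal[i * HOP:i * HOP + FRAME_LEN] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, FRAME_LEN // 2 + 1)
    bands = np.array_split(power, N_BANDS, axis=1)      # group FFT bins into bands
    band_energy = np.stack([b.sum(axis=1) for b in bands], axis=1)  # (n_frames, N_BANDS)
    feat = np.log(band_energy + 1e-10).mean(axis=0)     # average over time
    return feat - feat.mean()                           # remove overall loudness

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stage 1: training. One template per speaker goes into the model library.
def train(enrollment: dict) -> dict:
    library = {}
    for name, utterances in enrollment.items():
        feats = np.stack([extract_features(u) for u in utterances])
        library[name] = feats.mean(axis=0)
    return library

# Stage 2: recognition. Score the test speech against every template
# and return the speaker with the highest score.
def recognize(signal: np.ndarray, library: dict):
    test = extract_features(signal)
    scores = {name: cosine(test, tmpl) for name, tmpl in library.items()}
    return max(scores, key=scores.get), scores

# Usage with synthetic tones standing in for real recordings.
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of sample times

def tone(freq: float) -> np.ndarray:
    return np.sin(2 * np.pi * freq * t) + 0.01 * rng.standard_normal(t.size)

library = train({"alice": [tone(200), tone(200)], "bob": [tone(2000), tone(2000)]})
best, scores = recognize(tone(200), library)
print(best)  # alice
```

Practical systems replace the averaged template with a statistical or neural speaker model, but the flow stays the same: extract features, train, store in the library, then match, score, and pick the highest.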

Your task

What are the applications of speaker recognition in life?

Share your thoughts and ideas in the comments below.

© Communication University of China
This article is from the free online course Introduction to Digital Media, created by FutureLearn - Learning For Life.