




Speaker Recognition—Identifying People by their Voices
GEORGE R. DODDINGTON, MEMBER, IEEE
Invited Paper




The usefulness of identifying a person from the characteristics of his voice is increasing with the growing importance of automatic information processing and telecommunications. This paper reviews the voice characteristics and identification techniques used in recognizing people by their voices. A discussion of inherent performance limitations, along with a review of the performance achieved by listening, visual examination of spectrograms, and automatic computer techniques, attempts to provide a perspective with which to evaluate the potential of speaker recognition and productive directions for research into and application of speaker recognition technology.

  1. Introduction

The human ear is a marvelous organ. Beyond our uniquely human ability to receive and decode spoken language, the ear supplies us with the ability to perform many diverse functions. These include, for example, localization of objects, enjoyment of music, and the identification of people by their voices. Currently, along with efforts to develop computer procedures that understand spoken messages, there is also considerable interest in developing procedures that identify people from their voices. The purpose of this paper is to review this speaker recognition problem and the technology being developed and applied to solve it. First, however, it might be appropriate to discuss the motivation for such study. Why develop a speaker recognition machine?
Speaker recognition is an example of biometric personal identification. This term is used to differentiate techniques that base identification on certain intrinsic characteristics of the person (such as voice, fingerprints, retinal patterns, or genetic structure) from those that use artifacts for identification (such as keys, badges, magnetic cards, or memorized passwords). This distinction confers upon biometric techniques the implication of greater identification reliability, perhaps even infallibility, because the intrinsic biometrics are presumed to be more reliable than artifacts, perhaps even unique. Thus a prime motivation for studying speaker recognition is to achieve more reliable personal identification. This is particularly true for security applications, such as physical access control (a voice-actuated door lock for your home or ignition switch for your automobile), computer data access control, or automatic telephone transaction control (airline reservations or bank-by-phone). Convenience is another benefit which accrues to a biometric system, since biometric attributes cannot be lost or forgotten and thus need not be remembered.
Manuscript received March 20, 1985; revised July 18, 1985.
The author is with Speech Research, Computer Science Laboratories, Texas Instruments Inc., Dallas, TX 75265, USA.
Applications also exist which depend uniquely upon the identification of a person by his voice. Such applications include forensic science and the automated processing of reconnaissance information. For example, 32 channels of enemy air-to-ground telecommunications are being monitored to detect activities of the Red Baron. Is he in the air now? Or an axe murderer telephones the location of his victim's body to the police. Does the suspect's voice match the murderer's? The identification problems posed by these applications can only be solved by speaker recognition technology. So far, all of the applications that have been mentioned fall into a category which may be called voice verification. But there are several different speaker recognition task definitions, with different performance characteristics for each. These will now be described.

  1. Types of Speaker Recognition Tasks and Applications

Speaker recognition is a generic term which refers to any task which discriminates between people based upon their voice characteristics. Within this general task description there are two specific tasks that have been studied extensively. These are referred to as speaker identification and speaker verification. (Sometimes the term "voice" or "talker" is substituted for "speaker," and sometimes the term "authentication" is substituted for "verification." Thus, for example, speaker verification and voice authentication refer to the same task.) The distinction between identification and verification is simple: The speaker identification task is to classify an unlabeled voice token as belonging to (having been spoken by) one of a set of N reference speakers (N possible outcomes), whereas the speaker verification task is to decide whether or not an unlabeled voice token belongs to a specific reference speaker (2 possible outcomes: the token is either accepted as belonging to the reference speaker or is rejected as belonging to an impostor). Note that the information in bits, denoted $I$, to be gained from the identification task is in general greater than that to be gained from the verification task:
$I_{\mathrm{ident}} = \log_2(N)$ (assuming equal a priori probability of occurrence for all reference speakers)
$I_{\mathrm{ver}} = 1$ (assuming a priori probability of occurrence of reference speaker = 0.5).
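The two information expressions above can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the binary-entropy form used for verification is an added generalization (standard Shannon entropy of a two-outcome decision) that reduces to the paper's value of 1 bit when the a priori probability of the reference speaker is 0.5.

```python
import math


def identification_bits(n_speakers: int) -> float:
    """Information in bits from choosing 1 of N equally likely reference speakers."""
    if n_speakers < 1:
        raise ValueError("n_speakers must be at least 1")
    return math.log2(n_speakers)


def verification_bits(p_reference: float = 0.5) -> float:
    """Information in bits from an accept/reject decision.

    Assumption (not in the source): for priors other than 0.5 this uses the
    binary entropy -p*log2(p) - (1-p)*log2(1-p). At p = 0.5 it equals 1 bit.
    """
    if not 0.0 <= p_reference <= 1.0:
        raise ValueError("p_reference must lie in [0, 1]")
    if p_reference in (0.0, 1.0):
        return 0.0  # the outcome is certain, so nothing is learned
    q = 1.0 - p_reference
    return -(p_reference * math.log2(p_reference) + q * math.log2(q))


if __name__ == "__main__":
    # Compare the two tasks for a few illustrative population sizes.
    for n in (2, 10, 100, 1000):
        print(f"N={n:5d}  ident={identification_bits(n):6.3f} bits  "
              f"ver={verification_bits():.3f} bits")
```

Under these assumptions the two tasks carry the same information only when N = 2; for any larger reference population the identification task asks for more bits per decision than verification does.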
It is natural then to expect that, all other factors being equal, recognition performance (i.e., probability of error) will be better for the verification task than for the identification task. An example of this contrast is shown in Fig. 1




