March 29, 2024

Detailed explanation of GMM-HMM speech recognition principle

This article briefly describes the principle of GMM-HMM speech recognition, together with its modeling and testing process.

1. What is the Hidden Markov Model?

Three issues that HMM has to solve:

1) Likelihood

2) Decoding

3) Training

2. What is GMM? How to use GMM to find the probability of a phoneme?

3. Solving speech recognition with GMM+HMM

3.1 Identification

3.2 Training

3.2.1 Training the params of GMM

3.2.2 Training the params of HMM

=============================================================

1. What is the Hidden Markov Model?


ANS: A Markov process with hidden (unobservable) nodes and visible (observable) nodes.

The hidden nodes represent states; the visible nodes represent the speech we hear or the time-series signal we observe.

Initially, we specify the structure of the HMM. To train the model, given time-series signals y1...yT (training samples), we use MLE (typically implemented with EM) to estimate the following parameters (see the sketch after this list):

1. The initial probabilities π of the N states

2. The state transition probabilities a_ij

3. The output (emission) probabilities b_j
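To make these three parameter sets concrete, here is a minimal sketch (an illustration with assumed values, not code from the article) of how they are commonly stored for an N-state model:

```python
import numpy as np

N = 3                              # number of hidden states (assumed)
pi = np.full(N, 1.0 / N)           # 1. initial probability of each state

A = np.array([[0.8, 0.2, 0.0],     # 2. A[i, j] = a_ij; each row sums to 1
              [0.0, 0.7, 0.3],     #    (a left-to-right topology, typical
              [0.0, 0.0, 1.0]])    #    for speech HMMs)

# 3. The output probability b_j(x). In a GMM-HMM, each state j owns a GMM
#    that scores an acoustic feature vector x; for a toy discrete HMM it is
#    just a lookup table B[j, k] = P(observe symbol k | state j).
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])
```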

--------------

In speech processing, a word consists of several phonemes;

each HMM corresponds to a word or a phoneme;

a word is represented as a sequence of states, and each state corresponds to a phoneme, as illustrated in the sketch below.
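A hypothetical illustration of this hierarchy (the lexicon entry, phoneme names, and number of states per phoneme are all assumptions for the example):

```python
# word -> phoneme sequence (a made-up lexicon entry)
lexicon = {"cat": ["k", "ae", "t"]}

# each phoneme is modeled by a small left-to-right HMM; 3 states is typical
states_per_phoneme = 3

# the word's HMM is the concatenation of its phoneme HMMs' states
word_states = [(ph, s) for ph in lexicon["cat"]
               for s in range(states_per_phoneme)]
print(word_states)   # [('k', 0), ('k', 1), ('k', 2), ('ae', 0), ...]
```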

There are three problems that the HMM needs to solve:

1) Likelihood: the probability that an HMM generates a given observation sequence x <the Forward algorithm>

The forward variable is computed recursively:

$$\alpha_t(s_j) = \left[\sum_{i=1}^{N} \alpha_{t-1}(s_i)\, a_{ij}\right] b_j(x_t)$$

where $\alpha_t(s_j)$ is the probability that the HMM is in state j at time t, having generated the observations {x1, ..., xt},

$a_{ij}$ is the transition probability from state i to state j,

and $b_j(x_t)$ is the probability of generating $x_t$ in state j. The total likelihood is $P(x \mid \lambda) = \sum_{j=1}^{N} \alpha_T(s_j)$.
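This recursion maps directly to a few lines of code. Below is a minimal sketch, assuming the model is given as numpy arrays: pi of shape (N,), A of shape (N, N), and a precomputed emission matrix Bx with Bx[t, j] = b_j(x_t) (the names and array layout are my choices, not the article's):

```python
import numpy as np

def forward_likelihood(pi, A, Bx):
    """Compute P(x | lambda) with the Forward algorithm.

    pi : (N,)   initial state probabilities
    A  : (N, N) A[i, j] = a_ij, transition probability from state i to j
    Bx : (T, N) Bx[t, j] = b_j(x_t), emission probability of x_t in state j
    """
    T, N = Bx.shape
    alpha = np.zeros((T, N))
    alpha[0] = pi * Bx[0]               # initialization: alpha_1(s_j) = pi_j b_j(x_1)
    for t in range(1, T):
        # alpha_t(s_j) = [sum_i alpha_{t-1}(s_i) a_ij] * b_j(x_t)
        alpha[t] = (alpha[t - 1] @ A) * Bx[t]
    return alpha[-1].sum()              # P(x | lambda) = sum_j alpha_T(s_j)
```

In practice the recursion is carried out in log space (or with per-frame scaling) to avoid numerical underflow on long utterances.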

2) Decoding: given an observation sequence x, find the most likely underlying HMM state sequence <the Viterbi algorithm>

In actual computation, pruning is applied: instead of evaluating the probability of every possible state sequence, the Viterbi approximation records, at each time step, only the highest-probability predecessor state and its probability.

Let $V_t(s_j)$ be the maximum probability, over all states at time t-1, of arriving in state j at time t:

$$V_t(s_j) = \max_{i} V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$

and record the backpointer $bt_t(s_j) = \arg\max_{i} V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$, i.e. the state at time t-1 from which the transition into state j at time t has the highest probability.

The Viterbi procedure is as follows:

Initialization: $V_1(s_j) = \pi_j\, b_j(x_1)$

Recursion: $V_t(s_j) = \max_{i} V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$, recording the backpointer $bt_t(s_j)$

Termination: $P^{*} = \max_{j} V_T(s_j)$, with final state $s_T^{*} = \arg\max_{j} V_T(s_j)$

Then, based on the recorded backpointers, backtrack to recover the most likely state sequence:

$$s_t^{*} = bt_{t+1}(s_{t+1}^{*}), \qquad t = T-1, \dots, 1$$
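A matching sketch of the Viterbi recursion and backtracking step, under the same array conventions as the forward sketch above (again an illustration, not code from the article):

```python
import numpy as np

def viterbi_decode(pi, A, Bx):
    """Return the most likely state sequence and its probability.

    Same conventions as forward_likelihood: pi (N,), A (N, N), Bx (T, N).
    """
    T, N = Bx.shape
    V = np.zeros((T, N))                   # V[t, j] = V_t(s_j)
    bt = np.zeros((T, N), dtype=int)       # backpointers bt_t(s_j)
    V[0] = pi * Bx[0]                      # initialization
    for t in range(1, T):
        scores = V[t - 1][:, None] * A     # scores[i, j] = V_{t-1}(s_i) a_ij
        bt[t] = scores.argmax(axis=0)      # best predecessor of each state j
        V[t] = scores.max(axis=0) * Bx[t]  # V_t(s_j) = max_i (...) * b_j(x_t)
    path = np.zeros(T, dtype=int)
    path[-1] = V[-1].argmax()              # s_T* = argmax_j V_T(s_j)
    for t in range(T - 2, -1, -1):         # backtracking: s_t* = bt_{t+1}(s_{t+1}*)
        path[t] = bt[t + 1, path[t + 1]]
    return path, V[-1].max()
```

As with the forward pass, a real decoder works in log probabilities, and beam pruning is usually added on top of the exact max.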

3) Training: given an observation sequence x, train the HMM parameters λ = {a_ij, b_j} with the EM (Forward-Backward, i.e. Baum-Welch) algorithm.

We defer this part to Section 3, "Solving speech recognition with GMM+HMM", and discuss it together with GMM training.

----------------------------------------------------------------------
