[Speech Recognition] Mel-Frequency Cepstral Coefficients (MFCC) and a Python Implementation
- 1. Mel filters
- 2. MFCC features
- Python implementation
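The first step in a speech recognition system is feature extraction. MFCC is a feature that describes the envelope of the short-time power spectrum, and it is widely used in speech recognition systems.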
1. Mel filters
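Each speech signal is split into multiple frames, and each frame has a corresponding spectrum (computed with an FFT) that describes how the signal energy is distributed over frequency. A mel filter bank is a set of band-pass filters. On the mel scale the passbands all have the same width, but on the Hertz scale the filters are densely packed with narrow passbands at low frequencies and sparsely spaced with wide passbands at high frequencies. The aim is to mimic the nonlinear way the human ear perceives sound: more discriminative at lower frequencies and less discriminative at higher frequencies.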
The relationship between Hertz frequency and mel frequency is:
$$F_{mel} = 1125 \ln\left(1 + f/700\right)$$

$$f = 700\left(e^{F_{mel}/1125} - 1\right)$$
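The two formulas above can be checked numerically with a short sketch. The function names `hz_to_mel` and `mel_to_hz` and the 0 to 4000 Hz range are illustrative assumptions, not part of the original listing.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Hertz to mel, using F_mel = 1125 * ln(1 + f / 700)."""
    return 1125.0 * np.log(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(f_mel):
    """Mel back to Hertz, using f = 700 * (exp(F_mel / 1125) - 1)."""
    return 700.0 * (np.exp(np.asarray(f_mel, dtype=float) / 1125.0) - 1.0)

# Six points equally spaced on the mel axis between 0 Hz and 4000 Hz (assumed range).
mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), 6)
hz_points = mel_to_hz(mel_points)

# The same mel step corresponds to a wider and wider step in Hertz.
hz_steps = np.diff(hz_points)
print(np.round(hz_points, 1))
print(np.round(hz_steps, 1))
```

The growing values in `hz_steps` are the reason the filters look narrow and dense at low frequencies and wide and sparse at high frequencies.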
Suppose the mel spectrum contains $M$ band-pass filters $H_m(k)$, $0 \le m < M$, and that each band-pass filter has center frequency $F(m)$. The transfer function of each band-pass filter is:
$$
H_m(k) =
\begin{cases}
0, & k < F(m-1) \\
\dfrac{k - F(m-1)}{F(m) - F(m-1)}, & F(m-1) \le k \le F(m) \\
\dfrac{F(m+1) - k}{F(m+1) - F(m)}, & F(m) \le k \le F(m+1) \\
0, & k > F(m+1)
\end{cases}
$$
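As a rough illustration of this piecewise definition, the sketch below builds the triangles directly from the formula. It assumes the points $F(m)$ are equally spaced on the mel scale from 0 Hz to half the sampling rate and expressed as (fractional) FFT bin positions; the name `triangular_filterbank` and the values 24, 256 and 8000 are assumptions chosen for the example. Rows are stored from index 0, so row `m - 1` holds the filter built from points `m - 1`, `m`, `m + 1`.

```python
import numpy as np

def triangular_filterbank(num_filters, n_fft, fs):
    """Triangular filters H_m(k) over the FFT bins 0 .. n_fft // 2."""
    # num_filters + 2 points equally spaced in mel, mapped back to Hertz.
    mel_max = 1125.0 * np.log(1.0 + (fs / 2.0) / 700.0)
    mel_pts = np.linspace(0.0, mel_max, num_filters + 2)
    hz_pts = 700.0 * (np.exp(mel_pts / 1125.0) - 1.0)
    # Positions of F(m) in units of FFT bins, kept fractional so that
    # neighbouring points never collapse onto the same bin.
    bin_pts = hz_pts * n_fft / fs
    k = np.arange(n_fft // 2 + 1, dtype=float)
    bank = np.zeros((num_filters, k.size))
    for m in range(1, num_filters + 1):
        left, centre, right = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        rising = (k - left) / (centre - left)      # F(m-1) <= k <= F(m)
        falling = (right - k) / (right - centre)   # F(m) <= k <= F(m+1)
        # Taking the smaller branch and clipping at 0 gives the triangle,
        # with zeros outside [F(m-1), F(m+1)].
        bank[m - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = triangular_filterbank(num_filters=24, n_fft=256, fs=8000)
print(bank.shape)  # (24, 129)
```

Using `np.minimum` of the rising and falling branches is just one compact way to express the four cases; an explicit `if`/`elif` per bin gives the same values.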
The figure below shows the mel filter bank on the Hertz frequency axis, with 24 band-pass filters:
2. MFCC features
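MFCC extraction steps:
(1) Split the speech signal into frames
(2) Apply a Fourier transform to each frame to obtain the power spectrum
(3) Pass the short-time power spectrum through the mel filter bank
(4) Take the logarithm of the filter-bank outputs
(5) Apply the discrete cosine transform (DCT) to the log filter-bank outputs
(6) Typically keep the 2nd through 13th cepstral coefficients as the feature of the short-time speech signal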
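The six steps above can be sketched end to end on a synthetic signal. This is a minimal illustration under stated assumptions, not the full implementation: the name `mfcc_sketch`, the 440 Hz test tone and the 8000 Hz sampling rate are made up for the example, there is no endpoint detection or windowing, and the DCT-II is written out as an explicit cosine matrix (scaled by sqrt(2/N), which is intended to match an orthonormal DCT-II for coefficients 1 and up) so that only NumPy is needed.

```python
import numpy as np

def mfcc_sketch(signal, fs, frame_len=256, hop=80, num_filters=24, num_ceps=12):
    # (1) Framing: one frame per row, built from an index matrix.
    num_frames = 1 + (len(signal) - frame_len) // hop
    idx = hop * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = signal[idx]
    # (2) Power spectrum of each frame (non-negative frequencies only).
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    # (3) Triangular mel filter bank, equally spaced on the mel scale up to fs / 2.
    mel_max = 1125.0 * np.log(1.0 + (fs / 2.0) / 700.0)
    mel_pts = np.linspace(0.0, mel_max, num_filters + 2)
    bins = 700.0 * (np.exp(mel_pts / 1125.0) - 1.0) * frame_len / fs
    k = np.arange(frame_len // 2 + 1, dtype=float)
    rising = (k[None, :] - bins[:-2, None]) / (bins[1:-1, None] - bins[:-2, None])
    falling = (bins[2:, None] - k[None, :]) / (bins[2:, None] - bins[1:-1, None])
    bank = np.clip(np.minimum(rising, falling), 0.0, None)
    energies = power @ bank.T
    # (4) Logarithm, with a tiny floor so that log(0) cannot occur.
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    # (5) DCT-II along the filter axis, as an explicit cosine matrix.
    j = np.arange(num_filters)
    i = np.arange(1, num_ceps + 1)
    basis = np.cos(np.pi * i[:, None] * (2 * j[None, :] + 1) / (2.0 * num_filters))
    # (6) Rows i = 1 .. num_ceps are the 2nd through 13th coefficients.
    return np.sqrt(2.0 / num_filters) * (log_energies @ basis.T)

fs = 8000                                 # assumed sampling rate
t = np.arange(fs) / fs                    # one second of samples
signal = np.sin(2 * np.pi * 440.0 * t)    # synthetic 440 Hz tone
features = mfcc_sketch(signal, fs)
print(features.shape)  # (97, 12)
```

Each row of `features` is the 12-dimensional feature vector of one frame.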
Python implementation
import wave
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.fftpack import dct


def read(data_path):
    '''Read a speech signal from a WAV file.'''
    wavepath = data_path
    with wave.open(wavepath, 'rb') as f:
        params = f.getparams()
        # number of channels, sample width, sampling rate, number of samples
        nchannels, sampwidth, framerate, nframes = params[:4]
        str_data = f.readframes(nframes)  # raw audio as bytes
    # Interpret the bytes as 16-bit integers (this assumes 16-bit PCM data).
    # np.frombuffer replaces np.fromstring, whose binary mode is deprecated.
    wavedata = np.frombuffer(str_data, dtype=np.short).astype(np.float64)
    wavedata = wavedata / np.max(np.abs(wavedata))  # normalize the amplitude
    return wavedata, nframes, framerate


def enframe(data, win, inc):
    '''Split the speech data into frames.
    input:  data (1-D array): speech signal
            win (int or sequence): window length, or a window whose length is used
            inc (int): hop size, i.e. how far the window moves each step
    output: f (2-D array): one row per window position
    '''
    nx = len(data)  # length of the speech signal
    try:
        nwin = len(win)
    except TypeError:
        nwin = 1
    if nwin == 1:
        wlen = win
    else:
        wlen = nwin
    nf = int(np.fix((nx - wlen) / inc) + 1)  # number of window positions
    # Index matrix: the frame start offsets plus the offsets inside a frame.
    # Plain arrays are used instead of np.mat, which newer NumPy versions no longer provide.
    indf = np.array([inc * j for j in range(nf)]).reshape(-1, 1)
    inds = np.arange(wlen).reshape(1, -1)
    indf_tile = np.tile(indf, wlen)
    inds_tile = np.tile(inds, (nf, 1))
    mix_tile = indf_tile + inds_tile
    f = np.zeros((nf, wlen))  # initialize the 2-D array
    for i in range(nf):
        for j in range(wlen):
            f[i, j] = data[mix_tile[i, j]]
    return f


def point_check(wavedata, win, inc):
    '''Endpoint detection for a speech signal.
    input:  wavedata (1-D array): raw speech signal
            win, inc: window length and hop size passed on to enframe
    output: 2-D array with the frames between the detected start frame
            (StartPoint) and end frame (EndPoint)
    '''
    # 1. Short-time zero-crossing rate
    FrameTemp1 = enframe(wavedata[0:-1], win, inc)
    FrameTemp2 = enframe(wavedata[1:], win, inc)
    # A sample crosses zero when it and its neighbour have opposite signs
    signs = (np.multiply(FrameTemp1, FrameTemp2) < 0).astype(int)
    # Ignore crossings whose amplitude change does not exceed 0.01
    diffs = (abs(FrameTemp1 - FrameTemp2) > 0.01).astype(int)
    zcr = list(np.multiply(signs, diffs).sum(axis=1))
    # 2. Short-time energy
    amp = list(abs(enframe(wavedata, win, inc)).sum(axis=1))
    # Set the thresholds
    # print('Setting thresholds')
    ZcrLow = max([round(np.mean(zcr) * 0.1), 3])  # low zero-crossing threshold
    ZcrHigh = max([round(max(zcr) * 0.1), 5])  # high zero-crossing threshold
    AmpLow = min([min(amp) * 10, np.mean(amp) * 0.2, max(amp) * 0.1])  # low energy threshold
    AmpHigh = max([min(amp) * 10, np.mean(amp) * 0.2, max(amp) * 0.1])  # high energy threshold
    # Endpoint detection
    MaxSilence = 8  # longest allowed gap inside speech (frames)
    MinAudio = 16  # shortest allowed speech segment (frames)
    Status = 0  # state 0: silence, 1: transition, 2: speech, 3: finished
    HoldTime = 0  # speech duration
    SilenceTime = 0  # gap duration
    print('Starting endpoint detection')
    StartPoint = 0
    for n in range(len(zcr)):
        if Status == 0 or Status == 1:
            if amp[n] > AmpHigh or zcr[n] > ZcrHigh:
                StartPoint = n - HoldTime
                Status = 2
                HoldTime = HoldTime + 1
                SilenceTime = 0
            elif amp[n] > AmpLow or zcr[n] > ZcrLow:
                Status = 1
                HoldTime = HoldTime + 1
            else:
                Status = 0
                HoldTime = 0
        elif Status == 2:
            if amp[n] > AmpLow or zcr[n] > ZcrLow:
                HoldTime = HoldTime + 1
            else:
                SilenceTime = SilenceTime + 1
                if SilenceTime < MaxSilence:
                    HoldTime = HoldTime + 1
                elif (HoldTime - SilenceTime) < MinAudio:
                    Status = 0
                    HoldTime = 0
                    SilenceTime = 0
                else:
                    Status = 3
        elif Status == 3:
            break
        if Status == 3:
            break
    HoldTime = HoldTime - SilenceTime
    EndPoint = StartPoint + HoldTime
    return FrameTemp1[StartPoint:EndPoint]


def mfcc(FrameK, framerate, win):
    '''Extract MFCC parameters.
    input:  FrameK (2-D array): framed speech signal, one frame per row
            framerate: sampling rate of the speech
            win: frame length (number of FFT points)
    output: S (power spectrum), mel_bank (mel filter bank), P (filter-bank
            outputs), logP (log of P), D (cepstral coefficients 2 to 13)
    '''
    # Mel filter bank
    mel_bank, w2 = mel_filter(24, win, framerate, 0, 0.5)
    FrameK = FrameK.T
    # Power spectrum
    S = abs(np.fft.fft(FrameK, axis=0)) ** 2
    # Pass the power spectrum through the filters
    P = np.dot(mel_bank, S[0:w2, :])
    # Take the logarithm (floored at machine epsilon to avoid log(0))
    logP = np.log(np.maximum(P, np.finfo(float).eps))
    # Compute the DCT coefficients
    # rDCT = 12
    # cDCT = 24
    # dctcoef = []
    # for i in range(1, rDCT + 1):
    #     tmp = [np.cos((2 * j + 1) * i * math.pi * 1.0 / (2.0 * cDCT)) for j in range(cDCT)]
    #     dctcoef.append(tmp)
    # # Cosine transform of the log values
    # D = np.dot(dctcoef, logP)
    num_ceps = 12
    D = dct(logP, type=2, axis=0, norm='ortho')[1:(num_ceps + 1), :]
    return S, mel_bank, P, logP, D


def mel_filter(M, N, fs, l, h):
    '''Mel filter bank.
    input:  M (int): number of filters
            N (int): number of FFT points
            fs (int): sampling rate
            l (float): low-frequency coefficient
            h (float): high-frequency coefficient
    output: melbank (2-D array): mel filter bank
            w2 (int): number of FFT bins used (N / 2 + 1)
    '''
    fl = fs * l  # lowest frequency covered by the filters
    fh = fs * h  # highest frequency covered by the filters
    bl = 1125 * np.log(1 + fl / 700)  # convert the frequencies to mel
    bh = 1125 * np.log(1 + fh / 700)
    B = bh - bl  # bandwidth in mel
    # Equally spaced points on the mel scale. The offset bl is added so that the
    # points start at fl; the original started at 0, which is the same when l = 0.
    y = bl + np.linspace(0, B, M + 2)
    print('mel spacing', y)
    Fb = 700 * (np.exp(y / 1125) - 1)  # convert mel back to Hz
    print(Fb)
    w2 = int(N / 2 + 1)
    df = fs / N
    freq = []  # frequency of each FFT bin
    for n in range(0, w2):
        freqs = int(n * df)
        freq.append(freqs)
    melbank = np.zeros((M, w2))
    print(freq)
    for k in range(1, M + 1):
        f1 = Fb[k - 1]
        f2 = Fb[k + 1]
        f0 = Fb[k]
        n1 = np.floor(f1 / df)
        n2 = np.floor(f2 / df)
        n0 = np.floor(f0 / df)
        for i in range(1, w2):
            if i >= n1 and i <= n0:
                melbank[k - 1, i] = (i - n1) / (n0 - n1)
            if i >= n0 and i <= n2:
                melbank[k - 1, i] = (n2 - i) / (n2 - n0)
        plt.plot(freq, melbank[k - 1, :])
    plt.show()
    return melbank, w2


if __name__ == '__main__':
    data_path = 'audio_data.wav'
    win = 256
    inc = 80
    wavedata, nframes, framerate = read(data_path)
    FrameK = point_check(wavedata, win, inc)
    S, mel_bank, P, logP, D = mfcc(FrameK, framerate, win)
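
The nested framing loop in `enframe` and the per-frame quantities used by `point_check` can also be written with NumPy index arithmetic. The sketch below is one possible variant, not a drop-in replacement: `enframe_vectorized` and `short_time_features` are hypothetical names, and the zero-crossing count here only looks at neighbouring samples inside each frame, which differs slightly from the two shifted framings used in the listing.

```python
import numpy as np

def enframe_vectorized(data, wlen, inc):
    """Split a 1-D signal into overlapping frames of length wlen with hop inc."""
    data = np.asarray(data, dtype=float)
    nf = (len(data) - wlen) // inc + 1
    # Row i holds the indices inc * i, inc * i + 1, ..., inc * i + wlen - 1.
    idx = inc * np.arange(nf)[:, None] + np.arange(wlen)[None, :]
    return data[idx]

def short_time_features(frames, delta=0.01):
    """Per-frame amplitude sum and zero-crossing count.

    A crossing is counted when two neighbouring samples have opposite signs
    and differ by more than delta.
    """
    amp = np.abs(frames).sum(axis=1)
    a, b = frames[:, :-1], frames[:, 1:]
    zcr = ((a * b < 0) & (np.abs(a - b) > delta)).sum(axis=1)
    return amp, zcr

frames = enframe_vectorized(np.arange(10.0), wlen=4, inc=3)
print(frames)
# [[0. 1. 2. 3.]
#  [3. 4. 5. 6.]
#  [6. 7. 8. 9.]]

amp, zcr = short_time_features(np.array([[1.0, -1.0, 1.0, -1.0]]))
print(amp, zcr)  # [4.] [3]
```

Fancy indexing with the index matrix copies all frames in one step, which avoids the Python-level double loop for long recordings.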