Speech is easily interfered with by the external environment in real-world conditions, which results in the loss of important features. Deep learning has become a popular speech enhancement method because of its superior potential for solving nonlinear mapping problems over complex features. However, a deficiency of traditional deep learning methods is their weak ability to learn important information from previous time steps and long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of deep neural networks (DNNs) and a gated recurrent unit (GRU). The proposed method uses the GRU to reduce the number of parameters relative to standard DNNs and to capture the contextual information of the speech, which improves the quality and intelligibility of the enhanced speech. Firstly, a DNN with multiple hidden layers is used to learn the mapping between the logarithmic power spectrum (LPS) features of noisy speech and those of clean speech. Secondly, the LPS features estimated by the DNN are fused with the noisy speech as the input of the GRU network to compensate for the missing context information. Finally, the GRU network is trained to learn the mapping between the fused LPS features and the LPS features of the clean speech. The proposed model is experimentally compared with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results demonstrate that, compared with the noisy signal, the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, under matched noise conditions. Under unmatched noise conditions, the PESQ and STOI are improved by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it exploits the key information in the features to suppress noise in both matched and unmatched noise cases, and it outperforms other common methods in speech enhancement.

In the past several decades, speech enhancement has attracted considerable research interest due to the wide application of voice-based solutions to real-world problems. The purpose of speech enhancement is to improve speech quality and intelligibility under interfering noise conditions. Classic noise reduction methods, including spectral subtraction (SS), Wiener filtering (WF), hidden Markov models (HMMs) and statistical model-based algorithms, have been widely studied to remove or attenuate additive noise from noisy speech. Spectral subtraction is one of the earliest speech enhancement algorithms proposed to remove environmental noise, but the resulting enhanced speech often suffers from an annoying artifact known as musical noise. The Wiener filter is a linear estimator that minimizes the mean-squared error between the original and enhanced speech; its transfer function varies from sample to sample based on the statistics of the speech signal. HMMs are doubly stochastic processes, or probabilistic functions of Markov chains, that model time-series data as the evolution of a hidden state variable through a discrete set of possible values. The traditional HMM faces two problems: the limitation of its conditional independence assumption and the difficulty of processing segmental features. More generally, the performance of conventional methods depends on the nature of the background noise and the statistical properties of speech, because these methods need to estimate the noise power spectrum, and it is difficult to accurately estimate noise with nonlinear or non-stationary characteristics. In recent years, deep learning has become increasingly popular as a way to learn a mapping between noisy and clean speech signals and thereby enhance the desired speech. The fully connected structure of multi-layer neuron nodes, combined with nonlinear activation functions, enables deep learning to solve various classification and regression problems for the separation of speech and noise.
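As a concrete illustration of the spectral subtraction approach discussed above, here is a minimal single-frame sketch in NumPy. The spectral floor `beta` is the standard remedy for the musical-noise artifact (it prevents isolated negative bins from being clamped to zero and turning into tonal residue); the values of `alpha` and `beta` are illustrative defaults, not parameters taken from this work.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, alpha=2.0, beta=0.01):
    """Power spectral subtraction on one STFT frame.

    noisy     : complex STFT coefficients of a noisy frame
    noise_est : estimated noise magnitude spectrum (e.g. averaged
                over leading noise-only frames)
    alpha     : over-subtraction factor (illustrative choice)
    beta      : spectral floor, which limits musical noise
    """
    mag, phase = np.abs(noisy), np.angle(noisy)
    sub = mag**2 - alpha * noise_est**2           # subtract noise power
    sub = np.maximum(sub, beta * noise_est**2)    # apply spectral floor
    return np.sqrt(sub) * np.exp(1j * phase)      # reuse the noisy phase
```

Note that the noisy phase is reused unchanged; spectral subtraction only modifies the magnitude, which is one reason its enhanced output can still sound unnatural.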
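The Wiener filter mentioned above can likewise be sketched per frequency bin: the linear MMSE gain is H(f) = S(f) / (S(f) + N(f)), where S and N are the speech and noise power spectral densities. This is a generic textbook formulation, not the specific estimator used in the compared systems.

```python
import numpy as np

def wiener_filter(noisy_spec, speech_psd, noise_psd):
    """Apply the frequency-domain Wiener gain H = S / (S + N) per bin,
    the linear MMSE estimate of the clean spectrum."""
    gain = speech_psd / (speech_psd + noise_psd)
    return gain * noisy_spec
```

The gain depends on the (unknown) speech and noise statistics, which is exactly why the filter's performance hinges on how well those statistics can be estimated.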
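The logarithmic power spectrum (LPS) features that serve as the networks' input and target can be computed frame by frame from the STFT. The frame length, hop size and Hann window below are typical choices for illustration, not necessarily the settings used in this work.

```python
import numpy as np

def lps_features(signal, frame_len=512, hop=256, eps=1e-10):
    """Log power spectrum (LPS) features: log |STFT|^2 per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        spec = np.fft.rfft(frame)                  # one-sided spectrum
        feats[i] = np.log(np.abs(spec) ** 2 + eps)  # eps avoids log(0)
    return feats
```

Because the log compresses the large dynamic range of speech power, LPS features are a common regression target for enhancement networks; the phase of the noisy signal is typically reused at synthesis time.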
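The two-stage structure described in the abstract (a DNN that maps noisy LPS to an LPS estimate, whose output is then fused with the noisy features) can be sketched as follows. The ReLU hidden layers and frame-wise concatenation are assumptions for illustration; the paper's exact architecture and fusion details may differ.

```python
import numpy as np

def mlp_forward(x, weights):
    """First stage: a plain fully connected network mapping a noisy
    LPS frame to an estimated clean LPS frame.
    weights is a list of (W, b) pairs; hidden layers use ReLU."""
    for W, b in weights[:-1]:
        x = np.maximum(W @ x + b, 0.0)   # ReLU hidden layer
    W, b = weights[-1]
    return W @ x + b                     # linear output (LPS estimate)

def fuse_features(dnn_lps, noisy_lps):
    """Second stage input: concatenate the DNN's LPS estimate with the
    noisy LPS so the recurrent network can recover context cues."""
    return np.concatenate([dnn_lps, noisy_lps], axis=-1)
```

The fused vector is what the GRU consumes at each time step: the DNN estimate contributes a denoised view, while the raw noisy features preserve information the first stage may have discarded.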
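Finally, the GRU's ability to carry context with comparatively few parameters comes from its two-gate design: an update gate z and a reset gate r, with no separate cell state (an LSTM needs three gates plus a cell state). A single step of one common GRU formulation looks like this; the weight layout is generic, not tied to any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, p):
    """One GRU time step. p maps names ('Wz', 'Uz', 'bz', ...) to the
    input weights, recurrent weights and biases of each gate."""
    z = sigmoid(p['Wz'] @ x + p['Uz'] @ h_prev + p['bz'])   # update gate
    r = sigmoid(p['Wr'] @ x + p['Ur'] @ h_prev + p['br'])   # reset gate
    h_tilde = np.tanh(p['Wh'] @ x + p['Uh'] @ (r * h_prev) + p['bh'])
    return (1 - z) * h_prev + z * h_tilde   # blend old state and candidate
```

Some references swap the roles of z and (1 - z) in the final blend; either convention works as long as it is used consistently. The gating is what lets the network retain information from earlier frames, addressing the long-term dependency weakness noted above.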