Pitch Extraction for Speech Signals in Noisy Environments
Abstract
The pitch period is defined as the inverse of the fundamental frequency of the excitation source of a voiced speech signal. The pitch period (pitch, for short), or equivalently the fundamental frequency, is a prominent parameter of speech and is widely used in speech-related systems such as speech coding, speech recognition, speech enhancement, and speech synthesis. The terms pitch and fundamental frequency are often used interchangeably, although pitch strictly refers to the perception of the fundamental frequency. Pitch arises from the vibration of the vocal cords, which introduces periodicity into the speech signal.
Pitch extraction has proven to be a difficult task even for speech in a noise-free environment. The clean speech waveform is not truly periodic; it is quasi-periodic and non-stationary. Although a large number of pitch extraction methods have been reported for the noise-free environment, comparatively few researchers have attempted to extract the pitch in noisy environments. Under noise, the periodic structure of the speech signal is corrupted, so pitch extraction becomes an extremely complicated task. The reliability and accuracy of pitch extraction methods therefore face real challenges in noisy environments.
Based on these observations, the objective of this dissertation is to develop approaches that handle noise-corrupted speech signals effectively in real applications without any complicated post-processing. Some conventional state-of-the-art approaches rely on complicated post-processing techniques for pitch extraction. In this dissertation, we instead propose and implement simple and efficient approaches that address the factors degrading the performance of pitch extraction methods.
First, we propose the use of the fourth-root spectrum instead of the log spectrum to increase pitch extraction accuracy in noisy environments. To obtain clear harmonics, liftering and clipping operations are then applied. When the resulting spectrum is transformed into the time domain by means of the discrete Fourier transform, the pitch extraction is robust against narrow-band noise. When the resulting spectrum is instead amplified by a power calculation before being transformed into the time domain, the pitch extraction is robust against wide-band noise. These properties are investigated through exhaustive experiments with a variety of noise types, and the required computation time is also studied. The experimental results demonstrate the effectiveness of the new approaches in improving pitch extraction performance. However, the performance of this method sometimes deteriorates because of the windowing effect: it uses the Hanning window function, which does not perform well for pitch extraction in noisy environments.
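The processing chain described above can be illustrated with a minimal sketch. It is not the implementation used in the dissertation: the function names, the moving-average lifter, the zero-level clipping, the search range, and the `emphasis` exponent (standing in for the "power calculation") are all assumptions made for illustration.

```python
import numpy as np

def synth_voiced_frame(f0, fs, n):
    """Hypothetical test signal: harmonics of f0 with 1/k amplitudes."""
    t = np.arange(n) / fs
    n_harm = int((fs / 2 - 1) // f0)
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))

def fourth_root_pitch(frame, fs, fmin=50.0, fmax=400.0,
                      smooth_bins=15, emphasis=1.0):
    """Estimate F0 (Hz) from one frame using a fourth-root spectrum.

    Assumed details (not taken from the source): the lifter is a
    moving-average envelope subtraction, clipping keeps only positive
    residuals, and `emphasis` > 1 mimics the power amplification step.
    """
    n = len(frame)
    nfft = 4 * n                                  # zero-padding for a finer grid
    windowed = frame * np.hanning(n)              # Hanning window, as in the text
    power_spec = np.abs(np.fft.rfft(windowed, nfft)) ** 2
    root_spec = power_spec ** 0.25                # fourth root instead of log

    # Lifter: remove the slowly varying spectral envelope.
    kernel = np.ones(smooth_bins) / smooth_bins
    envelope = np.convolve(root_spec, kernel, mode="same")

    # Clipping: keep only the harmonic peaks above the envelope.
    harmonics = np.maximum(root_spec - envelope, 0.0) ** emphasis

    # Transform to the time (lag) domain and pick the strongest lag.
    lag_seq = np.fft.irfft(harmonics, nfft)
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(lag_seq[lo:hi + 1]))
    return fs / lag
```

Calling `fourth_root_pitch(synth_voiced_frame(200.0, 8000, 512), 8000)` on the clean synthetic frame should return a value close to 200 Hz; behaviour on real noisy speech depends on the assumed parameters and is not claimed here.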
To improve the extraction accuracy further, the second approach considers recent trends in techniques for pitch extraction of speech in noisy environments. Windowing effects are discussed analytically, and it is argued that the rectangular window should be actively used instead of the popular Hanning or Hamming window. A performance comparison of conventional pitch extraction methods is conducted in a variety of noise environments, and on the basis of the results we adopt the autocorrelation function (ACF) method. Incorporating accumulation techniques, three types of pitch extraction approaches are developed. Experiments show that the proposed approaches have the potential to provide better pitch extraction performance without relying on complicated post-processing.
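As a rough illustration of the baseline that this second approach builds on, the following sketch computes a plain ACF pitch estimate on a rectangular-windowed frame (that is, with no taper applied). The accumulation techniques mentioned above are not shown, since their details are not given here; the function names, search range, and test signal are assumptions.

```python
import numpy as np

def make_test_frame(f0, fs, n):
    """Hypothetical clean test signal: harmonics of f0 with 1/k amplitudes."""
    t = np.arange(n) / fs
    n_harm = int((fs / 2 - 1) // f0)
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))

def acf_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate F0 (Hz) from the autocorrelation of one frame.

    The frame is used as-is (rectangular window). The ACF is computed
    via the FFT with enough zero-padding to avoid circular wrap-around.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                              # remove DC offset
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()          # power of two >= 2n - 1
    spec = np.fft.rfft(x, nfft)
    acf = np.fft.irfft(np.abs(spec) ** 2, nfft)[:n]

    # Search only lags corresponding to plausible pitch values.
    lo, hi = int(fs / fmax), min(int(fs / fmin), n - 1)
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return fs / lag
```

For a clean synthetic frame such as `make_test_frame(200.0, 8000, 512)`, the estimate should be close to 200 Hz. Replacing the untapered frame with `frame * np.hanning(len(frame))` is one way to compare window choices under added noise.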