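Rikelab Paper Search (リケラボ論文検索) is a paper search service that lets you search, in a single query, the degree theses and faculty papers held in university repositories across Japan.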


Pitch Extraction for Speech Signals in Noisy Environments

RAHMAN MD. SAIFUR, Saitama University, DOI:info:doi/10.24561/00019346

2020

Abstract

The pitch period is defined as the inverse of the fundamental frequency of the excitation source of a voiced speech signal. The pitch period (in short, pitch), or equivalently the fundamental frequency, is a prominent parameter of speech and is widely used in speech-related systems such as speech coding, speech recognition, speech enhancement, and speech synthesis. Pitch and fundamental frequency are used here with the same meaning, although pitch is, strictly speaking, the perceived counterpart of the fundamental frequency. Pitch is generated by the vibration of the vocal cords, which causes periodicity in the speech signal.
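As a worked illustration of this definition, the relation between the fundamental frequency $F_0$, the pitch period $T_0$, and the pitch period in samples $N_0$ can be written as below. The sampling rate $f_s$ and the numeric values are assumptions chosen only for the example; they do not come from the dissertation.

```latex
T_0 = \frac{1}{F_0}, \qquad N_0 = f_s T_0 = \frac{f_s}{F_0}
\\[4pt]
F_0 = 125\,\mathrm{Hz},\; f_s = 8\,\mathrm{kHz}
\;\Rightarrow\; T_0 = 8\,\mathrm{ms},\; N_0 = 64 \text{ samples}
```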
Pitch extraction has proven to be a difficult task even for speech in a noise-free environment. The clean speech waveform is not truly periodic; it is quasi-periodic and non-stationary. A large number of pitch extraction methods have been reported for the noise-free environment, whereas far fewer studies attempt to extract pitch in noisy environments. Under noisy conditions, the periodic structure of the speech signal is destroyed, so pitch extraction becomes an extremely complicated task. The reliability and accuracy of pitch extraction methods therefore face real challenges in noisy environments.
From the above observations, the objective of this dissertation is to develop approaches that handle noise-corrupted speech signals effectively in real applications without any complicated post-processing. Some conventional state-of-the-art approaches rely on complicated post-processing for pitch extraction. In this dissertation, we propose and implement simple and efficient approaches that address the factors degrading the performance of pitch extraction methods.
First, we propose using the fourth-root spectrum instead of the log spectrum to increase pitch extraction accuracy in noisy environments. To obtain clear harmonics, liftering and clipping operations are then applied. When the resulting spectrum is transformed into the time domain by means of the discrete Fourier transform, the pitch extraction is robust against narrow-band noise. When the resulting spectrum is instead amplified by a power calculation and then transformed into the time domain, the pitch extraction is robust against wide-band noise. These properties are investigated through exhaustive experiments with a variety of noise types, and the required computation time is also studied. The experimental results demonstrate the effectiveness of the new approaches in improving pitch extraction performance. However, the performance of this method sometimes deteriorates because of the windowing effect: it uses the Hanning window function, which is not well suited to extracting pitch in noisy environments.
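The processing chain described above (window, spectrum, fourth root, lifter and clipping, transform back to the time domain, peak picking) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract does not specify the lifter or the clipping rule, so mean removal and half-wave rectification are used as simple stand-ins, and the fourth root is assumed to be taken of the power spectrum. The function names `dft` and `fourth_root_pitch`, the search range, and the synthetic test frame are all hypothetical. The power-amplified variant for wide-band noise is not shown.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) DFT; adequate for a single short frame."""
    n = len(values)
    sign = 1 if inverse else -1
    # Precompute the N twiddle factors once and index them modulo N.
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(v * twiddle[(k * m) % n] for m, v in enumerate(values))
           for k in range(n)]
    return [c / n for c in out] if inverse else out


def fourth_root_pitch(frame, fs, f0_min=50.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from a fourth-root spectrum."""
    n = len(frame)
    # Hanning window, as named in the abstract for this first method.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    # ASSUMPTION: the fourth root is taken of the power spectrum.
    root = [(abs(c) ** 2) ** 0.25 for c in dft(windowed)]
    # ASSUMPTION: mean removal stands in for the lifter, and half-wave
    # rectification stands in for the clipping operation.
    mean = sum(root) / n
    clipped = [max(r - mean, 0.0) for r in root]
    # Transform back to the lag (time) domain and keep the real part.
    lag_domain = [c.real for c in dft(clipped, inverse=True)]
    # Search only lags that correspond to the allowed F0 range.
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    best_lag = max(range(lo, hi + 1), key=lambda lag: lag_domain[lag])
    return fs / best_lag


fs = 8000  # assumed sampling rate in Hz
# Hypothetical clean test frame: five harmonics of 125 Hz, 512 samples.
frame = [sum(math.cos(2 * math.pi * h * 125.0 * i / fs) / h
             for h in range(1, 6)) for i in range(512)]
estimate = fourth_root_pitch(frame, fs)
print(f"estimated F0: {estimate:.1f} Hz")
```

The fourth root compresses the dynamic range of the spectrum less aggressively than the logarithm does, which is the motivation the abstract gives for preferring it when noise fills the spectral valleys.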
To improve the extraction accuracy, the second approach considers the recent trend in techniques for pitch extraction of speech in noisy environments. Windowing effects are discussed analytically, and it is argued that the rectangular window should be proactively used instead of the popular Hanning or Hamming window. A performance comparison of conventional pitch extraction methods is conducted in a variety of noise environments, and as a result we take the standpoint of supporting the autocorrelation function (ACF) method. Incorporating accumulation techniques, three types of pitch extraction approaches are developed. Experiments show that the proposed approaches have the potential to provide better pitch extraction performance without relying on complicated post-processing.
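For orientation, a basic ACF pitch estimator with a selectable window could look like the minimal sketch below. It shows only the conventional autocorrelation peak search; the accumulation techniques and the three proposed approaches are not described in the abstract and are not reproduced here. The function name `acf_pitch`, its parameters, the search range, and the synthetic frame are hypothetical choices for illustration.

```python
import math


def acf_pitch(frame, fs, f0_min=50.0, f0_max=400.0, window=None):
    """Estimate F0 (Hz) from the autocorrelation peak of one frame.

    window=None leaves the samples untouched, i.e. a rectangular window.
    """
    if window is not None:
        frame = [x * w for x, w in zip(frame, window)]
    n = len(frame)

    def acf(lag):
        # Biased autocorrelation: sum of sample products at the given lag.
        return sum(frame[i] * frame[i + lag] for i in range(n - lag))

    # Search only lags that correspond to the allowed F0 range.
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    best_lag = max(range(lo, hi + 1), key=acf)
    return fs / best_lag


fs = 8000  # assumed sampling rate in Hz
# Hypothetical clean test frame: five sine harmonics of 125 Hz, 512 samples.
frame = [sum(math.sin(2 * math.pi * h * 125.0 * i / fs) / h
             for h in range(1, 6)) for i in range(512)]

# Rectangular window (the choice argued for in the abstract).
rect_estimate = acf_pitch(frame, fs)

# The same estimator with a Hanning window, for side-by-side comparison.
hanning = [0.5 - 0.5 * math.cos(2 * math.pi * i / len(frame))
           for i in range(len(frame))]
hann_estimate = acf_pitch(frame, fs, window=hanning)
print(rect_estimate, hann_estimate)
```

Passing `window=None` keeps every sample at full weight, whereas a tapered window attenuates the samples near the frame edges before the lag products are summed.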

