Publisher's Synopsis
Understanding how humans produce speech is considered one of the most challenging tasks because of the sophisticated mechanisms involved in speech production. Understanding how a particular speaker produces speech, and mimicking his or her voice, is an even more difficult research problem. Few attempts have been made in the past to understand how mimicry is performed from a speech production viewpoint. Voice Conversion (VC) is a technique that modifies the perceived speaker identity in a given speech utterance from a source speaker to a particular target speaker without changing the linguistic content of the utterance. In essence, it can be considered a speaker conversion technique. In particular, the goal of VC is to mimic a given target speaker, much as a professional human mimic would. Producing speech like someone else, whether by mimicking, by synthesizing, or by voice conversion, is a challenging task, as it requires an understanding of speaker-specific characteristics in speech. These speaker-specific characteristics vary with language, context, and environment. Moreover, the acceptability of the resulting speech depends on the perception of the listener, which in turn depends on his or her background. VC techniques fall under the broad area of Voice Transformation (VT). VT can be considered any non-linguistic modification that one may apply to the speech signal, for example time-scaling, pitch-scaling, voice-individuality control, or speaker identity conversion. An excellent survey article on VT is available in the literature. Unlike speaker recognition or verification, the objective in VC is not just to identify or detect the speaker-dependent signatures in the speech signal, but rather to modify these signatures from one speaker to another and to generate high-quality converted speech at the end.
Hence, VC can be considered one of the most difficult research problems among all VT techniques.