Automatic phonetic segmentation of Malay speech database

Chee Ming Ting, Sh Hussain Salleh, Tian Swee Tan, A. K. Ariff

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

4 Citations (Scopus)


This paper addresses automatic phonetic segmentation of Malay continuous speech. The study investigates fast, automatic phone segmentation for preparing databases for Malay concatenative Text-to-Speech (TTS) systems. A set of 35 Malay phones, suitable for building Malay TTS, was chosen, and the segmentation experiment is based on this phone set. An HMM-based segmentation approach using Viterbi forced alignment is adapted. We use continuous density HMMs (CDHMM) with Gaussian mixtures, which perform well in speech recognition, to prevent large segmentation errors. In addition, the paper presents an implicit boundary refinement method that is incorporated into the Viterbi phonetic alignment. In this approach, the HMMs are trained on phone tokens whose boundaries are extended into the neighboring phones. This improves the ability of the HMMs to model phone boundaries and provides an implicit boundary refinement effect during phonetic alignment, thereby reducing segmentation errors. The approach improves the performance of baseline HMM segmentation from 42.39%, 74.83%, and 84.34% of automatic boundary marks within errors smaller than 5, 15, and 25 ms, respectively, to 47.75%, 76.38%, and 85.55%.
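The two ideas the abstract relies on, Viterbi forced alignment and boundary accuracy within a tolerance, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one state per phone, a strict left-to-right topology with no skips, no transition probabilities, and a hypothetical 10 ms frame shift. The function names `forced_align` and `within_tolerance` and the toy scores are illustrative only. In the paper the per-frame scores would come from CDHMMs with Gaussian mixtures.

```python
import math

def forced_align(loglik):
    """Viterbi forced alignment for a known phone sequence.

    loglik[t][s] is the log-likelihood of frame t under phone s, with
    phones listed in transcript order. Returns the start frame of each
    phone. Assumes one state per phone and no skips (a simplification).
    """
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("need at least one frame per phone")
    score = [[-math.inf] * S for _ in range(T)]
    entered = [[False] * S for _ in range(T)]  # True if phone s begins at frame t
    score[0][0] = loglik[0][0]                 # path must start in the first phone
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else -math.inf
            if move > stay:
                score[t][s] = move + loglik[t][s]
                entered[t][s] = True
            else:
                score[t][s] = stay + loglik[t][s]
    # Backtrace from the last phone at the last frame.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if entered[t][s]:
            starts[s] = t
            s -= 1
    return starts

def within_tolerance(auto_ms, ref_ms, tol_ms):
    """Percentage of automatic boundaries whose error is smaller than tol_ms."""
    hits = sum(abs(a - r) < tol_ms for a, r in zip(auto_ms, ref_ms))
    return 100.0 * hits / len(ref_ms)

# Toy scores: frames 0-2 favour phone 0, frames 3-5 favour phone 1.
toy_loglik = [[-1, -5], [-1, -5], [-1, -5], [-5, -1], [-5, -1], [-5, -1]]
FRAME_SHIFT_MS = 10  # assumed frame shift, not stated in the abstract
toy_starts = forced_align(toy_loglik)                      # [0, 3]
toy_boundaries_ms = [f * FRAME_SHIFT_MS for f in toy_starts[1:]]  # [30]
```

Calling `within_tolerance` with tolerances of 5, 15, and 25 ms against manually labeled boundaries would yield figures of the same kind as those reported in the abstract.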

Original language: English
Title of host publication: 2007 6th International Conference on Information, Communications and Signal Processing, ICICS
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISBN (Print): 1424409837, 9781424409839
Publication status: Published - 2007
Externally published: Yes
Event: International Conference on Information, Communications and Signal Processing 2007 - Singapore, Singapore, Singapore
Duration: 1 Jan 2007 to 13 Dec 2007
Conference number: 6th


Conference: International Conference on Information, Communications and Signal Processing 2007
Abbreviated title: ICICS 2007


  • Speech recognition
  • Speech synthesis
