Real time bengali speech to text conversion using CMU sphinx
Loading...
Date
Publisher
BRAC University
Citation
Abstract
This paper aims to demonstrate the use of Speech-to-Text technology to convert Bangla spoken in a natural and continuous state into Bengali UNICODE font with good accuracy. This achievement required the usage of the open sourced framework Sphinx 4 created by Carnegie Melon University (CMU) which was written in Java and provides the required procedural coding tools to develop an acoustic model for a custom language like Bangla. It takes help of algorithms like Baum-Welch to create an Acoustic Model from training data which we gathered ourselves. Our main objective was to ensure that the system was adequately trained on a word by word basis from various speakers so that it could recognize new speakers fluently. We used a free digital audio workstation (DAW) called Audacity to manipulate the collected recording data via techniques like continuous frequency profiling to reduce the Signal-to-Noise-Ratio (SNR), vocal levelling, normalization and syllable splitting as well as merging to ensure an error free 1:1-word mapping of each utterance with its mirror transcription file text. The result is a speech to text recognition system with an acceptable accuracy of around 75% that was trained using recorded speech data from 10 individual speakers consisting of both males and females using custom transcript files that we wrote.
Description
Cataloged from PDF version of thesis report.
Includes bibliographical references (pages 35-37).
This thesis report is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.
Includes bibliographical references (pages 35-37).
This thesis report is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.
Publisher Link
Type
Thesis