Automatic subtitle generation for Bengali multimedia using deep learning

Rhythm, Ehsanur Rahman; Arnob, Shafakat Sowroar; Shuvo, Rajvir Ahmed

Files

22241163_20101129_20141003_CSE.pdf (3.42 MB)

Date

2023-09

Publisher

Brac University

Authors

Rhythm, Ehsanur Rahman

Arnob, Shafakat Sowroar

Shuvo, Rajvir Ahmed

URI

http://hdl.handle.net/10361/23554

Abstract

Automatic subtitle generation is essential for making audio and video material more inclusive and accessible. Applying this technology to Bengali, however, presents significant challenges because of scarce resources and the linguistic complexity of the language. This study introduces a new deep-learning-based system that automatically creates subtitles for Bengali multimedia. The proposed approach uses the Wav2vec2 model and the Common Voice Bengali Dataset, a large collection of Bengali audio recordings: Wav2vec2 is trained and tuned on this dataset to convert Bengali audio into text accurately. The system combines current automatic speech recognition (ASR) approaches with Bengali language-specific factors to produce accurate and reliable transcriptions. During subtitle production, the transcribed text is synchronized with the matching audio segments. The resulting subtitles are then refined with post-processing techniques, such as capitalization and punctuation restoration, to ensure readability and consistency. The findings of this study could greatly improve the usability and availability of Bengali-language media across a range of sectors. The generated subtitles may enhance the viewing experience for Bengali multimedia by aiding comprehension and widening access. The study demonstrates the potential of deep learning and ASR methods to overcome the difficulties of automated subtitle production in Bengali, advancing multimedia accessibility and inclusion.
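The synchronization step described in the abstract, pairing transcribed text with the audio segment it came from, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not state which subtitle format the authors emit, so the widely used SubRip (.srt) layout is assumed here, and the `Segment` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names introduced only for illustration. The sketch also assumes the ASR stage has already produced text with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk of audio (hypothetical structure, not from the thesis)."""
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcription produced by the ASR model


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render timed segments as SRT cues, numbered from 1 and separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Illustrative segments only; real timings would come from the ASR alignment.
    demo = [
        Segment(0.0, 2.5, "আমি ভাত খাই"),
        Segment(2.5, 5.0, "সে স্কুলে যায়"),
    ]
    print(to_srt(demo))
```

Any post-processing such as punctuation restoration would be applied to each segment's text before rendering, so the timing logic stays independent of the text clean-up.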

Keywords

Automatic subtitle generation, Bengali audio, Deep learning, Natural language processing

LC Subject Headings

Natural language processing (Computer science), Computational linguistics, Data mining

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 51-53).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023.

Department

Department of Computer Science and Engineering

Type

Thesis

Collections

Thesis (Bachelor of Science in Computer Science)
