Bengali character recognition using feature extraction

Citation

Abstract

The character recognition problem can be framed as a classification task in which an image, or a portion of one, is assigned a label from a set of possible labels representing the characters under consideration. This is the fundamental aspect of the feature extraction technique. This generic formulation may lead to quite different settings. If the images of the characters are obtained optically, we speak of “Optical Character Recognition” (OCR), as opposed to settings in which the input data is obtained by other means. OCR itself can be considered a subtask of the more general problem of “Document Analysis or Understanding”, where the goal is to obtain a symbolic representation of a digital image of the document that includes not only the recognized text (characters) but also other document components and their relationships. In this thesis I discuss various feature extraction techniques and then examine how zoning can be used to build an efficient Bengali character recognition system. Different feature extraction techniques are used to recognize different representations of characters, for example binary characters, character contours, skeletons (thinned characters), or gray-level sub-images of each individual character. The feature extraction methods are distinguished in terms of invariance properties, reconstructability, and the expected distortions and variability of the characters. When choosing a feature extraction method, we need to consider how efficiently the resulting system can be applied and how much time is needed to build it.
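As a rough illustration of the zoning idea mentioned above, the sketch below splits a binary character image into a grid of zones and uses the fraction of foreground pixels in each zone as a feature. It is a minimal sketch under stated assumptions, not the method used in the thesis: the function name `zoning_features`, the grid size, the pixel-density feature, and the toy glyph are all hypothetical choices made for this example.

```python
def zoning_features(image, rows=4, cols=4):
    """Return per-zone foreground density for a binary image.

    `image` is a list of equal-length rows of 0/1 pixels (1 = ink).
    The grid size and the density feature are illustrative assumptions.
    """
    height = len(image)
    width = len(image[0])
    features = []
    for r in range(rows):
        # Integer zone boundaries, so zones tile the image without gaps.
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            ink = sum(
                image[y][x]
                for y in range(top, bottom)
                for x in range(left, right)
            )
            # Empty zones (image smaller than the grid) get density 0.
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4x4 toy glyph, split into a 2x2 grid of zones.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zoning_features(glyph, rows=2, cols=2)
print(features)
```

The resulting feature vector lists zones row by row and could be passed to any classifier; a real system would first binarize and size-normalize each character image.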

Description

Cataloged from PDF version of thesis report.
Includes bibliographical references (page 25).
This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2007.

Publisher Link

Type

Thesis