Segmentation Of Two Touching Handwritten Arabic Characters Using Overlapping Set Theory And Gradient Orientation

Ullah, Inam (2020) Segmentation Of Two Touching Handwritten Arabic Characters Using Overlapping Set Theory And Gradient Orientation. Doctoral thesis, Universiti Teknikal Malaysia Melaka.

Abstract

Segmentation of offline handwritten Arabic document images is an active research area, but segmenting an image into regions the way human vision does remains difficult, especially for degraded historical handwritten documents. These valuable degraded documents attract researchers from around the world, who nevertheless face problems in segmenting Arabic text because of overlapping and touching characters. Overlapping and touching occur when the writer does not follow the standard rules of writing, so that two or more characters share the same space and the touching characters are treated as one sub-word. Many current techniques for segmenting touching handwritten characters are based on connected components. These methods are easy to implement and achieve high accuracy in some cases, but they fail in many others because a manually chosen decision value is required to determine the correct segmentation path near the junction point, which produces unstable character boundaries. They are also unstable when both touching characters contain loops or circular paths: the cut-point is then placed incorrectly, which can lead to an incorrect dividing path along the character boundary. Selecting the path near the junction point is therefore one of the main challenges in segmenting connected components. In addition, because of variation in writing, these methods are usually implemented for only one layout and font type. Apart from connected-component methods, template-based segmentation is another available approach, and several studies have built templates for touching characters. Its disadvantage is that many templates must be created to cover all possible touching types. Consequently, because of variation in writing, connected-component methods remain underexplored, especially for cursive handwriting such as Arabic and Jawi.
This work has three objectives: first, to identify the junction point of a touching image; second, to formulate the direction near the junction point; and third, to segment the touching characters. The research methodology accordingly consists of three proposed stages: junction point detection, direction formulation and segmentation. In the junction point identification stage, overlapping set theory is used to identify the segmentation point of the two touching characters. In the direction formulation stage, a gradient technique is used to formulate the correct direction near the junction point. In the segmentation stage, a contour tracing technique is used to separate the two touching characters into isolated characters. The three proposed methods were tested on the IFN/ENIT, AHDB and IAM datasets. In the experiments, junction point detection achieved a success rate of 93.3%, direction formulation 98% and segmentation 97.27%. The proposed segmentation method outperforms existing research in terms of accuracy, and the proposed methods do not use any recognizer or template to control segmentation accuracy. Finally, the proposed segmentation method was also compared with state-of-the-art methods and achieved a better accuracy rate for both degraded and non-degraded document images; the accuracy of the overall process is about 97.45% for the AHDB dataset and 85.03% for the IAM dataset.
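
The abstract does not say which gradient operator the thesis uses, so the following is only an illustrative sketch of how a local gradient orientation near a junction point might be estimated. It assumes central differences (NumPy's `np.gradient`) and a double-angle average over a small window. The function name `gradient_orientation`, the `radius` parameter and the window-based averaging are hypothetical choices for illustration, not the author's method.

```python
import numpy as np

def gradient_orientation(image, point, radius=2):
    """Estimate the dominant gradient orientation around `point`.

    Hypothetical helper (not from the thesis). Returns an angle in
    degrees in [0, 180), or None if the window has no gradient.
    0 means the intensity changes along x (e.g. a vertical stroke),
    90 means it changes along y (e.g. a horizontal stroke).
    """
    img = np.asarray(image, dtype=float)
    # np.gradient returns derivatives per axis: rows (y) first, then columns (x).
    gy, gx = np.gradient(img)

    # Clip the window to the image bounds.
    r, c = point
    r0, r1 = max(r - radius, 0), min(r + radius + 1, img.shape[0])
    c0, c1 = max(c - radius, 0), min(c + radius + 1, img.shape[1])
    wx, wy = gx[r0:r1, c0:c1], gy[r0:r1, c0:c1]

    # Double-angle averaging so that opposite gradient directions
    # (the two sides of a stroke) reinforce instead of cancelling.
    sxx = np.sum(wx * wx)
    syy = np.sum(wy * wy)
    sxy = np.sum(wx * wy)
    if sxx + syy == 0:
        return None  # flat window: orientation is undefined

    theta = 0.5 * np.arctan2(2.0 * sxy, sxx - syy)
    return float(np.degrees(theta)) % 180.0
```

In such a sketch, the orientation would be evaluated at a candidate junction point of a binarized sub-word image, and the result could then guide which branch a contour tracer follows. How the thesis actually combines these stages is not described in the abstract.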

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: Character sets (Data processing), Arabic character sets (Data processing), Touching Handwritten, Arabic Characters
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics
Divisions: Library > Tesis > FTMK
Depositing User: F Haslinda Harun
Date Deposited: 17 Nov 2021 08:46
Last Modified: 17 Nov 2021 08:46
URI: http://eprints.utem.edu.my/id/eprint/25386