A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition

Ngo, Hea Choon and Hashim, Ummi Rabaah and Raja Ikram, Raja Rina and Salahuddin, Lizawati and Teoh, Mok Lee (2020) A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition. International Journal of Advanced Trends in Computer Science and Engineering, 9 (4). pp. 4589-4596. ISSN 2278-3091

Full text: 2020, NGO, AUDIO-VISUAL SPEECH - IJATCSE_01.PDF (421 kB)

Abstract

Studies show that only about 30 to 45 percent of English speech can be understood by lipreading alone. Even the most talented lip readers cannot recover a complete message from lipreading by itself, although they are often very good at interpreting facial expressions, body language, and context to fill in the gaps. This effort taxes the brain in several ways and becomes exhausting over time. When a deaf person who reads lips holds a simple one-on-one conversation, hearing people may not appreciate the challenges involved: the hearing person may be annoyed at being asked to repeat themselves or to speak more slowly and clearly, and may lose patience and break off the conversation. In our modern world, where technology connects us in ways never thought possible, there are many ways to communicate with another person, and deaf people come from all walks of life and backgrounds. In this study, a lipreading model is developed that records, analyzes, and translates the movement of the lips and displays it as subtitles. A model is trained on the GRID Corpus, the MIRACL-VC1 dataset, and pre-trained data, using the LipNet architecture, to build a system with which deaf people can decode text from the movement of a speaker's mouth. This system helps deaf people understand what others are actually saying and communicate with them more effectively.
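As an illustration of the data preprocessing the title refers to, below is a minimal sketch, not taken from the paper, of one common preprocessing step for LipNet-style lipreading: cropping a fixed-size mouth region from each video frame before it is fed to the model. The Haar-cascade face detection, the lower-half-of-face crop, and the function name extract_mouth_frames are assumptions made here for illustration; a full pipeline would typically use facial landmarks to locate the mouth precisely.

# Hypothetical sketch of mouth-region extraction for a LipNet-style
# pipeline. Not the authors' code: the Haar cascade and the rough
# lower-half-of-face crop are simplifying assumptions.
import cv2
import numpy as np

def extract_mouth_frames(video_path, size=(100, 50)):
    """Return a (T, H, W) array of grayscale mouth crops from a video.

    `size` is (width, height); 100x50 matches the mouth-crop size used
    in the original LipNet work.
    """
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    crops = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            continue  # skip frames where no face is detected
        # Take the largest detected face and crop its lower half as a
        # rough stand-in for a landmark-based mouth region.
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
        mouth = gray[y + h // 2 : y + h, x : x + w]
        crops.append(cv2.resize(mouth, size))
    cap.release()
    return np.stack(crops) if crops else np.empty((0, size[1], size[0]))

The stacked (T, H, W) frame sequence is the kind of input a spatiotemporal lipreading network such as LipNet consumes, typically after normalization and padding to a fixed clip length.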

Item Type: Article
Uncontrolled Keywords: Conversation, Facial features, Lipreading model, Audio-Visual Speech Recognition
Divisions: Faculty of Information and Communication Technology
Depositing User: Sabariah Ismail
Date Deposited: 20 Apr 2021 12:28
Last Modified: 20 Apr 2021 12:28
URI: http://eprints.utem.edu.my/id/eprint/25010
