Development and evaluation of hybrid machine translation systems for English to Indian language under low resource conditions

Mrinalini Kannan

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/476964

Title:	Development and evaluation of hybrid machine translation systems for English to Indian language under low resource conditions
Researcher:	Mrinalini Kannan
Guide(s):	Vijayalakshmi P
Keywords:	Bilingual Language Machine Translation Parts-of-Speech
University:	Anna University
Completed Date:	2022
Abstract:	India is a multi-cultural and multilingual country with 22 official languages belonging to different linguistic families. Since the time of British colonial rule in India, English has been used as the linguistic medium (L1 language) for administrative and higher education purposes. Post-independence, the use of regional languages (as L1 language) along with English (as L2 language) has been encouraged in the states across the country. However, the usage of either L1 or L2 language varies among the common people. Thus, it is essential to develop machine translation (MT) systems from English to Indian languages for smoother transactions and communication across the country. Among the seven linguistic families of South Asia, Indo-Aryan and Dravidian languages account for over 90% of Indian speakers. On this note, the current research work proposes to develop efficient statistical (SMT) and neural (NMT) machine translation systems for translation from English to two Indian languages, namely Tamil (a Dravidian language) and Hindi (an Indo-Aryan language). Most of the well-established data-driven approaches for developing MT systems require a huge amount of parallel text in the source and target languages to train an efficient translation model. Such large parallel corpora between English and Indian languages are scarce. Further, domain-specific parallel corpora required to develop highly efficient and application-oriented MT systems are also not available. The proposed work makes use of punctuation marks and re-ordering to augment the parallel data available for training the SMT and NMT systems.
Pagination:	xix,183p.
URI:	http://hdl.handle.net/10603/476964
Appears in Departments:	Faculty of Information and Communication Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	22.31 kB	Adobe PDF
02_prelim pages.pdf		2.32 MB	Adobe PDF
03_contents.pdf		126.87 kB	Adobe PDF
04_abstracts.pdf		84.37 kB	Adobe PDF
05_chapter1.pdf		385.33 kB	Adobe PDF
06_chapter2.pdf		429.65 kB	Adobe PDF
07_chapter3.pdf		2.27 MB	Adobe PDF
08_chapter4.pdf		1.45 MB	Adobe PDF
09_chapter5.pdf		563.12 kB	Adobe PDF
10_annexures.pdf		111.21 kB	Adobe PDF
80_recommendation.pdf		95.19 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET