Morphology based prototype statistical machine translation system for English to Tamil language

Anand Kumar M

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/9233

Title:	Morphology based prototype statistical machine translation system for English to Tamil language
Researcher:	Anand Kumar M
Guide(s):	Soman, K P
Keywords:	Computer Science, Networking, Tamil language, Language translation, Speech Tagger
Upload Date:	28-May-2013
University:	Amrita Vishwa Vidyapeetham (University)
Completed Date:	2013
Abstract:	Machine translation is the automatic translation of text from one natural language into another by computer. This thesis proposes a morphology-based Factored Statistical Machine Translation (F-SMT) system for translating sentences from English to Tamil. Tamil linguistic tools, namely a Part-of-Speech Tagger, a Morphological Analyzer and a Morphological Generator, are also developed as part of this research work. Conventionally, rule-based approaches are employed for developing machine translation systems. They use transfer rules between the source language and the target language to produce grammatical translations. The major drawback of this approach is that improving the rules always requires the help of a good linguist. As a result, data-driven approaches such as example-based and statistical systems have recently been getting more attention from the research community. Currently, Statistical Machine Translation (SMT) systems play a major role in developing translation between languages. The main advantage of an SMT system is that it is language independent and disambiguates word senses automatically using large quantities of parallel corpora. An SMT system treats translation as a machine learning problem: statistical learning methods perform translation based on large amounts of parallel training data. First, non-structural information and statistical parameters are derived from the bilingual corpora. These statistical parameters are then used for translation. A baseline SMT system considers only surface forms and does not use linguistic knowledge of the languages. Its performance is therefore better for similar language pairs than for dissimilar ones. Translating English into morphologically rich languages is a challenging task. Because of the highly rich morphological nature of the Tamil language, a simple lexical mapping alone does not help for retrieving and mapping all the morphological.
Pagination:	xxiv, 310p.
URI:	http://hdl.handle.net/10603/9233
Appears in Departments:	Amrita School of Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	37.82 kB	Adobe PDF
02_certificate.pdf		38.31 kB	Adobe PDF
03_declaration.pdf		25.67 kB	Adobe PDF
04_table of contents.pdf		69.74 kB	Adobe PDF
05_acknowledgements.pdf		59.79 kB	Adobe PDF
06_list of figures.pdf		57.14 kB	Adobe PDF
07_list of tables.pdf		54.43 kB	Adobe PDF
08_list of abbreviations.pdf		56.82 kB	Adobe PDF
09_abstract.pdf		68.37 kB	Adobe PDF
10_chapter 1.pdf		401.52 kB	Adobe PDF
11_chapter 2.pdf		212.42 kB	Adobe PDF
12_chapter 3.pdf		2.05 MB	Adobe PDF
13_chapter 4.pdf		365.2 kB	Adobe PDF
14_chapter 5.pdf		464.79 kB	Adobe PDF
15_chapter 6.pdf		587.49 kB	Adobe PDF
16_chapter 7.pdf		902.64 kB	Adobe PDF
17_chapter 8.pdf		531.49 kB	Adobe PDF
18_chapter 9.pdf		447.04 kB	Adobe PDF
19_chapter 10.pdf		78.75 kB	Adobe PDF
20_appendix.pdf		696.76 kB	Adobe PDF
21_references.pdf		179.07 kB	Adobe PDF
22_synopsis.pdf		276.23 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET