Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/9233
Title: | Morphology based prototype statistical machine translation system for English to Tamil language |
Researcher: | Anand Kumar M |
Guide(s): | Soman, K P |
Keywords: | Computer Science Networking Tamil language Language translation Speech Tagger |
Upload Date: | 28-May-2013 |
University: | Amrita Vishwa Vidyapeetham (University) |
Completed Date: | 2013 |
Abstract: | Machine translation is about automatic translation of one natural language text to another using computer. In this thesis, morphology based Factored Statistical Machine Translation system (F-SMT) is proposed for translating sentence from English to Tamil. Tamil linguistic tools such as Part-of-Speech Tagger, Morphological Analyzer and Morphological Generator are also developed as a part of this research work. Conventionally, rule-based approaches are employed for developing Machine Translation. It uses transfer-rules between the source language and the target language for producing grammatical translations. The major drawback of this approach is that it always requires the help of a good linguist for the rule improvement. So, recently datadriven approaches such as example-based and statistical based systems are getting more attention from research community. Currently, Statistical Machine Translation (SMT) systems are playing a major role in developing translation between languages. The main advantage of using Statistical Machine Translation system is that it is language independent and it disambiguates the sense automatically with the use of large quantities of parallel corpora. SMT system considers the translation problem as a machine learning problem. Statistical learning methods perform translation based on large amounts of parallel training data. At first, non-structural information and statistical parameters are derived from the bi-lingual corpora. These statistical parameters are then used for translation. Baseline Statistical Machine Translation system considers only surface forms and does not use linguistic knowledge of the languages. Therefore its performance is better for similar language pair when compared to the dissimilar language pair. Translating English into morphologically rich languages is a challenging task. Because of the highly rich morphological nature of Tamil language, a simple lexical mapping alone does not help for retrieving and mapping all the morphological. |
Pagination: | xxiv, 310p. |
URI: | http://hdl.handle.net/10603/9233 |
Appears in Departments: | Amrita School of Engineering |
Files in This Item:
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Altmetric Badge: