Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/9233
Title: Morphology based prototype statistical machine translation system for English to Tamil language
Researcher: Anand Kumar M
Guide(s): Soman, K P
Keywords: Computer Science; Networking; Tamil language; Language translation; Speech Tagger
Upload Date: 28-May-2013
University: Amrita Vishwa Vidyapeetham (University)
Completed Date: 2013
Abstract: Machine translation is the automatic translation of text from one natural language into another by computer. In this thesis, a morphology-based Factored Statistical Machine Translation system (F-SMT) is proposed for translating sentences from English to Tamil. Tamil linguistic tools, namely a Part-of-Speech Tagger, a Morphological Analyzer and a Morphological Generator, are also developed as part of this research work. Conventionally, Machine Translation systems have been built with rule-based approaches, which use transfer rules between the source language and the target language to produce grammatical translations. The major drawback of this approach is that improving the rules always requires the help of a skilled linguist. Data-driven approaches such as example-based and statistical systems have therefore recently been attracting more attention from the research community. Currently, Statistical Machine Translation (SMT) systems play a major role in translation between languages. The main advantages of SMT are that it is language-independent and that it disambiguates word senses automatically by learning from large quantities of parallel corpora. An SMT system treats translation as a machine learning problem: statistical learning methods perform translation based on large amounts of parallel training data. First, non-structural information and statistical parameters are derived from the bilingual corpora; these parameters are then used for translation. A baseline SMT system considers only surface word forms and uses no linguistic knowledge of the languages. It therefore performs better for similar language pairs than for dissimilar ones. Translating English into morphologically rich languages is a challenging task.
Because Tamil is morphologically very rich, simple lexical mapping alone is not sufficient for retrieving and mapping all the morphological…
Pagination: xxiv, 310p.
URI: http://hdl.handle.net/10603/9233
Appears in Departments: Amrita School of Engineering

Files in This Item:
01_title.pdf (37.82 kB, Adobe PDF)
02_certificate.pdf (38.31 kB, Adobe PDF)
03_declaration.pdf (25.67 kB, Adobe PDF)
04_table of contents.pdf (69.74 kB, Adobe PDF)
05_acknowledgements.pdf (59.79 kB, Adobe PDF)
06_list of figures.pdf (57.14 kB, Adobe PDF)
07_list of tables.pdf (54.43 kB, Adobe PDF)
08_list of abbreviations.pdf (56.82 kB, Adobe PDF)
09_abstract.pdf (68.37 kB, Adobe PDF)
10_chapter 1.pdf (401.52 kB, Adobe PDF)
11_chapter 2.pdf (212.42 kB, Adobe PDF)
12_chapter 3.pdf (2.05 MB, Adobe PDF)
13_chapter 4.pdf (365.2 kB, Adobe PDF)
14_chapter 5.pdf (464.79 kB, Adobe PDF)
15_chapter 6.pdf (587.49 kB, Adobe PDF)
16_chapter 7.pdf (902.64 kB, Adobe PDF)
17_chapter 8.pdf (531.49 kB, Adobe PDF)
18_chapter 9.pdf (447.04 kB, Adobe PDF)
19_chapter 10.pdf (78.75 kB, Adobe PDF)
20_appendix.pdf (696.76 kB, Adobe PDF)
21_references.pdf (179.07 kB, Adobe PDF)
22_synopsis.pdf (276.23 kB, Adobe PDF)
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).