Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/537432
Title: A System for Simplification of Idiomatic Gujarati Text for Improved Interlingual Language Processing
Researcher: Modh, Jatinkumar Chamanlal
Guide(s): Saini, Jatinderkumar R.
Keywords: Computer Science
Computer Science Theory and Methods
Engineering and Technology
University: Gujarat Technological University
Completed Date: 2023
Abstract: All existing Gujarati machine translation systems, including Microsoft Translator and Google Translate, struggle with idiomatic Gujarati text: they are unable to translate Gujarati idioms properly. The proposed system simplifies Gujarati idioms by recognizing all idiom phrases present in the input and replacing each with its corresponding Gujarati meaning. In the interlingual language processing described here, the text is translated into the same language but in a simplified form: the model rewrites every Gujarati idiom as its plain Gujarati meaning. The result is simplified Gujarati text that contains no Gujarati idioms. The purpose of the research is to aid the translation of Gujarati idioms into any language in the world by simplifying the idioms first.
Overall, 3472 distinct and 6081 non-distinct Gujarati idioms were collected, analyzed and classified. The research classifies Gujarati idioms into N-gram, M-meaning, root, inflected and personage idioms. Because Gujarati idioms are used in a variety of forms and contexts in real life, recognizing them all is difficult for any machine translation system. The proposed system detects both inflected and static idiom forms in Gujarati text by combining a dictionary-based approach with a rule-based approach that generates dynamic idiom forms. Dynamic generation and detection of idioms rely on 15 newly formulated suffix- and diacritic-based rules. For idioms with multiple meanings, a context-based search algorithm determines the particular meaning of the idiom using the surrounding contextual words.
In addition, the readability complexity prediction model calculates a readability complexity score and predicts the complexity type of idiomatic Gujarati text by considering four different parameters. This is innovative and the first in the Gujarati l
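The dictionary-based replacement and context-based meaning selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the lexicon entries, the cue-word sets, the function names `pick_meaning` and `simplify`, and the `window` parameter are all hypothetical, English placeholders stand in for Gujarati idioms, and the 15 suffix and diacritic rules are not modelled.

```python
# Hypothetical toy lexicon: idiom phrase -> list of (plain meaning, cue words).
# These entries are placeholders for illustration, not data from the thesis.
IDIOM_LEXICON = {
    "idiom one": [
        ("plain meaning A", {"market", "price"}),
        ("plain meaning B", {"school", "exam"}),
    ],
    "idiom two words more": [("plain meaning C", set())],
}


def pick_meaning(senses, context_words):
    """Return the meaning whose cue words overlap most with the context.

    Ties (including zero overlap) fall back to the first listed meaning.
    """
    return max(senses, key=lambda sense: len(sense[1] & context_words))[0]


def simplify(text, lexicon=IDIOM_LEXICON, window=5):
    """Replace each known idiom in `text` with a plain meaning.

    Matching is greedy and longest-first over whitespace-separated tokens.
    """
    tokens = text.split()
    max_len = max(len(phrase.split()) for phrase in lexicon)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                # Context = up to `window` tokens on each side of the idiom.
                context = set(tokens[max(0, i - window):i]
                              + tokens[i + n:i + n + window])
                out.append(pick_meaning(lexicon[phrase], context))
                i += n
                break
        else:
            # No idiom starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(simplify("the exam was idiom one today"))
```

In this sketch the word "exam" in the surrounding window selects the second meaning of the placeholder idiom. A real system for Gujarati would also need to generate or normalise inflected idiom forms before the lookup, which this example leaves out.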
Pagination: xx, 108p
URI: http://hdl.handle.net/10603/537432
Appears in Departments: Computer Science

Files in This Item:
File                   Description    Size       Format
01_title.pdf           Attached File  16.47 kB   Adobe PDF
03_abstract.pdf                       7.57 kB    Adobe PDF
06_contents.pdf                       611.35 kB  Adobe PDF
10_chapter1.pdf                       853.77 kB  Adobe PDF
11_chapter2.pdf                       227.95 kB  Adobe PDF
12_chapter3.pdf                       2.91 MB    Adobe PDF
13_chapter4.pdf                       1.56 MB    Adobe PDF
14_chapter5.pdf                       117.52 kB  Adobe PDF
80_recommendation.pdf                 290.66 kB  Adobe PDF
prelim pages.pdf                      1.89 MB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
