A System for Simplification of Idiomatic Gujarati Text for Improved Interlingual Language Processing

Modh, Jatinkumar Chamanlal

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/537432

Title:	A System for Simplification of Idiomatic Gujarati Text for Improved Interlingual Language Processing
Researcher:	Modh, Jatinkumar Chamanlal
Guide(s):	Saini, Jatinderkumar R.
Keywords:	Computer Science; Computer Science Theory and Methods; Engineering and Technology
University:	Gujarat Technological University
Completed Date:	2023
Abstract:	All existing Gujarati machine translation systems, including Microsoft Translator and Google Translate, struggle with idiomatic Gujarati text: they are unable to translate Gujarati idioms properly. The proposed system simplifies Gujarati idioms by recognizing every Gujarati idiom phrase present in the input and replacing it with the corresponding Gujarati meaning of the idiom. As a form of interlingual processing, the model translates Gujarati idioms into the same language, Gujarati, but with their simplified meanings; in interlingual language processing, text is translated into the same language in a simplified form. The output of the proposed system is simplified Gujarati text that contains no Gujarati idioms. The purpose of the research is to aid the translation of Gujarati idioms into any language in the world by simplifying them.
Overall, 3472 distinct and 6081 non-distinct Gujarati idioms were collected, analyzed, and classified. This work classifies Gujarati idioms into N-gram, M-meaning, root, inflected, and personage idioms. Because Gujarati idioms are used in a variety of formats and contexts in real life, recognizing them all is difficult for any machine translation system. The proposed system detects both inflected and static idiom formats in Gujarati text, combining a dictionary-based approach with a rule-based approach that generates dynamic idiom forms. Dynamic generation and detection of idioms are made possible by 15 newly devised suffix- and diacritic-based rules. For idioms with multiple meanings, a context-based search algorithm determines the intended meaning from the surrounding contextual words.
In addition, the readability complexity prediction model calculates a readability complexity score and predicts the complexity type of idiomatic Gujarati text using four different parameters. This is innovative and the first in the Gujarati l
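The detection-and-replacement pipeline described in the abstract (dictionary lookup, rule-generated inflected forms, and a context-based choice among multiple meanings) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the single idiom entry and its gloss, the three suffix rules (standing in for the thesis's 15 suffix- and diacritic-based rules, which are not listed in this record), the five-word context window, and the cue-word overlap scoring are all assumptions made for illustration. The two-sense entry `toy` is hypothetical and uses placeholder meanings only.

```python
from dataclasses import dataclass

# HYPOTHETICAL suffix rules (not the thesis's 15 rules): each pair rewrites the
# infinitive ending "વા" of a root idiom into one inflected ending.
SUFFIX_RULES = [
    ("વા", "\u0acdયા"),  # virama + "યા": "કરવા" -> "કર્યા" (past)
    ("વા", "\u0ac7"),    # vowel sign E:  "કરવા" -> "કરે" (present)
    ("વા", "\u0ac0"),    # vowel sign II: "કરવા" -> "કરી" (absolutive)
]
PUNCT = ".,!?;:\"'"


@dataclass
class Idiom:
    root: str                           # dictionary (root) form of the idiom
    senses: list[tuple[str, set[str]]]  # (simplified meaning, context cue words)


def surface_forms(root: str) -> list[str]:
    """Return the root form plus every inflected form the suffix rules generate."""
    forms = [root]
    for old, new in SUFFIX_RULES:
        if root.endswith(old):
            forms.append(root[: -len(old)] + new)
    return forms


def words(chunk: str) -> list[str]:
    """Split on whitespace and strip surrounding punctuation, dropping empty tokens."""
    return [w.strip(PUNCT) for w in chunk.split() if w.strip(PUNCT)]


def choose_sense(idiom: Idiom, context: list[str]) -> str:
    """Pick the sense whose cue words overlap most with the context words.

    max() returns the first maximal item, so the first sense is the default on ties.
    """
    ctx = set(context)
    return max(idiom.senses, key=lambda sense: len(sense[1] & ctx))[0]


def simplify(text: str, lexicon: list[Idiom], window: int = 5) -> str:
    """Replace each detected idiom form with its (context-selected) meaning."""
    # Longest forms first, so a longer idiom is replaced before a shorter overlap.
    candidates = sorted(
        ((form, idiom) for idiom in lexicon for form in surface_forms(idiom.root)),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    for form, idiom in candidates:
        start = text.find(form)
        while start != -1:
            end = start + len(form)
            context = words(text[:start])[-window:] + words(text[end:])[:window]
            meaning = choose_sense(idiom, context)
            text = text[:start] + meaning + text[end:]
            start = text.find(form, start + len(meaning))
    return text


# Illustrative lexicon: "આંખ આડા કાન કરવા" (to turn a blind eye),
# glossed as "ધ્યાન ન આપવું" (not to pay attention). One sense, no cue words.
LEXICON = [Idiom("આંખ આડા કાન કરવા", [("ધ્યાન ન આપવું", set())])]

# HYPOTHETICAL two-sense entry with placeholder meanings, only to exercise
# choose_sense; the cue words "પૈસા" (money) and "શાળા" (school) are arbitrary.
toy = Idiom("(hypothetical)", [("અર્થ-1", {"પૈસા"}), ("અર્થ-2", {"શાળા"})])

print(simplify("તેણે આંખ આડા કાન કર્યા.", LEXICON))  # prints: તેણે ધ્યાન ન આપવું.
print(choose_sense(toy, ["શાળા", "માં"]))              # prints: અર્થ-2
```

The inflected form "કર્યા" is never stored in the lexicon; it is generated from the root by a suffix rule and then detected. In this sketch the meaning is substituted in its dictionary (infinitive) form and matching is on raw substrings; a fuller system would also inflect the replacement to agree with the sentence and would match on token boundaries.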
Pagination:	xx, 108p
URI:	http://hdl.handle.net/10603/537432
Appears in Departments:	Computer Science

Files in This Item:

File	Size	Format
01_title.pdf	16.47 kB	Adobe PDF
03_abstract.pdf	7.57 kB	Adobe PDF
06_contents.pdf	611.35 kB	Adobe PDF
10_chapter1.pdf	853.77 kB	Adobe PDF
11_chapter2.pdf	227.95 kB	Adobe PDF
12_chapter3.pdf	2.91 MB	Adobe PDF
13_chapter4.pdf	1.56 MB	Adobe PDF
14_chapter5.pdf	117.52 kB	Adobe PDF
80_recommendation.pdf	290.66 kB	Adobe PDF
prelim pages.pdf	1.89 MB	Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).


Shodhganga : a reservoir of Indian theses @ INFLIBNET