Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/373529
Title: Hybrid Approach Based Lexical and Morphosyntactic Components Modelling for Resolving Pronominal Anaphora in Gujarati Text
Researcher: Tailor Chetanaben Maheshbhai
Guide(s): Patel Bankim
Keywords: anaphora resolution; Gujarati language; Computer Science; Natural Language Processing
University: Uka Tarsadia University
Completed Date: 2022
Abstract: The present era has witnessed extensive research and progress in the field of Natural Language Processing (NLP). In this digital era, most text is in English, but only about 10% of people in India understand English. Many people living in rural communities neither understand nor speak it. Therefore, to realize the vision of Digital India, local languages now receive greater attention, and there is a pressing need for research to remove the language barrier.

Research has begun on natural languages to develop NLP applications such as Machine Translation, Text Summarization, and Question Answering systems. A significant amount of work has been reported for foreign languages, but no significant work has been reported on Pronominal Anaphora Resolution for Gujarati text, even though it contributes to the development of such NLP applications. The objective of this research work is therefore to study anaphora and to find suitable antecedents automatically in Gujarati text discourse. This requires pre-processing components such as a Sentence Tokenizer, a POS Tagger, a Chunker, and a Morphological Analyzer.

The sentence tokenizer is the initial step in text processing; it is useful not only for Anaphora Resolution but also for Text Summarization and for POS tagger and chunker development. Dots (full stops), exclamation marks, single quotes, double quotes, question marks, and consecutive multiple occurrences of these markers are treated as sentence end markers. A statistical model, Punkt, has been developed using a Gujarati news article corpus. Linguistic rules are designed to handle issues such as abbreviations, parenthetical expressions, ordered lists covering different patterns, quotation-mark ambiguity arising from the different types of quotation marks, and direct speech sentences. An average accuracy of 99.34% is achieved on a corpus covering six article categories, namely Business, Crime, Politics, Sports, Technical, and Vaividhya, including the EMILLE corpus.
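The boundary rules described above can be sketched in a few lines of Python. This is a minimal rule-based sketch under stated assumptions, not the thesis's tokenizer. It leaves out the Punkt statistical model and the rules for parenthetical expressions, ordered lists, and direct speech. It also treats quotes only as closing a sentence after `.`, `!`, or `?`, not as end markers on their own. The names `ABBREVIATIONS` and `split_sentences` and the abbreviation entries are hypothetical. A run of end markers such as `?!` or `...` counts as one boundary, but only when whitespace or the end of the text follows. Listed abbreviations never end a sentence.

```python
import re

# Hypothetical abbreviation list for illustration only; Punkt would instead
# learn abbreviation candidates from corpus statistics.
ABBREVIATIONS = {"ડૉ.", "શ્રી.", "પ્રો."}

# One or more consecutive end markers, optionally followed by closing quotes,
# so "?!" or "..." is treated as a single sentence boundary.
END_RUN = re.compile(r"[.!?]+[\"'\u201d\u2019]*")


def split_sentences(text: str) -> list[str]:
    """Split text at end-marker runs followed by whitespace or end of text."""
    sentences = []
    start = 0
    for match in END_RUN.finditer(text):
        end = match.end()
        # Not a boundary if a non-space character follows (e.g. "3.5").
        if end < len(text) and not text[end].isspace():
            continue
        # The whitespace-delimited token that carries this end marker.
        token = text[:end].split()[-1]
        if token in ABBREVIATIONS:
            continue
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Keep any trailing text that has no end marker.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For example, `split_sentences("શું તમે આવશો?! હા.")` returns `["શું તમે આવશો?!", "હા."]`. A hand-written abbreviation set is the simplest choice for a sketch. The corpus-trained Punkt model and the linguistic rules described in the abstract are what let a real system handle abbreviations it has not seen.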
Pagination: xxii, 182 p.
URI: http://hdl.handle.net/10603/373529
Appears in Departments: Faculty of Computer Science

Files in This Item:
File                      Description    Size       Format
01_title.pdf              Attached File  497.01 kB  Adobe PDF
02_declaration.pdf                       383.25 kB  Adobe PDF
03_certificates.pdf                      2.07 MB    Adobe PDF
04_acknowledgement.pdf                   869.61 kB  Adobe PDF
05_content.pdf                           1.7 MB     Adobe PDF
06_preface.pdf                           604.25 kB  Adobe PDF
07_chapter 1.pdf                         1.58 MB    Adobe PDF
08_chapter_2.pdf                         1.42 MB    Adobe PDF
09_chapter_3.pdf                         1.19 MB    Adobe PDF
10_chapter_4.pdf                         1.52 MB    Adobe PDF
11_chapter_5.pdf                         1.64 MB    Adobe PDF
12_chapter_6.pdf                         1.45 MB    Adobe PDF
13_chapter_7.pdf                         1.45 MB    Adobe PDF
14_chapter_8.pdf                         1.12 MB    Adobe PDF
15_chapter_9.pdf                         669.35 kB  Adobe PDF
16_chapter_10.pdf                        855.22 kB  Adobe PDF
17_references.pdf                        940.28 kB  Adobe PDF
18_plagiarism_report.pdf                 521.24 kB  Adobe PDF
80_recommendation.pdf                    1.75 MB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
