Extraction of Multiword Expressions from Hindi Text Document

Mishra, Atul

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/411302

Title:	Extraction of Multiword Expressions from Hindi Text Document
Researcher:	Mishra, Atul
Guide(s):	Shaikh, Soharab Hossain and Sanyal, Ratna
Keywords:	Computer Science; Computer Science Artificial Intelligence; Engineering and Technology
University:	BML Munjal University, Gurugram
Completed Date:	2022
Abstract:	Multiword expressions (MWEs) are a significant challenge in many fields of language technology. Multiword extraction from random text data has grown in popularity among the NLP community. This topic of research is strongly connected to statistical analysis and artificial intelligence. This thesis presents a detailed literature assessment and numerous strategies for building an automated multiword extraction system. The overall contribution of the thesis has been divided into six parts. This study proposes a method for extracting Hindi MWEs and examines the significance of boundary threshold calculation. The main objective of this dissertation is to develop a generalized mechanism for extracting Hindi multiword expressions, using their syntactic and statistical idiosyncrasy (i.e., the structure of linguistic patterns and association) and the contextual connection between their constituent words. Various combination strategies of different classifiers based on these properties may be applied to develop a multiword extraction mechanism. Hence, creating a best-performing combination strategy is also an objective of this dissertation. There are various hurdles in designing a method using these properties. In statistical filtering, calculating the boundary threshold is a challenging task. Another issue is combining multiple filters, since different combination strategies may be possible. Thus, recognizing the best combination strategy is also a challenge. In the hybrid method, semantic similarity has been used. The study developed a web application using the Flask framework to automatically extract Hindi MWEs using the association-based and hybrid methods. The methods, evaluation results, and findings of each contribution are presented in separate chapters. The proposed technique is evaluated using the HDTB Treebank and the TDIL dataset, which is freely available.
Pagination:	XVII, 109
URI:	http://hdl.handle.net/10603/411302
Appears in Departments:	School of Engineering and Technology

Files in This Item:

File	Description	Size	Format
01_title.pdf.pdf	Attached File	202.92 kB	Adobe PDF
02_declaration.pdf.pdf		67.34 kB	Adobe PDF
03_certificate.pdf.pdf		552.88 kB	Adobe PDF
04_acknowledgement.pdf.pdf		585.79 kB	Adobe PDF
05_contents.pdf.pdf		726.43 kB	Adobe PDF
06_list of graph and table.pdf		1.21 MB	Adobe PDF
07_abstract.pdf.pdf		549.84 kB	Adobe PDF
08_chapter1.pdf.pdf		4.42 MB	Adobe PDF
09_chapter2.pdf.pdf		5.74 MB	Adobe PDF
10_chapter3.pdf.pdf		4.26 MB	Adobe PDF
11_chapter4.pdf.pdf		2.41 MB	Adobe PDF
12_chapter5.pdf.pdf		3.99 MB	Adobe PDF
13_chapter6.pdf.pdf		3.7 MB	Adobe PDF
14_summary.pdf.pdf		185.09 kB	Adobe PDF
15_bibliography.pdf.pdf		2.84 MB	Adobe PDF
80_recommendation.pdf		551.8 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET