Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/411302
Title: Extraction of Multiword Expressions from Hindi Text Document
Researcher: Mishra, Atul
Guide(s): Shaikh, Soharab Hossain and Sanyal, Ratna
Keywords: Computer Science
Computer Science Artificial Intelligence
Engineering and Technology
University: BML Munjal University, Gurugram
Completed Date: 2022
Abstract: Multiword expressions (MWEs) are a significant challenge in many fields of language technology. Multiword extraction from random text data has grown in popularity among the NLP community, and the topic is strongly connected to statistical analysis and artificial intelligence. This thesis presents a detailed literature assessment and numerous strategies for building an automated multiword extraction system. The overall contribution of the thesis is divided into six parts. The study proposes a method for Hindi MWEs and examines the significance of boundary threshold calculations. The main objective of the dissertation is to develop a generalized mechanism for extracting Hindi multiword expressions. Specifically, the research builds an approach that extracts Hindi MWEs using their syntactic and statistical idiosyncrasy (i.e., the structure of linguistic patterns and association) and the contextual connection between their constituent words. Various strategies for combining different classifiers based on these properties may be applied to develop a multiword extraction mechanism; hence, creating a best-performing combination strategy is also an objective of this dissertation. There are various hurdles in designing a method using these properties. In statistical filtering, calculating the boundary threshold is a challenging task. Another issue is combining multiple filters: since different combination strategies are possible, recognizing the best one is also a challenge. In the hybrid method, semantic similarity is used. The study also developed a web application, built with the Flask framework, that automatically extracts Hindi MWEs using the association-based and hybrid methods. The methods, evaluation results, and findings of each contribution are presented in separate chapters. The proposed technique is evaluated using the HDTB Treebank and the TDIL dataset, which is freely available.
Pagination: XVII, 109
URI: http://hdl.handle.net/10603/411302
Appears in Departments: School of Engineering and Technology

Files in This Item:
File                              Description    Size       Format
01_title.pdf.pdf                  Attached File  202.92 kB  Adobe PDF
02_declaration.pdf.pdf                           67.34 kB   Adobe PDF
03_certificate.pdf.pdf                           552.88 kB  Adobe PDF
04_acknowledgement.pdf.pdf                       585.79 kB  Adobe PDF
05_contents.pdf.pdf                              726.43 kB  Adobe PDF
06_list of graph and table.pdf                   1.21 MB    Adobe PDF
07_abstract.pdf.pdf                              549.84 kB  Adobe PDF
08_chapter1.pdf.pdf                              4.42 MB    Adobe PDF
09_chapter2.pdf.pdf                              5.74 MB    Adobe PDF
10_chapter3.pdf.pdf                              4.26 MB    Adobe PDF
11_chapter4.pdf.pdf                              2.41 MB    Adobe PDF
12_chapter5.pdf.pdf                              3.99 MB    Adobe PDF
13_chapter6.pdf.pdf                              3.7 MB     Adobe PDF
14_summary.pdf.pdf                               185.09 kB  Adobe PDF
15_bibliography.pdf.pdf                          2.84 MB    Adobe PDF
80_recommendation.pdf                            551.8 kB   Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).