Development of Natural Language Processing Tools and Resources for Assamese Text

Pathak, Dhrubajyoti

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/574201

Title:	Development of Natural Language Processing Tools and Resources for Assamese Text
Researcher:	Pathak, Dhrubajyoti
Guide(s):	Nandi, Sukumar and Sarmah, Priyankoo
Keywords:	Arts and Humanities, Language, Linguistics
University:	Indian Institute of Technology Guwahati
Completed Date:	2024
Abstract:	Natural Language Processing (NLP) is a discipline of computer science concerned with the interaction of computers and humans in natural language. The field of NLP has expanded significantly over the last decade and now plays an important role in numerous sectors, such as the IT industry, education, health care, banking, the stock market, and entertainment. Over the years, researchers in NLP have developed a wide range of tools and resources to support various NLP tasks such as text classification, sentiment analysis, machine translation, language understanding, question answering, and speech recognition. These models require large amounts of data and computing power. The models achieve state-of-the-art performance in many NLP areas for high-resource languages. On the other hand, these models do not cover low-resource languages such as Assamese. This is due to limited language resources, which means such languages receive less attention in NLP research than resource-rich languages. Although there are previous studies on various types of tools and resources for various tasks, existing resources, such as raw text corpora and annotated datasets, are not sufficient in size for deep learning models. Hence, it is necessary to enhance and increase the size of these resources as well. This dissertation presents four contributions to language tools and resources specifically focused on low-resource languages, with an emphasis on the Assamese language. The first contribution in this direction focuses on identifying and modeling reduplication in Assamese text. Reduplication is a productive morphological process widely used in the Assamese language. Addressing reduplication plays a vital role in the efficiency of POS taggers, sentiment analysis, as well as other downstream NLP tasks.
A deep learning (DL)-based Assamese word embedding model is proposed in the second contribution. A word embedding model is a crucial component in DL-based downstream seque
URI:	http://hdl.handle.net/10603/574201
Appears in Departments:	CENTRE FOR LINGUISTICS SCIENCE AND TECHNOLOGY

Files in This Item:

File	Description	Size	Format
01_fulltext.pdf	Attached File	1.64 MB	Adobe PDF
04_abstract.pdf		63.57 kB	Adobe PDF
80_recommendation.pdf		288.16 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET