Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/574201
Title: | Development of Natural Language Processing Tools and Resources for Assamese Text |
Researcher: | Pathak, Dhrubajyoti |
Guide(s): | Nandi, Sukumar and Sarmah, Priyankoo |
Keywords: | Arts and Humanities Language Linguistics |
University: | Indian Institute of Technology Guwahati |
Completed Date: | 2024 |
Abstract: | Natural Language Processing (NLP) is a discipline of computer science concerned with the interaction of computers and humans in natural language. The field of NLP has expanded significantly over the last decade and now plays an important role in numerous sectors, such as the IT industry, education, health care, banking, the stock market, and entertainment. Over the years, researchers in NLP have developed a wide range of tools and resources to support various NLP tasks such as text classification, sentiment analysis, machine translation, language understanding, question answering, and speech recognition. These models require large amounts of data and computing power, and they achieve state-of-the-art performance in many NLP areas for high-resource languages. On the other hand, these models do not cover low-resource languages such as Assamese. This is due to limited language resources, which in turn means that such languages receive less attention in NLP research than resource-rich languages. Although previous studies have produced various types of tools and resources for various tasks, existing resources, such as raw text corpora and annotated datasets, are not sufficient in size for deep learning models. Hence, it is necessary to enhance these resources and increase their size as well. This dissertation presents four contributions to language tools and resources specifically focused on low-resource languages, with an emphasis on the Assamese language. The first contribution in this direction focuses on identifying and modeling reduplication in Assamese text. Reduplication is a productive morphological process widely used in the Assamese language. Addressing reduplication plays a vital role in the efficiency of POS taggers, sentiment analysis, and other downstream NLP tasks. 
A deep learning (DL)-based Assamese word embedding model is proposed in the second contribution. A word embedding model is a crucial component in DL-based downstream seque |
Pagination: | |
URI: | http://hdl.handle.net/10603/574201 |
Appears in Departments: | CENTRE FOR LINGUISTICS SCIENCE AND TECHNOLOGY |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_fulltext.pdf | Attached File | 1.64 MB | Adobe PDF | View/Open |
04_abstract.pdf | | 63.57 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 288.16 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).