Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/330368
Title: Named Entity Recognition in Gujarati Language using Rule based Approach
Researcher: Shah Dikshan N
Guide(s): Bhadka Harshad B.
Keywords: Computer Science
Computer Science Information Systems
Engineering and Technology
University: C.U. Shah University
Completed Date: 2021
Abstract:
Submitted By: Dikshan N Shah, Assistant Professor, S S Agrawal Institute of Computer Science, Navsari; Ph.D. Scholar, Faculty of Computer Science, C. U. Shah University, Wadhwan.
Supervised By: Dr. Harshad B. Bhadka, Dean of Faculty of Computer Science, C. U. Shah University, Wadhwan, Gujarat, India.

Background: Natural Language Processing (NLP) is an approach to human-computer communication that is sometimes described as an AI-complete problem. Large volumes of data are useful only if suitable techniques are available to process them and extract knowledge. This task is called Information Extraction (IE). IE plays a major role in NLP by converting unstructured text into structured data that machines can interpret. A core IE subtask, identifying the named entities mentioned in text, is called Named Entity Recognition (NER). Most NER work has been done for non-Indian languages such as English, Chinese, German, and French. Because Indian languages are resource-poor, comparatively little NER work exists for them, and Gujarati is one of these languages.

Aim: Named entity recognition work has been done for a few Indian languages, such as Hindi, Marathi, Bengali, Urdu, and Tamil. The main goal of this research is to design a hybrid algorithm that combines rule-based and gazetteer-based approaches to identify various named entities in unstructured text written in Gujarati.

Methods: In this research, various gazetteer lists were developed manually, and handcrafted rules were designed to identify named entities in unstructured text documents. A corpus of documents from seven categories was collected: Articles, Entertainment, News, Poems, Religious, Sports, and Stories. More than 1500 documents were collected across all categories.

Result and Discussion: The classic method and a frequency-based Zipf's law algorithm were used to identify and remove
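The hybrid approach described under Methods, gazetteer lookup backed by handcrafted rules, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm from the thesis: the gazetteer entries, the person-title rule, the place-name suffixes (નગર, પુર), the list of attached case markers used for suffix stripping, and the function names `strip_case_suffix`, `gazetteer_lookup`, and `tag` are all hypothetical, and tokenization is a plain whitespace split.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical gazetteer lists: a few illustrative entries only,
# NOT the manually built lists from the thesis.
GAZETTEERS: Dict[str, Set[str]] = {
    "LOCATION": {"અમદાવાદ", "સુરત", "નવસારી", "વઢવાણ"},
    "ORGANIZATION": {"ઇસરો"},
}

# Hypothetical handcrafted rule resources.
PERSON_TITLES: Set[str] = {"શ્રી", "શ્રીમતી"}        # Shri, Shrimati
LOCATION_SUFFIXES: Tuple[str, ...] = ("નગર", "પુર")  # -nagar, -pur
# Assumed case markers written attached to nouns, e.g. "માં" = "in".
CASE_SUFFIXES: Tuple[str, ...] = ("માં", "થી", "નો", "ની", "નું", "ના")


def strip_case_suffix(token: str) -> str:
    """Remove one attached case marker (longest first), keeping a non-empty stem."""
    for suffix in sorted(CASE_SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def gazetteer_lookup(word: str) -> Optional[str]:
    """Return the entity label of the first gazetteer containing word, if any."""
    for label, entries in GAZETTEERS.items():
        if word in entries:
            return label
    return None


def tag(sentence: str) -> List[Tuple[str, str]]:
    """Tag each whitespace token with an entity label or 'O'."""
    tokens = sentence.split()
    tags: List[Tuple[str, str]] = []
    for i, token in enumerate(tokens):
        stem = strip_case_suffix(token)
        # 1. Gazetteer: try the raw token first, then the stripped stem.
        label = gazetteer_lookup(token) or gazetteer_lookup(stem)
        # 2. Handcrafted rules, used only when no gazetteer entry matched.
        if label is None:
            if i > 0 and tokens[i - 1] in PERSON_TITLES:
                label = "PERSON"      # word after a title is taken as a name
            elif stem.isdigit():
                label = "NUMBER"      # str.isdigit() also accepts Gujarati digits
            elif stem.endswith(LOCATION_SUFFIXES) and stem not in LOCATION_SUFFIXES:
                label = "LOCATION"    # place-name suffix such as -nagar
        tags.append((token, label or "O"))
    return tags
```

For example, `tag("શ્રી રમેશ અમદાવાદમાં રહે છે")` ("Shri Ramesh lives in Ahmedabad") returns `[("શ્રી", "O"), ("રમેશ", "PERSON"), ("અમદાવાદમાં", "LOCATION"), ("રહે", "O"), ("છે", "O")]`. Consulting the gazetteer before the rules lets curated lists take priority over heuristics, while the rules catch names the lists do not contain.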
Pagination: 195 p.
URI: http://hdl.handle.net/10603/330368
Appears in Departments: Department of Computer Science



Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
