Named Entity Recognition in Gujarati Language using Rule based Approach

Shah Dikshan N

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/330368

Title:	Named Entity Recognition in Gujarati Language using Rule based Approach
Researcher:	Shah Dikshan N
Guide(s):	Bhadka Harshad B.
Keywords:	Computer Science; Computer Science Information Systems; Engineering and Technology
University:	C.U. Shah University
Completed Date:	2021
Abstract:	Submitted by: Dikshan N Shah, Assistant Professor, S S Agrawal Institute of Computer Science, Navsari; Ph.D. Scholar, Faculty of Computer Science, C. U. Shah University, Wadhwan.
Supervised by: Dr. Harshad B. Bhadka, Dean, Faculty of Computer Science, C. U. Shah University, Wadhwan, Gujarat, India.
Background: Natural Language Processing (NLP) is a compelling means of human-computer communication and is sometimes described as an AI-complete problem. Large volumes of text are useful only if suitable techniques exist to process them and extract knowledge from them. This task, called Information Extraction (IE), plays a major role in NLP by converting unstructured text into structured data that machines can interpret, and one of its central subtasks is Named Entity Recognition (NER). Most NER work has been done for non-Indian languages such as English, Chinese, German and French. Indian languages are resource-poor, and comparatively little NER work has been done for them; Gujarati is one such language.
Aim: NER work has been done for a few Indian languages, such as Hindi, Marathi, Bengali, Urdu and Tamil. The main goal of this research is to design a hybrid algorithm that combines rule-based and gazetteer-based approaches to identify named entities in unstructured text written in Gujarati.
Methods: Several gazetteer lists were developed manually, and handcrafted rules were designed to identify named entities in unstructured text documents. A corpus was collected from documents in seven categories (Articles, Entertainment, News, Poems, Religious, Sports and Stories), with more than 1500 documents across all categories.
Result and Discussion: A classic method and a frequency-based Zipf's law algorithm were used to identify and remove
Pagination:	195 p.
URI:	http://hdl.handle.net/10603/330368
Appears in Departments:	Department of Computer Science
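The Aim and Methods describe a hybrid of manually built gazetteer lists and handcrafted rules. The Python sketch below shows one way such a hybrid can be wired together: gazetteer lookup first (on the raw token, then after stripping an attached case marker, since Gujarati writes postpositions such as માં "in" and થી "from" joined to the noun), followed by rules for tokens the gazetteers miss. The gazetteer entries, the title, suffix and digit rules, and all function names are hypothetical illustrations, not the lists or rule set developed in the thesis.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical gazetteers: tiny stand-ins for manually built entity lists.
GAZETTEERS: Dict[str, Set[str]] = {
    "PERSON": {"રામ", "સીતા"},
    "LOCATION": {"અમદાવાદ", "સુરત", "નવસારી"},
    "ORGANIZATION": {"ઇસરો"},
}

# Hypothetical handcrafted rules (illustrative, not the thesis's rule set).
PERSON_TITLES = {"શ્રી", "ડૉ."}                # the word after a title is a PERSON
LOCATION_SUFFIXES = ("નગર", "પુર")            # e.g. ગાંધીનગર
CASE_SUFFIXES = ("માં", "થી", "નો", "ની", "નું", "ના", "ને")  # attached postpositions
GUJARATI_NUMBER = re.compile(r"^[૦-૯]+$")      # Gujarati digits U+0AE6 to U+0AEF


def strip_case_suffix(token: str) -> str:
    """Remove one attached case marker, e.g. અમદાવાદમાં -> અમદાવાદ."""
    for suffix in CASE_SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def lookup(word: str) -> Optional[str]:
    """Return the gazetteer label for word, or None if it is not listed."""
    for label, entries in GAZETTEERS.items():
        if word in entries:
            return label
    return None


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag each token as PERSON/LOCATION/ORGANIZATION/NUMBER, or O for none."""
    tagged = []
    for i, token in enumerate(tokens):
        base = strip_case_suffix(token)
        # 1. Gazetteer lookup: exact token first, then the suffix-stripped form.
        label = lookup(token) or lookup(base)
        # 2. Rule: a word directly after a personal title is a PERSON.
        if label is None and i > 0 and tokens[i - 1] in PERSON_TITLES:
            label = "PERSON"
        # 3. Rule: a word ending in a place-name suffix is a LOCATION.
        if label is None and base.endswith(LOCATION_SUFFIXES) and base not in LOCATION_SUFFIXES:
            label = "LOCATION"
        # 4. Rule: a run of Gujarati digits is a NUMBER.
        if label is None and GUJARATI_NUMBER.match(base):
            label = "NUMBER"
        tagged.append((token, label or "O"))
    return tagged


if __name__ == "__main__":
    sentence = "શ્રી મહેશ અને સીતા ૨૦૨૧માં અમદાવાદથી ગાંધીનગર ગયા"
    for token, label in tag(sentence.split()):
        print(token, label)
```

On the sample sentence, મહેશ is tagged PERSON only through the title rule, સીતા and અમદાવાદથી through gazetteer lookup (the latter after stripping થી), ગાંધીનગર through the suffix rule, and ૨૦૨૧માં as NUMBER. Checking gazetteers before rules is a common design choice in hybrid systems: curated lists are precise, while rules generalise to unseen names at the cost of more false positives. Trying the exact token before the stripped form keeps listed names that happen to end in a case marker from being truncated before they are matched.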
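The abstract mentions a classic method and a frequency-based Zipf's law algorithm used to identify and remove items from the corpus, but the record is cut off before saying what is removed. Assuming the target is high-frequency function words (stop words), a common use of Zipf's law in text preprocessing, the Python sketch below ranks tokens by frequency and treats the highest-ranked ones as candidates. Zipf's law says a word's frequency is roughly inversely proportional to its rank, so rank × frequency stays roughly constant and a handful of top-ranked words account for a large share of all tokens. The helper names and the `top_k` cutoff are hypothetical; this is not the thesis's algorithm.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_frequency(tokens: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Rank words by descending frequency (ties broken by code-point order)."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


def zipf_constants(tokens: Iterable[str]) -> List[Tuple[str, int]]:
    """rank * frequency per word; Zipf's law predicts these stay roughly constant."""
    return [(word, rank * freq) for rank, word, freq in rank_frequency(tokens)]


def stopword_candidates(tokens: Iterable[str], top_k: int = 50) -> Set[str]:
    """Hypothetical cutoff: the top_k highest-ranked words are stop-word candidates."""
    return {word for rank, word, _ in rank_frequency(tokens) if rank <= top_k}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop every token that appears in the stop-word set, preserving order."""
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    text = "રામ અને સીતા અમદાવાદ ગયા છે અને લક્ષ્મણ સુરત ગયા છે"
    tokens = text.split()  # Gujarati is whitespace-delimited
    stop = stopword_candidates(tokens, top_k=3)
    print(sorted(stop))
    print(remove_stopwords(tokens, stop))
```

On this toy sentence the three words that occur twice (અને, ગયા, છે) are flagged. ગયા ("went") is a verb rather than a function word, which illustrates why a purely frequency-based cutoff is usually checked against a manually curated list before removal.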

Files in This Item:

File	Description	Size	Format
03-title page.pdf	Attached File	295.63 kB	Adobe PDF
04- certificate.pdf		392.22 kB	Adobe PDF
05- preliminary pages.pdf		527.63 kB	Adobe PDF
06- chapter - 1 introduction.pdf		1.71 MB	Adobe PDF
07- chapter - 2 existing approach.pdf		1.25 MB	Adobe PDF
08- chapter - 3 literature survey.pdf		981.77 kB	Adobe PDF
09- chapter - 4 methodology.pdf		2.05 MB	Adobe PDF
10- chapter - 5 results and discussion.pdf		1.4 MB	Adobe PDF
11- chapter - 6 conclusion.pdf		468.87 kB	Adobe PDF
80_recommendation.pdf		527.63 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET