Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/330368
Title: | Named Entity Recognition in Gujarati Language using Rule based Approach |
Researcher: | Shah Dikshan N |
Guide(s): | Bhadka Harshad B. |
Keywords: | Computer Science; Computer Science Information Systems; Engineering and Technology |
University: | C.U. Shah University |
Completed Date: | 2021 |
Abstract: | Title: Named Entity Recognition in Gujarati Language using Rule-based Approach. Submitted by: Dikshan N Shah, Assistant Professor, S S Agrawal Institute of Computer Science, Navsari; Ph.D. Scholar, Faculty of Computer Science, C. U. Shah University, Wadhwan. Supervised by: Dr. Harshad B. Bhadka, Dean of Faculty of Computer Science, C. U. Shah University, Wadhwan, Gujarat, India. Background: Natural Language Processing (NLP) is a method of human-computer communication that is sometimes described as an AI-complete problem. Large volumes of data are useful only if suitable techniques are available to process them and obtain knowledge from them. This extraction of information is called Information Extraction (IE); it plays a major role in NLP by converting unstructured textual data into structured data that machines can readily interpret, and Named Entity Recognition (NER) is one of its subtasks. Most NER work has been done in non-Indian languages such as English, Chinese, German, and French. Because Indian languages are resource-poor, not enough work has been done on them; Gujarati is one such language. Aim: NER work has been done in a few Indian languages, such as Hindi, Marathi, Bengali, Urdu, and Tamil. The main goal of this research is to design a hybrid algorithm, combining rule-based and gazetteer-based approaches, to identify various named entities in unstructured text written in the Gujarati language. Methods: Various gazetteer lists were developed manually, and handcrafted rules were designed to identify named entities in unstructured text documents. Documents from seven categories (Articles, Entertainment, News, Poems, Religious, Sports, and Stories) were collected as a corpus; more than 1500 documents were collected across all categories. Results and Discussion: A classic method and a frequency-based Zipf's law algorithm were used to identify and remove |
Pagination: | 195 p. |
URI: | http://hdl.handle.net/10603/330368 |
Appears in Departments: | Department of Computer Science |
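The hybrid approach named in the abstract (manually built gazetteer lists combined with handcrafted rules) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the gazetteer entries, the single honorific rule, the tag names, and the function `tag_entities` are hypothetical and are not taken from the thesis, whose actual lists and rules are in the attached chapters.

```python
# Illustrative sketch only. The gazetteer entries, the rule, and the tag
# names below are assumptions, not the resources built in the thesis.

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "અમદાવાદ": "LOCATION",  # Ahmedabad
    "ગુજરાત": "LOCATION",    # Gujarat
}

# Hypothetical handcrafted rule: the token that follows an honorific
# is tagged as a person name.
HONORIFICS = {"શ્રી"}  # "Shri"


def tag_entities(text):
    """Return (token, tag) pairs; 'O' marks tokens outside any entity."""
    tokens = text.split()
    tagged = []
    for i, token in enumerate(tokens):
        word = token.strip(".,!?")  # drop trailing sentence punctuation
        if word in GAZETTEER:
            # Gazetteer lookup takes priority over the rule.
            tagged.append((word, GAZETTEER[word]))
        elif i > 0 and tokens[i - 1] in HONORIFICS:
            # Rule: previous token is an honorific.
            tagged.append((word, "PERSON"))
        else:
            tagged.append((word, "O"))
    return tagged


if __name__ == "__main__":
    for token, tag in tag_entities("શ્રી હર્ષદ અમદાવાદ ગયા."):
        print(token, tag)
```

In this sketch the gazetteer is consulted first and the rule acts as a fallback; a real system would need many more lists and rules, along with tokenization suited to Gujarati script.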
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
03-title page.pdf | Attached File | 295.63 kB | Adobe PDF | View/Open |
04- certificate.pdf | | 392.22 kB | Adobe PDF | View/Open |
05- preliminary pages.pdf | | 527.63 kB | Adobe PDF | View/Open |
06- chapter - 1 introduction.pdf | | 1.71 MB | Adobe PDF | View/Open |
07- chapter - 2 existing approach.pdf | | 1.25 MB | Adobe PDF | View/Open |
08- chapter - 3 literature survey.pdf | | 981.77 kB | Adobe PDF | View/Open |
09- chapter - 4 methodology.pdf | | 2.05 MB | Adobe PDF | View/Open |
10- chapter - 5 results and discussion.pdf | | 1.4 MB | Adobe PDF | View/Open |
11- chapter - 6 conclusion.pdf | | 468.87 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 527.63 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).