Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/331729
Title: Identification of named entities from different language families
Researcher: Malarkodi C S
Guide(s): Sobha L
Keywords: Arts and Humanities
Language
Language and Linguistics
language families
Identification
University: Anna University
Completed Date: 2020
Abstract: The objective of the study is to develop a generic named entity system that automatically extracts generic features for identifying named entities from a given dataset. The features used in this work do not depend on any language or domain. The 12 languages studied belong to different language families: Tamil, Telugu and Malayalam (Dravidian); Hindi, Marathi, Punjabi and Bengali (Indo-Aryan); English, Dutch and German (Germanic); Spanish (Romance); and Hungarian (Uralic). Named Entity Recognition (NER) is defined as the process of automatically identifying proper nouns and classifying the identified entities into predefined categories such as person, location, organization, facilities, products, and temporal or numeric expressions. The machine learning technique Conditional Random Fields (CRFs) is used to identify the named entities in the present work. A linguistic analysis was carried out to observe the part-of-speech (POS) patterns immediately preceding or following named entities, and it found patterns common to named entities across all 12 languages. The features used for system development are: lexical-level features, such as the word, first word, and trigrams and bigrams of the suffix and prefix of the current token; syntactic-level features, such as POS and chunk information; and dynamic features, such as the POS patterns preceding and following a named entity (NE), the word and POS preceding the NE, and the word and POS following the NE. The dynamic features obtained using the generic feature selection methodology are validated using the K-means++ clustering algorithm.
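The feature set named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the feature keys, the sample sentence and its tags are all hypothetical, and "bigrams and trigrams of suffix and prefix" is assumed here to mean the first and last two or three characters of the token. The context features are computed for every token, which only approximates the dynamic, NE-specific features the abstract describes.

```python
from typing import Dict, List, Tuple

# One token is (word, POS tag, chunk tag); the tag sets are illustrative only.
Token = Tuple[str, str, str]


def affixes(word: str) -> Dict[str, str]:
    """Character bigram and trigram prefixes and suffixes of a token."""
    feats: Dict[str, str] = {}
    for n in (2, 3):
        if len(word) >= n:
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    return feats


def token_features(sentence: List[Token], i: int) -> Dict[str, object]:
    """Lexical, syntactic and context features for the token at position i."""
    word, pos, chunk = sentence[i]
    feats: Dict[str, object] = {
        "word": word,          # lexical: the token itself
        "first_word": i == 0,  # lexical: is it the first word of the sentence?
        "pos": pos,            # syntactic: part-of-speech tag
        "chunk": chunk,        # syntactic: chunk tag
    }
    feats.update(affixes(word))
    if i > 0:  # preceding word and its POS tag
        feats["prev_word"] = sentence[i - 1][0]
        feats["prev_pos"] = sentence[i - 1][1]
    if i < len(sentence) - 1:  # following word and its POS tag
        feats["next_word"] = sentence[i + 1][0]
        feats["next_pos"] = sentence[i + 1][1]
    return feats


def sentence_features(sentence: List[Token]) -> List[Dict[str, object]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Hypothetical example sentence with made-up tags.
example: List[Token] = [
    ("Sobha", "NNP", "B-NP"),
    ("lives", "VBZ", "B-VP"),
    ("in", "IN", "B-PP"),
    ("Chennai", "NNP", "B-NP"),
]
example_features = sentence_features(example)
```

Per-token dictionaries of this shape are a common input format for CRF toolkits; which toolkit the thesis used is not stated in this record.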
Pagination: xxv, 140p.
URI: http://hdl.handle.net/10603/331729
Appears in Departments: Faculty of Science and Humanities

Files in This Item:
File                         Description    Size        Format
01_title.pdf                 Attached File  29.13 kB    Adobe PDF
02_certificates.pdf                         224.28 kB   Adobe PDF
03_vivaproceedings.pdf                      417.21 kB   Adobe PDF
04_bonafidecertificate.pdf                  319.9 kB    Adobe PDF
05_abstracts.pdf                            8.58 kB     Adobe PDF
06_acknowledgements.pdf                     6.82 kB     Adobe PDF
07_contents.pdf                             300.58 kB   Adobe PDF
08_listoftables.pdf                         27.93 kB    Adobe PDF
09_listoffigures.pdf                        28.15 kB    Adobe PDF
10_listofabbreviations.pdf                  16.76 kB    Adobe PDF
11_chapter1.pdf                             235.14 kB   Adobe PDF
12_chapter2.pdf                             1.93 MB     Adobe PDF
13_chapter3.pdf                             571.61 kB   Adobe PDF
14_chapter4.pdf                             698.56 kB   Adobe PDF
15_conclusion.pdf                           41.9 kB     Adobe PDF
16_appendices.pdf                           190.77 kB   Adobe PDF
17_references.pdf                           178.15 kB   Adobe PDF
18_listofpublications.pdf                   130.97 kB   Adobe PDF
80_recommendation.pdf                       64.94 kB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
