Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/331729
Title: | Identification of named entities from different language families |
Researcher: | Malarkodi C S |
Guide(s): | Sobha L |
Keywords: | Arts and Humanities; Language; Language and Linguistics; language families; Identification |
University: | Anna University |
Completed Date: | 2020 |
Abstract: | The objective of the study is to develop a generic named entity system that automatically extracts generic features for identifying named entities in a given dataset. The features used in this work do not depend on any language or domain. The 12 languages studied belong to different language families: Tamil, Telugu and Malayalam (Dravidian); Hindi, Marathi, Punjabi and Bengali (Indo-Aryan); English, Dutch and German (Germanic); Spanish (Romance); and Hungarian (Uralic). Named Entity Recognition (NER) is defined as the automatic identification of proper nouns and the classification of the identified entities into predefined categories such as person, location, organization, facilities, products, and temporal or numeric expressions. The machine learning technique Conditional Random Fields (CRFs) is used to identify the named entities in the present work. A linguistic analysis was carried out to observe the part-of-speech (POS) patterns immediately preceding or following named entities, and it found common patterns across all 12 languages. The features used for system development are: lexical-level features such as the word, the first word, and trigrams and bigrams of the suffix and prefix of the current token; syntactic-level features such as POS and chunk information; and dynamic features such as the POS patterns preceding and following a named entity (NE), and the word and POS information preceding and following the NE. The dynamic features obtained using the generic feature selection methodology are validated using the K-means++ clustering algorithm. |
Pagination: | xxv, 140p. |
URI: | http://hdl.handle.net/10603/331729 |
Appears in Departments: | Faculty of Science and Humanities |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 29.13 kB | Adobe PDF | View/Open |
02_certificates.pdf | | 224.28 kB | Adobe PDF | View/Open |
03_vivaproceedings.pdf | | 417.21 kB | Adobe PDF | View/Open |
04_bonafidecertificate.pdf | | 319.9 kB | Adobe PDF | View/Open |
05_abstracts.pdf | | 8.58 kB | Adobe PDF | View/Open |
06_acknowledgements.pdf | | 6.82 kB | Adobe PDF | View/Open |
07_contents.pdf | | 300.58 kB | Adobe PDF | View/Open |
08_listoftables.pdf | | 27.93 kB | Adobe PDF | View/Open |
09_listoffigures.pdf | | 28.15 kB | Adobe PDF | View/Open |
10_listofabbreviations.pdf | | 16.76 kB | Adobe PDF | View/Open |
11_chapter1.pdf | | 235.14 kB | Adobe PDF | View/Open |
12_chapter2.pdf | | 1.93 MB | Adobe PDF | View/Open |
13_chapter3.pdf | | 571.61 kB | Adobe PDF | View/Open |
14_chapter4.pdf | | 698.56 kB | Adobe PDF | View/Open |
15_conclusion.pdf | | 41.9 kB | Adobe PDF | View/Open |
16_appendices.pdf | | 190.77 kB | Adobe PDF | View/Open |
17_references.pdf | | 178.15 kB | Adobe PDF | View/Open |
18_listofpublications.pdf | | 130.97 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 64.94 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).