Identification of named entities from different language families

Malarkodi C S

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/331729

Title:	Identification of named entities from different language families
Researcher:	Malarkodi C S
Guide(s):	Sobha L
Keywords:	Arts and Humanities; Language; Language and Linguistics; language families; Identification
University:	Anna University
Completed Date:	2020
Abstract:	The objective of the study is to develop a Generic Named Entity system that automatically extracts generic features for identifying named entities in a given dataset. The features used in this work do not depend on any language or domain. The 12 languages studied belong to different language families: Tamil, Telugu and Malayalam from the Dravidian family; Hindi, Marathi, Punjabi and Bengali from the Indo-Aryan family; English, Dutch and German from the Germanic family; Spanish from the Romance family; and Hungarian from the Uralic family. Named Entity Recognition (NER) is defined as the process of automatically identifying proper nouns and classifying the identified entities into predefined categories such as person, location, organization, facility, product, and temporal or numeric expressions. The machine learning technique Conditional Random Fields (CRFs) is used to identify the named entities in the present work. Linguistic analysis was carried out to observe the part-of-speech (POS) patterns immediately preceding or following named entities, and common patterns of named entities were found across all 12 languages. The features used for system development are lexical-level features (the word, the first word, and bigrams and trigrams of the prefix and suffix of the current token), syntactic-level features (POS and chunk information), and dynamic features (POS patterns preceding and following the NE, the preceding word and its POS, and the following word and its POS). The dynamic features obtained using the generic feature selection methodology are validated using the K-means++ clustering algorithm.
Pagination:	xxv, 140p.
URI:	http://hdl.handle.net/10603/331729
Appears in Departments:	Faculty of Science and Humanities
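
The feature set named in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function names, feature keys, the `<BOS>`/`<EOS>` boundary markers, and the reading of "bigrams and trigrams of suffix and prefix" as the first and last two or three characters of a token are all assumptions made for illustration. The output is a list of per-token feature dictionaries, a shape commonly fed to CRF toolkits.

```python
from typing import Dict, List, Tuple

# Hypothetical token representation: (word, POS tag, chunk tag).
Token = Tuple[str, str, str]


def token_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Build an illustrative feature dict for the token at position i."""
    word, pos, chunk = sentence[i]
    feats = {
        # Lexical-level features
        "word": word,
        "is_first_word": str(i == 0),
        "prefix2": word[:2],   # assumed: character bigram prefix
        "prefix3": word[:3],   # assumed: character trigram prefix
        "suffix2": word[-2:],  # assumed: character bigram suffix
        "suffix3": word[-3:],  # assumed: character trigram suffix
        # Syntactic-level features
        "pos": pos,
        "chunk": chunk,
    }
    # Context features: preceding and following word and POS.
    if i > 0:
        feats["prev_word"], feats["prev_pos"] = sentence[i - 1][0], sentence[i - 1][1]
    else:
        feats["prev_word"], feats["prev_pos"] = "<BOS>", "<BOS>"
    if i < len(sentence) - 1:
        feats["next_word"], feats["next_pos"] = sentence[i + 1][0], sentence[i + 1][1]
    else:
        feats["next_word"], feats["next_pos"] = "<EOS>", "<EOS>"
    return feats


def sentence_features(sentence: List[Token]) -> List[Dict[str, str]]:
    """Feature dicts for every token in the sentence, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Made-up example sentence with made-up tags.
example = [
    ("John", "NNP", "B-NP"),
    ("works", "VBZ", "B-VP"),
    ("in", "IN", "B-PP"),
    ("Chennai", "NNP", "B-NP"),
]
features = sentence_features(example)
```

Because none of these features refers to a language-specific resource such as a gazetteer, the same extraction function could in principle be applied unchanged to any of the 12 languages, which is the sense of "generic" the abstract describes.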

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	29.13 kB	Adobe PDF
02_certificates.pdf		224.28 kB	Adobe PDF
03_vivaproceedings.pdf		417.21 kB	Adobe PDF
04_bonafidecertificate.pdf		319.9 kB	Adobe PDF
05_abstracts.pdf		8.58 kB	Adobe PDF
06_acknowledgements.pdf		6.82 kB	Adobe PDF
07_contents.pdf		300.58 kB	Adobe PDF
08_listoftables.pdf		27.93 kB	Adobe PDF
09_listoffigures.pdf		28.15 kB	Adobe PDF
10_listofabbreviations.pdf		16.76 kB	Adobe PDF
11_chapter1.pdf		235.14 kB	Adobe PDF
12_chapter2.pdf		1.93 MB	Adobe PDF
13_chapter3.pdf		571.61 kB	Adobe PDF
14_chapter4.pdf		698.56 kB	Adobe PDF
15_conclusion.pdf		41.9 kB	Adobe PDF
16_appendices.pdf		190.77 kB	Adobe PDF
17_references.pdf		178.15 kB	Adobe PDF
18_listofpublications.pdf		130.97 kB	Adobe PDF
80_recommendation.pdf		64.94 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET