Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/360622
Full metadata record
dc.coverage.spatial: 
dc.date.accessioned: 2022-02-08T07:13:22Z
dc.date.available: 2022-02-08T07:13:22Z
dc.identifier.uri: http://hdl.handle.net/10603/360622
dc.description.abstract: The prime objective of the investigation presented in this thesis was to explore the semantic space in word vectors using neural word embedding. The non-existence of a clean, sentence-aligned parallel corpus for the English-Tamil language pair calls for a sufficiently large bilingual corpus for the implementation of various Natural Language Processing (NLP) applications such as machine translation, cross-lingual information retrieval and semantic comparison. Although word embedding has been in vogue in recent years, an adequate method for the evaluation of word embeddings still demands attention. Besides an in-depth discussion of the intrinsic and extrinsic evaluation of bilingual word embedding models, a data set was developed for the evaluation of English-Tamil bilingual word embedding algorithms. The data set was evaluated on a bilingual model; analysis of the experimental results yielded insightful inferences into the semantics captured by word vectors and human cognition. However, bilingual embeddings typically capture common semantics and reject variations. Hence, transfer function-based generated embedding (TFGE), a deeply learned transfer function, was developed, in which vectors from the embedding space of one language are projected onto that of the other language. Three well-regarded off-the-shelf embedding algorithms, Word2Vec, GloVe, and FastText, were used to train the TFGE model, from English, a resource-rich source language, to Tamil, a resource-deficient target language, in a data-efficient way. The efficacy of the proposed TFGE model was confirmed by a better synthesis of new vectors for unknown source-language words. Pre-trained Word2Vec Hindi and Chinese embeddings were marshalled to appraise the deployable capability of the TFGE model across other target languages.
The versatility of the developed model was substantively demonstrated in selected NLP use-cases: Text Summarization, Part-of-Speech (POS) Tagging, and Bilingual Dictionary Induction (BDI). In a nutshell, the following developments are the major ...
dc.format.extent: xxi, 162
dc.language: English
dc.relation: 
dc.rights: university
dc.title: Exploration of Semantic Space of Word Vectors Using Word Embedding
dc.title.alternative: 
dc.creator.researcher: Sanjanasri J P
dc.subject.keyword: Center for Computational Engineering and Networking; Natural Language Processing; NLP; Neural Networks; semantic space; bilingual word; Word Embedding; Deep Learning; machine learning; Pruning; Indian languages
dc.subject.keyword: Computer Science; Interdisciplinary Applications
dc.description.note: 
dc.contributor.guide: Soman K P
dc.publisher.place: Coimbatore
dc.publisher.university: Amrita Vishwa Vidyapeetham University
dc.publisher.institution: Center for Computational Engineering and Networking (CEN)
dc.date.registered: 2014
dc.date.completed: 2021
dc.date.awarded: 2021
dc.format.dimensions: 
dc.format.accompanyingmaterial: None
dc.source.university: University
dc.type.degree: Ph.D.
Appears in Departments: Center for Computational Engineering and Networking (CEN)
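The abstract's core idea, projecting vectors from a resource-rich source embedding space onto a resource-deficient target space via a learned transfer function, can be illustrated with a minimal sketch. Note the thesis's TFGE model is a deeply learned (non-linear) function trained on real English-Tamil embeddings; the linear least-squares map and the toy data below are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative sketch only: learn a linear transfer map W that projects
# source-language vectors onto the target-language embedding space,
# fitted on a small set of aligned word pairs (a seed lexicon).
rng = np.random.default_rng(0)
d_src, d_tgt, n_pairs = 50, 40, 200

# Toy "aligned embeddings": target vectors are a fixed linear image of
# the source vectors plus noise, standing in for translation pairs.
true_map = rng.normal(size=(d_src, d_tgt))
X = rng.normal(size=(n_pairs, d_src))                 # source vectors
Y = X @ true_map + 0.01 * rng.normal(size=(n_pairs, d_tgt))

# Closed-form least-squares fit of the transfer map.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Synthesize a target-space vector for an unseen source word,
# mirroring the thesis's synthesis of vectors for unknown words.
x_new = rng.normal(size=(1, d_src))
y_hat = x_new @ W                                     # projected vector
print(y_hat.shape)                                    # (1, 40)
```

In this setup the seed lexicon plays the role of the bilingual training data; a deep model like TFGE would replace the single matrix `W` with a multi-layer network trained by gradient descent.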

Files in This Item:
File                       Size       Format
01_title.pdf               145.17 kB  Adobe PDF
02_certificate.pdf         194.19 kB  Adobe PDF
03_ preliminary pages.pdf  421.73 kB  Adobe PDF
04_chapter 1.pdf           157.8 kB   Adobe PDF
05_chapter 2.pdf           440.2 kB   Adobe PDF
06_chapter 3.pdf           381.74 kB  Adobe PDF
07_chapter 4.pdf           423.78 kB  Adobe PDF
08_chapter 5.pdf           1.13 MB    Adobe PDF
09_chapter 6.pdf           114.09 kB  Adobe PDF
10_bibliography.pdf        156.11 kB  Adobe PDF
11_publications.pdf        74.95 kB   Adobe PDF
80_recommendation.pdf      258.83 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
