Exploiting Linguistic Knowledge to Address Representation and Sparsity Issues in Dependency Parsing of Indian Languages

Riyaz Ahmad Bhat

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/139773

Full metadata record

DC Field	Value	Language
dc.coverage.spatial
dc.date.accessioned	2017-03-09T08:20:30Z	-
dc.date.available	2017-03-09T08:20:30Z	-
dc.identifier.uri	http://hdl.handle.net/10603/139773	-
dc.description.abstract	Over the years, it has been observed that morphologically rich, free-word-order languages are harder to parse than fixed-word-order languages, regardless of the parsing technique used. On the one hand, rich morphology provides explicit cues for parsing; on the other, it worsens data sparsity, since it leads to high lexical diversity and variation in word order. In this thesis, we aim to address this trade-off for accurate and robust parsing of morphologically rich Indian languages. We present novel strategies to represent morphology effectively in parsing models and to mitigate the effects of its trade-offs. We propose to represent morphosyntactic information as higher-order features under the Markovian assumption. More specifically, we use the history of a transition-based parser to extract and propagate morphological information, such as case and grammatical agreement, as higher-order features for parsing nominal nodes. Despite its benefits, rich morphology also poses several challenges to statistical parsing. The most prominent is a sampling bias towards the canonical structures of a language: since current parsers are mostly trained on formal texts, even a slight deviation from canonical word order can severely degrade their performance. To overcome this bias, we propose a sampling technique that generates training instances with diverse word orders from the available canonical structures. We show that linearly interpolated models trained on diverse views of the same data can effectively parse both canonical and non-canonical texts. Similarly, to mitigate lexical sparsity, we use supervised domain adaptation techniques to train parsers on lexically more diverse annotations from augmented Hindi and Urdu treebanks.
Furthermore, we explore lexical semantics as a viable alternative to more training data for parsing semantically rich but sparse dependency annotations in Indian language treebanks.
dc.format.extent	xiii, 145
dc.language	English
dc.relation
dc.rights	self
dc.title	Exploiting Linguistic Knowledge to Address Representation and Sparsity Issues in Dependency Parsing of Indian Languages
dc.title.alternative
dc.creator.researcher	Riyaz Ahmad Bhat
dc.subject.keyword	Arc Eager Algorithm
dc.subject.keyword	Dynamic Oracle
dc.subject.keyword	Multilayered Perceptron
dc.subject.keyword	Multinomial Naive Bayes
dc.subject.keyword	Noisy Channel Model
dc.subject.keyword	Non-projectivity
dc.subject.keyword	Syntactic Parsing
dc.subject.keyword	Syntactic Treebanks
dc.subject.keyword	Transition Systems
dc.subject.keyword	Vanilla Perceptron
dc.description.note
dc.contributor.guide	Prof. Dipti Misra Sharma
dc.publisher.place	Hyderabad
dc.publisher.university	International Institute of Information Technology, Hyderabad
dc.publisher.institution	Computational Linguistics
dc.date.registered	1-8-2009
dc.date.completed	07/01/2017
dc.date.awarded	31/07/2017
dc.format.dimensions
dc.format.accompanyingmaterial	None
dc.source.university	University
dc.type.degree	Ph.D.
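The abstract's central idea, reading morphosyntactic information such as case off the parser's own transition history and using it as a higher-order feature, can be illustrated with a small arc-eager sketch (the keywords list the Arc Eager Algorithm). This is a minimal sketch under stated assumptions, not the thesis's implementation: the Token and State classes, the attached_cases memory, the feature names (s0.attached_cases, b0.attached_cases) and the case labels are all hypothetical, chosen only to show how a feature can come from earlier transitions rather than from the raw sentence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Token:
    form: str
    case: Optional[str] = None  # illustrative case label, e.g. "erg", "nom"; None for verbs


@dataclass
class State:
    stack: List[int]
    buffer: List[int]
    arcs: Set[Tuple[int, int]] = field(default_factory=set)  # (head, dependent)
    # History-derived memory (hypothetical): case labels of dependents already attached to each head.
    attached_cases: Dict[int, List[str]] = field(default_factory=dict)

    def has_head(self, i: int) -> bool:
        return any(dep == i for _, dep in self.arcs)


def initial_state(n: int) -> State:
    # Index 0 is an artificial ROOT; real tokens are 1..n.
    return State(stack=[0], buffer=list(range(1, n + 1)))


def _attach(state: State, tokens: List[Token], head: int, dep: int) -> None:
    state.arcs.add((head, dep))
    if tokens[dep].case is not None:  # propagate the dependent's case up to its head
        state.attached_cases.setdefault(head, []).append(tokens[dep].case)


def apply(state: State, tokens: List[Token], action: str) -> None:
    """Apply one arc-eager transition in place."""
    if action == "SHIFT":
        state.stack.append(state.buffer.pop(0))
    elif action == "LEFT-ARC":  # head = buffer front, dependent = stack top (popped)
        s0 = state.stack[-1]
        if s0 == 0 or state.has_head(s0):
            raise ValueError("LEFT-ARC not permitted here")
        _attach(state, tokens, state.buffer[0], state.stack.pop())
    elif action == "RIGHT-ARC":  # head = stack top, dependent = buffer front (pushed)
        b0 = state.buffer.pop(0)
        _attach(state, tokens, state.stack[-1], b0)
        state.stack.append(b0)
    elif action == "REDUCE":
        if not state.has_head(state.stack[-1]):
            raise ValueError("REDUCE not permitted here")
        state.stack.pop()
    else:
        raise ValueError(f"unknown action: {action!r}")


def _cases(state: State, i: int) -> str:
    seen = state.attached_cases.get(i)
    return ",".join(sorted(seen)) if seen else "NONE"


def features(state: State, tokens: List[Token]) -> Dict[str, str]:
    """Hypothetical templates: a first-order feature (case of s0) plus higher-order
    features read off the history (cases already attached to s0 and to b0)."""
    feats: Dict[str, str] = {}
    if state.stack:
        s0 = state.stack[-1]
        feats["s0.case"] = str(tokens[s0].case)
        feats["s0.attached_cases"] = _cases(state, s0)
    if state.buffer:
        b0 = state.buffer[0]
        feats["b0.form"] = tokens[b0].form
        feats["b0.attached_cases"] = _cases(state, b0)
    return feats


# Usage: "raam(ne) seb khaayaa" (Ram-ERG apple ate); the marker "ne" is folded into the case label.
tokens = [Token("ROOT"), Token("raam", "erg"), Token("seb", "nom"), Token("khaayaa")]
state = initial_state(3)
for action in ["SHIFT", "SHIFT", "LEFT-ARC"]:  # attach "seb" to the verb first
    apply(state, tokens, action)
print(features(state, tokens))
# {'s0.case': 'erg', 's0.attached_cases': 'NONE', 'b0.form': 'khaayaa', 'b0.attached_cases': 'nom'}
```

In a verb-final sentence the verb waits in the buffer while its nominal dependents are attached by LEFT-ARC, so the decision about a later nominal can condition on the cases already attached to the verb. A learned classifier (for instance one of the perceptron models in the keywords) could consume these strings as sparse features; how the thesis actually encodes case and agreement is not specified in this record.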
Appears in Departments:	Computational Linguistics
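The abstract's remedy for word-order bias, sampling non-canonical orders from canonical trees and then linearly interpolating models trained on the two views, can be sketched as follows. The record does not describe the actual sampling scheme, so this sketch assumes one simple option: randomly reordering the dependents that precede each head, moving whole subtrees so the resulting order stays projective. The function names linearize and interpolate, and the weight lam, are hypothetical; interpolating per-action probabilities is likewise only one possible reading of "linearly interpolated models".

```python
import math
import random
from typing import Dict, List, Sequence


def linearize(heads: Sequence[int], rng: random.Random, shuffle_left: bool = True) -> List[int]:
    """Return a word order (1-based token ids) for a dependency tree.

    heads[i - 1] is the head of token i; 0 denotes the artificial ROOT.
    Each subtree is emitted contiguously, so the output order is projective.
    With shuffle_left=True, the dependents preceding each head are emitted in
    random order (a crude model of scrambling in verb-final languages).
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(heads) + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    def visit(node: int) -> List[int]:
        left = [c for c in children[node] if c < node]
        right = [c for c in children[node] if c > node]
        if shuffle_left:
            rng.shuffle(left)
        order: List[int] = []
        for c in left:
            order.extend(visit(c))
        if node != 0:  # ROOT is not a real token
            order.append(node)
        for c in right:
            order.extend(visit(c))
        return order

    return visit(0)


def interpolate(p_canonical: Dict[str, float], p_diverse: Dict[str, float], lam: float) -> Dict[str, float]:
    """Linear interpolation per action: lam * p_canonical + (1 - lam) * p_diverse."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    actions = set(p_canonical) | set(p_diverse)
    return {a: lam * p_canonical.get(a, 0.0) + (1.0 - lam) * p_diverse.get(a, 0.0) for a in actions}


# Usage: "raam(ne) seb khaayaa" (Ram-ERG apple ate), heads = [3, 3, 0]
rng = random.Random(0)
print(linearize([3, 3, 0], rng, shuffle_left=False))  # [1, 2, 3], the original order
print(linearize([3, 3, 0], rng))                      # either [1, 2, 3] or [2, 1, 3]
mixed = interpolate({"SHIFT": 0.8, "LEFT-ARC": 0.2}, {"SHIFT": 0.4, "LEFT-ARC": 0.6}, lam=0.5)
print(math.isclose(mixed["SHIFT"], 0.6))  # True
```

Because the reordered sentence keeps the original head-dependent arcs, each sampled order yields a new gold training instance at no annotation cost; lam would in practice be tuned on held-out data containing both canonical and non-canonical sentences.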

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	43.49 kB	Adobe PDF
02_copyright.pdf		17.75 kB	Adobe PDF
03_certificate.pdf		22.07 kB	Adobe PDF
04_acknowledgements.pdf		26.82 kB	Adobe PDF
05_abstract.pdf		24.55 kB	Adobe PDF
06_contents.pdf		43.55 kB	Adobe PDF
07_list of table and figures.pdf		47.79 kB	Adobe PDF
08_chapter 1.pdf		94.84 kB	Adobe PDF
09_chapter 2.pdf		120.46 kB	Adobe PDF
10_chapter 3.pdf		163.74 kB	Adobe PDF
11_chapter 4.pdf		376.43 kB	Adobe PDF
12_chapter 5.pdf		179.79 kB	Adobe PDF
13_chapter 6.pdf		1.11 MB	Adobe PDF
14_chapter 7.pdf		462.69 kB	Adobe PDF
15_chapter 8.pdf		281.6 kB	Adobe PDF
16_chapter 9.pdf		35.06 kB	Adobe PDF
17_references.pdf		100.88 kB	Adobe PDF

Show simple item record

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET