Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/139773
Full metadata record
DC Field | Value | Language
dc.coverage.spatial | |
dc.date.accessioned | 2017-03-09T08:20:30Z | -
dc.date.available | 2017-03-09T08:20:30Z | -
dc.identifier.uri | http://hdl.handle.net/10603/139773 | -
dc.description.abstract | Over the years, it has been observed that morphologically rich and free word order languages, unlike fixed word order languages, are harder to parse, regardless of the parsing technique used. On the one hand, rich morphology provides explicit cues for parsing, while on the other hand it worsens the problem of data sparsity as it leads to high lexical diversity and variation in word order. In this thesis, we aim to address this trade-off for accurate and robust parsing of morphologically rich Indian languages. We present novel strategies to effectively represent morphology in the parsing models and also to mitigate the effect of its trade-offs. We propose to represent morphosyntactic information as higher-order features under the Markovian assumption. More specifically, we use the history of a transition-based parser to extract and propagate morphological information such as case and grammatical agreement as higher-order features for parsing nominal nodes. Despite its benefits, rich morphology can also pose a multitude of challenges to statistical parsing. The most prominent issue is related to sampling bias towards canonical structures of a language. As current parsers are mostly trained on formal texts, even a slight deviation from canonical word order can severely affect their performance. To overcome this bias, we propose a sampling technique to generate training instances with diverse word orders from the available canonical structures. We show that linearly interpolated models trained on diverse views of the same data can effectively parse both canonical and non-canonical texts. Similarly, to mitigate the effect of lexical sparsity, we use supervised domain adaptation techniques for training parsers on lexically more diverse annotations from augmented Hindi and Urdu treebanks. Furthermore, we explore lexical semantics as a viable alternative to more training data for parsing semantically rich but sparse dependency annotations in Indian language treebanks. |
dc.format.extent | xiii, 145 |
dc.language | English |
dc.relation | |
dc.rights | self |
dc.title | Exploiting Linguistic Knowledge to Address Representation and Sparsity Issues in Dependency Parsing of Indian Languages |
dc.title.alternative | |
dc.creator.researcher | Riyaz Ahmad Bhat |
dc.subject.keyword | Arc Eager Algorithm |
dc.subject.keyword | Dynamic Oracle |
dc.subject.keyword | Multilayered Perceptron |
dc.subject.keyword | Multinomial Naive Bayes |
dc.subject.keyword | Noisy Channel Model |
dc.subject.keyword | Non-projectivity |
dc.subject.keyword | Syntactic Parsing |
dc.subject.keyword | Syntactic Treebanks |
dc.subject.keyword | Transition Systems |
dc.subject.keyword | Vanilla Perceptron |
dc.description.note | |
dc.contributor.guide | Prof. Dipti Misra Sharma |
dc.publisher.place | Hyderabad |
dc.publisher.university | International Institute of Information Technology, Hyderabad |
dc.publisher.institution | Computational Linguistics |
dc.date.registered | 1-8-2009 |
dc.date.completed | 07/01/2017 |
dc.date.awarded | 31/07/2017 |
dc.format.dimensions | |
dc.format.accompanyingmaterial | None |
dc.source.university | University |
dc.type.degree | Ph.D. |
Appears in Departments: Computational Linguistics

Files in This Item:
File | Description | Size | Format
01_title.pdf | Attached File | 43.49 kB | Adobe PDF
02_copyright.pdf | | 17.75 kB | Adobe PDF
03_certificate.pdf | | 22.07 kB | Adobe PDF
04_acknowledgements.pdf | | 26.82 kB | Adobe PDF
05_abstract.pdf | | 24.55 kB | Adobe PDF
06_contents.pdf | | 43.55 kB | Adobe PDF
07_list of table and figures.pdf | | 47.79 kB | Adobe PDF
08_chapter 1.pdf | | 94.84 kB | Adobe PDF
09_chapter 2.pdf | | 120.46 kB | Adobe PDF
10_chapter 3.pdf | | 163.74 kB | Adobe PDF
11_chapter 4.pdf | | 376.43 kB | Adobe PDF
12_chapter 5.pdf | | 179.79 kB | Adobe PDF
13_chapter 6.pdf | | 1.11 MB | Adobe PDF
14_chapter 7.pdf | | 462.69 kB | Adobe PDF
15_chapter 8.pdf | | 281.6 kB | Adobe PDF
16_chapter 9.pdf | | 35.06 kB | Adobe PDF
17_references.pdf | | 100.88 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
