Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/507883
Title: Inducing Constraints in Paraphrase Generation and Consistency in Paraphrase Detection
Researcher: Kumar, Ashutosh
Guide(s): Talukdar, Partha Pratim
Keywords: Computer Science; Computer Science Software Engineering; Engineering and Technology
University: Indian Institute of Science Bangalore
Completed Date: 2023
Abstract: Deep learning models typically require a large volume of data. Manual curation of datasets is time-consuming and limited by imagination. As a result, natural language generation (NLG) has been employed to automate the process. However, in their vanilla formulation, NLG models are prone to producing degenerate, uninteresting, and often hallucinated outputs. Constrained generation aims to overcome these shortcomings by providing additional information to the generation process. Training data thus generated can help improve the robustness of deep learning models. Therefore, the central research question of the thesis is: How can we constrain generation models, especially in NLP, to produce meaningful outputs and utilize them for building better classification models? To demonstrate how generation models can be constrained, we present two approaches for paraphrase generation. Paraphrase generation involves the generation of text that conveys the same meaning as a reference text. We propose two strategies for paraphrase generation: (1) DiPS (Diversity in Paraphrases using Submodularity): The first approach deals with constraining paraphrase generation to ensure diversity, i.e., ensuring that generated text(s) are sufficiently different from each other. We propose a decoding algorithm for obtaining diverse texts. We provide a novel formulation of the problem in terms of monotone submodular function maximization, specifically targeted toward the task of paraphrase generation. We demonstrate the effectiveness of our method for data augmentation on multiple tasks such as intent classification and paraphrase recognition. (2) SGCP (Syntax Guided Controlled Paraphraser): The second approach deals with constraining paraphrase generation to ensure syntacticality, i.e., ensuring that the generated text is syntactically coherent with an exemplar sentence.
We propose Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation without compromising relevance (fidelity). Through a bat...
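The abstract describes diverse paraphrase selection as monotone submodular function maximization. As a rough illustration of that general idea only, the sketch below greedily picks k candidates that maximize a simple n-gram coverage score. The coverage objective, the function names (ngrams, coverage, greedy_diverse_subset), and the whitespace tokenization are assumptions made for this example; they are not taken from the thesis and do not reproduce the DiPS objective or decoding algorithm.

```python
from typing import List, Set, Tuple


def ngrams(sentence: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a sentence (naive whitespace tokens)."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coverage(selected: List[str], n: int = 2) -> int:
    """Hypothetical objective: number of distinct n-grams covered by the set.

    Set coverage is monotone (adding a sentence never lowers it) and
    submodular (the gain from a sentence shrinks as the set grows).
    """
    covered: Set[Tuple[str, ...]] = set()
    for sentence in selected:
        covered |= ngrams(sentence, n)
    return len(covered)


def greedy_diverse_subset(candidates: List[str], k: int, n: int = 2) -> List[str]:
    """Greedily pick up to k candidates, each time taking the largest marginal gain."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = coverage(selected, n)
        # Marginal gain of adding candidate c to the current selection.
        best = max(remaining, key=lambda c: coverage(selected + [c], n) - base)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Under a cardinality constraint, greedy selection of this kind is the standard approach for monotone submodular objectives, since it carries a known (1 - 1/e) approximation guarantee. A practical system would also need to score fidelity to the source sentence, which this sketch ignores.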
Pagination: 
URI: http://hdl.handle.net/10603/507883
Appears in Departments: Computer Science and Automation


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).