Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/576766
Title: Decoding the DNA of code: an AI-infused approach to detect code cloning in software systems
Researcher: Mehrotra, Nikita
Guide(s): Purandare, Rahul
Keywords: Computer Science
Computer Science Information Systems
Engineering and Technology
University: Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi)
Completed Date: 2024
Abstract: Code clones, duplicate code fragments sharing similar syntax or semantics, have become increasingly prevalent due to the success of software management tools like GitHub and advancements in Open Source Software (OSS). Previous research has shown that an astonishing 70% of the code hosted on GitHub consists of clones derived from previously existing files. Research has also found that between 9% and 31% of software projects on GitHub contain a substantial portion, sometimes up to 80%, of files that have identical counterparts elsewhere. While clones facilitate code reuse and refactoring, they simultaneously complicate software evolution, necessitating effective clone detection techniques. Historically, a substantial amount of research has been conducted on code clone detection. Most traditional approaches focus on syntactic clones by leveraging lexical and syntactic information; only a few target semantic clones. Furthermore, software engineering has evolved from traditional mono-language systems to modern multilingual software, where functionality is commonly replicated across multiple programming languages. This results in clones that have similar functionality but are written in different languages. Since such code snippets are syntactically unrelated, traditional single-language clone detection approaches cannot detect them. Motivated by the success of deep learning models in various domains, researchers have explored deep learning techniques for code clone detection. These techniques learn the underlying patterns and features of code in order to measure code similarity. However, the majority of these techniques rely on supervised learning, which requires a substantial volume of labeled data to achieve optimal performance.
The acquisition and creation of such labeled datasets present considerable challenges, as they involve not only the scarcity of accurately labeled examples but also the laborious and
Pagination: 201 p.
URI: http://hdl.handle.net/10603/576766
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                   Description    Size       Format
01-title.pdf           Attached File  54.99 kB   Adobe PDF
02_prelim pages.pdf                   381 kB     Adobe PDF
03_content.pdf                        51.34 kB   Adobe PDF
04_abstract.pdf                       60.27 kB   Adobe PDF
05_chapter 1.pdf                      3.52 MB    Adobe PDF
06_chapter 2.pdf                      10.92 MB   Adobe PDF
07_chapter 3.pdf                      89.83 kB   Adobe PDF
08_chapter 4.pdf                      3.18 MB    Adobe PDF
09_chapter 5.pdf                      8.03 MB    Adobe PDF
10_annexures.pdf                      125.86 kB  Adobe PDF
11_chapter 6.pdf                      705.15 kB  Adobe PDF
80_recommendation.pdf                 144.46 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).