Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/576766
Title: | Decoding the DNA of code an AI infused approach to detect code cloning in software systems |
Researcher: | Mehrotra, Nikita |
Guide(s): | Purandare, Rahul |
Keywords: | Computer Science; Computer Science Information Systems; Engineering and Technology |
University: | Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi) |
Completed Date: | 2024 |
Abstract: | Code clones, duplicate code fragments sharing similar syntax or semantics, have become increasingly prevalent due to the success of software management tools like GitHub and advancements in Open Source Software (OSS). Previous research has shown that an astonishing 70% of the code hosted on GitHub consists of clones derived from previously existing files. Research has also found that between 9% and 31% of software projects on GitHub contain a substantial portion, sometimes up to 80%, of files that have identical counterparts elsewhere. While clones facilitate code reuse and refactoring, they simultaneously complicate software evolution, necessitating effective clone detection techniques. Historically, a substantial amount of research has been conducted on code clone detection, but most traditional approaches focus on syntactic clones by leveraging lexical and syntactic information; only a few target semantic clones. Furthermore, software engineering has evolved from traditional mono-language systems to modern multilingual software, where functionality is commonly replicated across multiple programming languages. This results in clones that have similar functionality but are written in different languages. Since such code snippets are syntactically unrelated, traditional single-language clone detection approaches cannot detect them. Motivated by the success of deep learning models in various domains, researchers have explored deep learning techniques for code clone detection. These techniques use machine learning to learn the underlying patterns and features of code in order to measure code similarity. However, the majority of these techniques rely on supervised learning, which requires a substantial volume of labeled data to achieve optimal performance. 
The acquisition and creation of such labeled datasets present considerable challenges, as they involve not only the scarcity of accurately labeled examples but also the laborious and |
Pagination: | 201 p. |
URI: | http://hdl.handle.net/10603/576766 |
Appears in Departments: | Department of Computer Science and Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
01-title.pdf | Attached File | 54.99 kB | Adobe PDF |
02_prelim pages.pdf | | 381 kB | Adobe PDF |
03_content.pdf | | 51.34 kB | Adobe PDF |
04_abstract.pdf | | 60.27 kB | Adobe PDF |
05_chapter 1.pdf | | 3.52 MB | Adobe PDF |
06_chapter 2.pdf | | 10.92 MB | Adobe PDF |
07_chapter 3.pdf | | 89.83 kB | Adobe PDF |
08_chapter 4.pdf | | 3.18 MB | Adobe PDF |
09_chapter 5.pdf | | 8.03 MB | Adobe PDF |
10_annexures.pdf | | 125.86 kB | Adobe PDF |
11_chapter 6.pdf | | 705.15 kB | Adobe PDF |
80_recommendation.pdf | | 144.46 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).