Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/448117
Title: Developing bug prediction and summarization models for software bug reports using text analytics and machine learning techniques
Researcher: SHUBHRA GOYAL
Guide(s): Arvinder Kaur
Keywords: Computer Science
Computer Science Information Systems
Engineering and Technology
University: Guru Gobind Singh Indraprastha University
Completed Date: 2020
Abstract: As preliminary work, an information extraction technique is applied to software bug reports to extract useful information. No standard datasets are available for extracting this information and improving the data analysis process. Therefore, a tool named Bug Report Collection System (BRCS) is developed, which extracts bug reports for various Apache projects from the Jira issue tracking repository in a predefined structured format. The structured format consists of several attributes, such as bug ID, bug resolution status, severity, one-line description, the component the bug belongs to, the assigned developer, fix version, and others. Among them, the severity of a bug report is a decisive attribute that determines how urgently the bug should be fixed. Therefore, predicting the severity of a software bug is of utmost importance. For this, the categorization technique of text mining is applied. The severity of a bug is categorized into two classes: severe and non-severe. To predict the severity, the one-line description of each software bug is used and machine learning techniques are applied. It is demonstrated that Boosting and Random Forests attain more accurate and generalizable results. To validate the results, statistical tests are performed. Further, it was observed that the performance of machine learning techniques is affected by the imbalanced nature of the datasets. To resolve this issue, the class imbalance problem is addressed using a data sampling approach, the Synthetic Minority Oversampling Technique (SMOTE). It is demonstrated that machine learning techniques attain improved results on balanced datasets. The performance of machine learning techniques is evaluated on the basis of two performance metrics: Geometric Mean and Balance.
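The class-balancing step and the two evaluation metrics named in the abstract can be sketched in plain Python. This is a minimal illustration, not the thesis implementation: the function names (`smote`, `detection_rates`, `g_mean`, `balance`) are hypothetical, the label encoding 1 = severe and 0 = non-severe is an assumption, and the formulas for Geometric Mean and Balance are the definitions commonly used in defect-prediction studies, which the abstract itself does not spell out.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: build n_new synthetic minority samples by
    interpolating between a minority sample and one of its k nearest
    minority neighbours. `minority` is a list of numeric feature vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def detection_rates(y_true, y_pred, positive=1):
    """Return (pd, pf): probability of detection (recall on the positive
    class) and probability of false alarm (false positive rate)."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


def g_mean(pd, pf):
    """Geometric mean of recall on the positive and negative classes
    (assumed definition: sqrt(pd * (1 - pf)))."""
    return math.sqrt(pd * (1.0 - pf))


def balance(pd, pf):
    """Balance (assumed definition): 1 minus the normalised Euclidean
    distance from (pf, pd) to the ideal point pf = 0, pd = 1."""
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
```

In a typical workflow, `smote` would be applied only to the numeric feature vectors of the minority class in the training split (for example, vectorized one-line descriptions), and `detection_rates` would be computed on untouched test data before passing `pd` and `pf` to `g_mean` and `balance`. Both metrics range from 0 to 1, and neither is dominated by the majority class the way plain accuracy is.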
Pagination: 202
URI: http://hdl.handle.net/10603/448117
Appears in Departments: University School of Information and Communication Technology

Files in This Item:
File | Description | Size | Format
80_recommendation.pdf | Attached File | 409.59 kB | Adobe PDF
shubhra goyal.pdf | | 3.59 MB | Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
