Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/528118
Title: | Efficient Email Classification System with NLP and Graph Theoretical Approaches |
Researcher: | ARUNA KUMARA B |
Guide(s): | MALLIKARJUN M KODABAGI |
Keywords: | Computer Science; Computer Science Information Systems; Engineering and Technology |
University: | REVA University |
Completed Date: | 2023 |
Abstract: | Email remains a vital means of communication for academic, personal, and professional users, despite the availability of alternatives such as social networks, electronic messages, and mobile applications. The growing number of email users has increased the volume of email data, which in turn demands efficient techniques for automatic email management that save time, handle high-dimensional data, and make email communication more user-friendly. Email classification is a critical task in the information retrieval and natural language processing (NLP) domains, with numerous applications including content categorization, sentiment analysis, and spam detection. In this research, various subsystems of an email classification system based on natural language processing and graph-based approaches are developed for the management of university emails. The research contributions of the thesis are presented in chapters 3 to 5. Chapter 3 presents the creation of a real-time email corpus, the REVA dataset, and a novel, efficient data pre-processing approach (RES). Chapter 4 presents a novel feature engineering method based on the Longest Common Subsequence (LCS) that selects the most relevant features. Chapter 5 presents a graph-based similarity measure (GSM) for effective email classification. The first contribution of this research is the creation of a real-time email corpus called the REVA dataset and an efficient data pre-processing approach for email classification. The REVA dataset is a comprehensive and diverse collection of emails collected in real time from different users at REVA University. The dataset is carefully curated to include a wide range of email types, such as academics, examination, placement, research, and other emails, making it suitable for training and evaluating email classification systems. Email data can be noisy and contain irrelevant information, such as HTML tags, email headers, and signatures, which can negatively imp |
Pagination: | |
URI: | http://hdl.handle.net/10603/528118 |
Appears in Departments: | School of Computing and Information Technology |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
01_title.pdf | Attached File | 116.31 kB | Adobe PDF |
02_prelim pages.pdf | | 622.73 kB | Adobe PDF |
03_contents.pdf | | 21.11 kB | Adobe PDF |
04_abstract.pdf | | 12.86 kB | Adobe PDF |
05_chapter 1.pdf | | 308.29 kB | Adobe PDF |
06_chapter 2.pdf | | 219.04 kB | Adobe PDF |
07_chapter 3.pdf | | 451.58 kB | Adobe PDF |
08_chapter 4.pdf | | 728.18 kB | Adobe PDF |
09_chapter 5.pdf | | 310.12 kB | Adobe PDF |
10_chapter 6.pdf | | 154.02 kB | Adobe PDF |
11_annexures.pdf | | 300.9 kB | Adobe PDF |
80_recommendation.pdf | | 394.77 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).