Automatic Identification of Email Document using Text Mining

Shroff, Namrata

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/537731

Title:	Automatic Identification of Email Document using Text Mining
Researcher:	Shroff, Namrata
Guide(s):	Shingala, Amisha
Keywords:	Engineering; Engineering and Technology; Engineering Industrial
University:	Gujarat Technological University
Completed Date:	2022
Abstract:	The research work presents a mechanism for the automatic classification of email documents. The work follows a mixed approach combining Natural Language Processing and Computational Linguistics, as required by the research problem. Its primary focus is on email (document) topic distribution and keyword topic distribution.

Hidden knowledge in the email corpus is extracted and used to retrieve topics/labels for the emails. As the number of emails in an inbox grows, managing and organizing them becomes difficult, and important emails can remain unattended. This research work therefore attempts to generate labels in front of the email subject, to create a folder in the Gmail inbox, and to store the email in that folder; labels and folders are created for important emails. Emails from different domains (e.g. doctors' emails, advocates' emails, medical representatives' emails, and teaching professionals' emails) were studied. Rules found in these different sources were collected, identified, and manually validated through manual calculations for each rule, and from them a knowledge corpus was created for research use.

Further, the knowledge corpus construction rules are implemented as a rule-based model, through which the labels for emails are detected and identified. Stop-word filtering is also incorporated. In addition, nouns and verbs are detected in the subject and body of each email through NV-LDA (Noun Verb Latent Dirichlet Allocation) to better understand the keywords. The automatically generated computational-linguistics metadata, which includes noun-verb details from the subject and body of the email corpus along with further metadata, is matched against the knowledge corpus to predict the labels for each email.
This research work also contributes a knowledge corpus for label prediction for different users from a different doma
Pagination:	4297 KB
URI:	http://hdl.handle.net/10603/537731
Appears in Departments:	Computer/IT Engineering
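The abstract describes predicting an email's label by filtering stop words, extracting keywords from the subject and body, and matching them against rules in a knowledge corpus. The sketch below illustrates that matching step under stated assumptions and is not the thesis's implementation: the small `STOP_WORDS` set, the `KNOWLEDGE_CORPUS` labels and keywords, the subject weighting, and the `tokenize`/`predict_label` functions are all hypothetical. Plain token matching stands in for the NV-LDA noun/verb extraction, which would normally require a part-of-speech tagger and a topic model.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "for", "to", "of", "and",
              "in", "on", "your", "please", "with", "at"}

# Hypothetical knowledge corpus: each label maps to rule keywords whose
# presence in an email counts as evidence for that label (illustrative only).
KNOWLEDGE_CORPUS = {
    "Appointment": {"appointment", "schedule", "clinic", "visit"},
    "Legal": {"court", "hearing", "petition", "affidavit"},
    "Teaching": {"lecture", "exam", "syllabus", "assignment"},
}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def predict_label(subject, body, min_hits=2):
    """Return the label whose rule keywords occur most often, or None.

    Subject tokens are counted twice, on the assumption (ours, not the
    thesis's) that subject words are stronger evidence than body words.
    Labels scoring below min_hits are rejected as too weakly supported.
    """
    counts = Counter(tokenize(subject) * 2 + tokenize(body))
    scores = {label: sum(counts[k] for k in keywords)
              for label, keywords in KNOWLEDGE_CORPUS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None

print(predict_label("Clinic appointment on Monday", "Please confirm your visit."))
# → Appointment
```

Returning `None` below a threshold mirrors the abstract's point that labels and folders are created only for important emails; in this sketch, unmatched mail simply stays unlabeled.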

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	181.42 kB	Adobe PDF
03_abstract.pdf		132.68 kB	Adobe PDF
06_contents.pdf		584.77 kB	Adobe PDF
10_chapter1.pdf		431.9 kB	Adobe PDF
11_chapter2.pdf		446.01 kB	Adobe PDF
12_chapter3.pdf		813.35 kB	Adobe PDF
13_chapter4.pdf		1.08 MB	Adobe PDF
14_chapter5.pdf		186.42 kB	Adobe PDF
15_conclusion.pdf		137.72 kB	Adobe PDF
17_biblography.pdf		176.8 kB	Adobe PDF
80_recommendation.pdf		756.88 kB	Adobe PDF
prelim pages.pdf		316.29 kB	Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).


Shodhganga : a reservoir of Indian theses @ INFLIBNET