Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/338495
Title: Topic and sub topic detection using text mining
Researcher: Elakiya, E
Guide(s): Rajkumar, N
Keywords: Engineering and Technology
Computer Science
Computer Science Information Systems
Text Mining
Document Processing
University: Anna University
Completed Date: 2020
Abstract: Every moment, a massive volume of data and information is shared through social networks. Analyzing huge amounts of text data by hand is tedious, time-consuming and expensive, and manual sorting leads to mistakes and inconsistency. Document processing is still not capable of extracting information as a human reader does. Moreover, the importance of the content in a document may vary from one reader to another. Existing topic detection and optimization algorithms have several problems: the number of topics is fixed; the models are non-hierarchical; they are designed for continuous problems; the domain variable is not finite; they depend on a sequence of random decisions; repeated fitness function evaluation takes more time; and, for complex and large problems, the probability distribution can change in each iteration. The goal of the thesis is to detect the topics and subtopics of a corpus automatically and effectively. A spider-based topic analysis model helps to sift through large sets of data and identify the most frequent topics in a simple, fast and scalable way. This electronic era deals with large volumes of unstructured text every day; e-mails, social media posts, customer feedback, reviews and other information can benefit greatly from this topic analysis technique. The proposed work focuses on designing a preprocessing framework for text mining applications. Three major preprocessing tasks, Expansion, Removal and Tokenization (ERT), take a corpus as input and generate a list of tokens. The rising complexity of electronic data motivated the proposal of an optimization algorithm. The Spider Hunting Algorithm is inspired by the social behavior of spiders, their dynamic movements with communication, and their hunting. The Spider Hunting Algorithm helps to design a topic model and to detect the topics and subtopics of a corpus. The spider model is trained using the BBC news dataset, which covers the topics business, entertainment, politics, sports and technology. The BBC dataset contains 510 business articles, 386 entertainment articles, 417 politics articles, 511 sports articles and 401 technology articles, for a total of 2225 news articles. A list of unique and most influential words is identified for each topic and subtopic and is maintained at the top of the list. The Spider Hunting Algorithm reads the corpus sentence by sentence, and each sentence is preprocessed using the ERT framework. Token frequencies are calculated and then used to detect topics and subtopics. An Enhanced Spider Hunting Algorithm based on WordNet is proposed to improve the topic detection percentage. It depends on the relationships among tokens; associations between words can be drawn using synsets.
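The abstract describes the pipeline only in outline: ERT preprocessing, token frequency counting, and topic detection from the most influential words. The following Python sketch illustrates that general flow under stated assumptions. It is not the Spider Hunting Algorithm from the thesis, whose details are not given in this record; it substitutes plain frequency matching. The names `CONTRACTIONS`, `STOPWORDS`, `ert_preprocess`, `train_topic_words` and `detect_topic` are hypothetical, and the tiny contraction and stopword tables are placeholders.

```python
import re
from collections import Counter

# Hypothetical lookup tables; a real system would use much larger ones.
CONTRACTIONS = {"can't": "cannot", "won't": "will not",
                "don't": "do not", "it's": "it is"}
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in",
             "is", "it", "that", "we", "this", "on", "for"}


def ert_preprocess(text):
    """Expansion, Removal, Tokenization: return a list of tokens."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    # Expansion: rewrite contractions in full.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Removal: drop digits, punctuation and other non-letter characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenization: split on whitespace, then discard stopwords.
    return [tok for tok in text.split() if tok not in STOPWORDS]


def train_topic_words(labeled_docs, top_n=10):
    """Map each topic to its top_n most frequent tokens.

    labeled_docs is an iterable of (topic, text) pairs.
    """
    counts = {}
    for topic, text in labeled_docs:
        counts.setdefault(topic, Counter()).update(ert_preprocess(text))
    return {topic: [word for word, _ in counter.most_common(top_n)]
            for topic, counter in counts.items()}


def detect_topic(text, topic_words):
    """Return the topic whose word list overlaps the text most often."""
    tokens = Counter(ert_preprocess(text))
    scores = {topic: sum(tokens[word] for word in words)
              for topic, words in topic_words.items()}
    return max(scores, key=scores.get)
```

Trained on (topic, text) pairs such as the five BBC categories, `detect_topic` picks the category whose frequent words occur most often in a new text. A WordNet-based enhancement of the kind the abstract mentions could, as one assumption, map each token to a synset before counting so that synonyms contribute to the same score.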
Pagination: xiv,121 p.
URI: http://hdl.handle.net/10603/338495
Appears in Departments: Faculty of Information and Communication Engineering

Files in This Item:
File                         Description    Size       Format
01_title.pdf                 Attached File  126.42 kB  Adobe PDF
02_certificates.pdf                         122.58 kB  Adobe PDF
03_vivaproceedings.pdf                      309.22 kB  Adobe PDF
04_bonafidecertificate.pdf                  178.4 kB   Adobe PDF
05_abstracts.pdf                            214.01 kB  Adobe PDF
06_acknowledgements.pdf                     224.56 kB  Adobe PDF
07_contents.pdf                             365.02 kB  Adobe PDF
08_listoftables.pdf                         215.86 kB  Adobe PDF
09_listoffigures.pdf                        143 kB     Adobe PDF
10_listofabbreviations.pdf                  140.64 kB  Adobe PDF
11_chapter1.pdf                             371.74 kB  Adobe PDF
12_chapter2.pdf                             296.66 kB  Adobe PDF
13_chapter3.pdf                             474.98 kB  Adobe PDF
14_chapter4.pdf                             473.64 kB  Adobe PDF
15_chapter5.pdf                             1.14 MB    Adobe PDF
16_chapter6.pdf                             537.89 kB  Adobe PDF
17_conclusion.pdf                           146.3 kB   Adobe PDF
18_references.pdf                           256.05 kB  Adobe PDF
19_listofpublications.pdf                   222.46 kB  Adobe PDF
80_recommendation.pdf                       151.24 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).