Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/338495
Title: | Topic and sub topic detection using text mining |
Researcher: | Elakiya, E |
Guide(s): | Rajkumar, N |
Keywords: | Engineering and Technology; Computer Science; Computer Science Information Systems; Text Mining; Document Processing |
University: | Anna University |
Completed Date: | 2020 |
Abstract: | At every moment, a massive volume of data and information is shared through social networks. Analyzing huge amounts of text data manually is tedious, time-consuming and expensive, and manual sorting leads to mistakes and inconsistency. Document processing is at present still not capable of extracting information as a human reader does. Moreover, the importance of the content in a document may vary from one reader to another. Existing topic detection and optimization algorithms have several problems: the number of topics is fixed; they are non-hierarchical; they are designed for continuous problems; the domain variable is not finite; they depend on a sequence of random decisions; repeated fitness function evaluation takes more time; and, for complex and large problems, the probability distribution can change in each iteration. The goal of the thesis is to automatically detect the topics and subtopics of a corpus in an effective way. The spider-based topic analysis model helps to sift through large sets of data and identify the most frequent topics in a simple, fast and scalable way. This electronic era deals with large volumes of unstructured text every day. E-mail, social media posts, customer feedback, reviews and other information can benefit greatly from this topic analysis technique. The proposed work focuses on designing a preprocessing framework for text mining applications. Three major preprocessing tasks, Expansion, Removal and Tokenization (ERT), take a corpus as input and generate a list of tokens. The rising complexity of electronic data motivated the proposal of an optimization algorithm. The Spider Hunting Algorithm is inspired by the social behavior of spiders: their dynamic movements, communication and hunting. The Spider Hunting Algorithm helps to design the topic model and to detect the topics and subtopics of a corpus. The spider model is trained on the BBC news dataset, which covers the topics business, entertainment, politics, sports and technology. The BBC dataset contains 510 business articles, 386 entertainment articles, 417 politics articles, 511 sports articles and 401 technology articles, for a total of 2225 news articles. A list of unique and most influential words is identified for each topic and subtopic and is maintained at the top of the list. The Spider Hunting Algorithm reads the corpus sentence by sentence and preprocesses each sentence using the ERT framework. Token frequencies are then calculated and used to detect the topic and subtopic. An Enhanced Spider Hunting Algorithm based on WordNet is proposed to improve the topic detection percentage. It depends on the relationships among tokens, and associations between words can be drawn using synsets. |
Pagination: | xiv,121 p. |
URI: | http://hdl.handle.net/10603/338495 |
Appears in Departments: | Faculty of Information and Communication Engineering |
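The preprocessing and frequency-counting steps described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's Spider Hunting Algorithm, whose details are not given in this record; it only shows a plain expand, remove, tokenize pipeline followed by keyword-frequency scoring. The names `EXPANSIONS`, `STOPWORDS`, `TOPIC_KEYWORDS`, `ert_preprocess` and `detect_topic` are hypothetical, and the keyword lists are invented for illustration rather than taken from the BBC dataset.

```python
import re
from collections import Counter

# Hypothetical lookup tables, chosen only for illustration.
EXPANSIONS = {"can't": "cannot", "won't": "will not",
              "didn't": "did not", "it's": "it is"}
STOPWORDS = {"the", "a", "an", "is", "it", "but", "and",
             "of", "to", "in", "did", "not", "will"}
# Assumed keyword lists for the five topics named in the abstract.
TOPIC_KEYWORDS = {
    "business": {"shares", "profit", "market", "bank"},
    "entertainment": {"film", "music", "actor", "award"},
    "politics": {"election", "minister", "party", "vote"},
    "sports": {"match", "team", "player", "coach"},
    "technology": {"software", "mobile", "internet", "computer"},
}


def ert_preprocess(text):
    """Expansion, Removal and Tokenization on one sentence or document."""
    text = text.lower()
    # Expansion: replace contractions with their full forms.
    for short, full in EXPANSIONS.items():
        text = text.replace(short, full)
    # Removal (part 1): drop digits, punctuation and other non-letters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenization: split on whitespace.
    tokens = text.split()
    # Removal (part 2): drop stop words.
    return [t for t in tokens if t not in STOPWORDS]


def detect_topic(text):
    """Return the topic whose keywords occur most often, or None if none occur."""
    counts = Counter(ert_preprocess(text))
    scores = {
        topic: sum(counts[word] for word in words)
        for topic, words in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    sentence = "The team didn't win the match, but the coach praised every player."
    print(ert_preprocess(sentence))
    print(detect_topic(sentence))
```

A WordNet-based variant, as the abstract suggests for the enhanced algorithm, would presumably extend each keyword set with related words before scoring; that step is left out here because it needs an external lexical resource.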
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
01_title.pdf | Attached File | 126.42 kB | Adobe PDF |
02_certificates.pdf | | 122.58 kB | Adobe PDF |
03_vivaproceedings.pdf | | 309.22 kB | Adobe PDF |
04_bonafidecertificate.pdf | | 178.4 kB | Adobe PDF |
05_abstracts.pdf | | 214.01 kB | Adobe PDF |
06_acknowledgements.pdf | | 224.56 kB | Adobe PDF |
07_contents.pdf | | 365.02 kB | Adobe PDF |
08_listoftables.pdf | | 215.86 kB | Adobe PDF |
09_listoffigures.pdf | | 143 kB | Adobe PDF |
10_listofabbreviations.pdf | | 140.64 kB | Adobe PDF |
11_chapter1.pdf | | 371.74 kB | Adobe PDF |
12_chapter2.pdf | | 296.66 kB | Adobe PDF |
13_chapter3.pdf | | 474.98 kB | Adobe PDF |
14_chapter4.pdf | | 473.64 kB | Adobe PDF |
15_chapter5.pdf | | 1.14 MB | Adobe PDF |
16_chapter6.pdf | | 537.89 kB | Adobe PDF |
17_conclusion.pdf | | 146.3 kB | Adobe PDF |
18_references.pdf | | 256.05 kB | Adobe PDF |
19_listofpublications.pdf | | 222.46 kB | Adobe PDF |
80_recommendation.pdf | | 151.24 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).