Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/336265
Title: An efficient sentiment classification technique in Hadoop framework using optimized tree
Researcher: Sridharan, K
Guide(s): Komarasamy, G and Daniel Madan Raja
Keywords: Engineering and Technology; Computer Science; Computer Science Hardware and Architecture
University: Anna University
Completed Date: 2020
Abstract: Big data analysis requires fast mining of large-scale data sets, i.e., an immense amount of data must be processed in a limited time to yield useful information. As computing power improves, a larger volume of data can be processed. The more data are retrieved and processed, the better the understanding of the problem that can be obtained. Feature extraction is the process by which subsets of features are extracted from the data for use by a learning algorithm. Classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known. Feature selection can address the curse of dimensionality by selecting only the features relevant for classification. By eliminating irrelevant and redundant features, feature selection can reduce the number of features, cut down the training time, simplify the learned classifiers and improve classification performance. When handling big data, Hadoop provides a platform on which users can develop their own sentiment analysis with the help of a lexicon dictionary, available application programming interfaces (APIs) or external programs. The aim of classifying data is to analyze large data sets and develop an appropriate description or model for every class using the features present in the data. This work uses the Hadoop framework to obtain an effective classification with the help of Random Forest (RF) techniques. Features are extracted using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. Term frequency (TF) is the number of times a term appears in a document. Document frequency (DF) is the number of documents that include a term. Inverse Document Frequency (IDF) measures the amount of information a term carries. TF-IDF is the product of TF and IDF.
RF is a parametric supervised classification method which can be considered as a Classification and Regr
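The TF, DF, IDF and TF-IDF definitions given in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not state the IDF formula, so the common form log(N / DF) is assumed here, and the function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document; `docs` is a list of token lists."""
    n_docs = len(docs)

    # DF: number of documents that include each term (each document counts once).
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # IDF: assumed to be log(N / DF); terms found in fewer documents score higher.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # TF: number of times each term appears in this document.
        weights.append({term: tf[term] * idf[term] for term in tf})
    return weights


# Hypothetical toy corpus of three tokenized reviews.
docs = [
    "good movie good plot".split(),
    "bad movie".split(),
    "good acting".split(),
]
scores = tf_idf(docs)
```

With this weighting, a term that occurs in every document gets a weight of zero, which is why TF-IDF tends to suppress uninformative words before the features reach a classifier.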
Pagination: xiv,126 p.
URI: http://hdl.handle.net/10603/336265
Appears in Departments:Faculty of Information and Communication Engineering

Files in This Item:
File                          Description    Size       Format
01_title.pdf                  Attached File  132.21 kB  Adobe PDF
02_certificates.pdf                          682.8 kB   Adobe PDF
03_vivaproceedings.pdf                       202.87 kB  Adobe PDF
04_bonafidecertificate.pdf                   396.04 kB  Adobe PDF
05_abstracts.pdf                             111.24 kB  Adobe PDF
06_acknowledgements.pdf                      221.03 kB  Adobe PDF
07_contents.pdf                              232.57 kB  Adobe PDF
08_listoftables.pdf                          83.77 kB   Adobe PDF
09_listoffigures.pdf                         140.34 kB  Adobe PDF
10_listofabbreviations.pdf                   121.13 kB  Adobe PDF
11_chapter1.pdf                              467.09 kB  Adobe PDF
12_chapter2.pdf                              352.93 kB  Adobe PDF
13_chapter3.pdf                              352.62 kB  Adobe PDF
14_chapter4.pdf                              421.22 kB  Adobe PDF
15_chapter5.pdf                              293.15 kB  Adobe PDF
16_conclusion.pdf                            167.89 kB  Adobe PDF
17_references.pdf                            290.24 kB  Adobe PDF
18_listofpublications.pdf                    152.64 kB  Adobe PDF
80_recommendation.pdf                        189.61 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
