Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/368806
Title: Efficient Unsupervised Learning Technique Based Automatic Text Categorization
Researcher: Jain, Deepti
Guide(s): Jain, R.C. and Verma, Bhupendra
Keywords: Computer Science
Computer Science Information Systems
Engineering and Technology
University: Rajiv Gandhi Proudyogiki Vishwavidyalaya
Completed Date: 2013
Abstract: Automatic text categorization can play an important role in a wide variety of more flexible, dynamic and personalized information management tasks, such as real-time sorting of email or files into folder hierarchies; topic identification to support topic-specific processing operations; structured search and/or browsing; or finding documents that match long-term standing interests or more dynamic task-based interests.
In many contexts, textual information is an important form of communication on the World Wide Web, and trained professionals are employed to categorize new material. This process is very time consuming and costly, which limits its applicability. Consequently, there is increased interest in developing technologies for automatic text categorization.
The main focus of this research work is to study the problem of automatic text categorization and to develop an efficient text categorization mechanism based on unsupervised learning. In this thesis, an attempt is made to overcome the challenges of the various classifiers in terms of learning speed, real-time classification speed, and accuracy. Three new algorithms are implemented, and their results are analyzed to assess performance using two different types of datasets, DS0 and DS1 (20-Newsgroups and Reuters-21578 WebPages). The proposed algorithms are evaluated on different combinations of classifiers (Naïve Bayes and J48) and datasets (DS0 and DS1).
The first algorithm is a novel approach based on unsupervised learning that uses frequent item (term) sets for text clustering in order to drastically reduce the dimensionality of the data. Throughout the performance analysis, it provides better classification accuracy than direct classification.
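The frequent term set idea mentioned in the abstract can be illustrated with a short Python sketch. This is not the thesis's algorithm, whose details are not given in this record. It is a minimal sketch that assumes whitespace tokenization, a support threshold counted in documents, and term sets of at most two terms. The function names `frequent_term_sets` and `to_features` and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each term set (a frozenset) to its document count, keeping only
    sets that occur in at least `min_support` documents."""
    # Assumption: lowercase whitespace tokens, one set of terms per document.
    doc_terms = [set(doc.lower().split()) for doc in docs]

    # A term set can only be frequent if every term in it is frequent,
    # so rare terms are discarded before larger sets are counted.
    term_counts = Counter(t for terms in doc_terms for t in terms)
    frequent_terms = {t for t, c in term_counts.items() if c >= min_support}

    result = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for terms in doc_terms:
            kept = sorted(terms & frequent_terms)
            counts.update(combinations(kept, size))
        for term_set, count in counts.items():
            if count >= min_support:
                result[frozenset(term_set)] = count
    return result


def to_features(doc, term_sets):
    """Binary vector: 1 where the document contains every term of a set."""
    terms = set(doc.lower().split())
    return [1 if term_set <= terms else 0 for term_set in term_sets]


# Hypothetical toy corpus, for illustration only.
docs = [
    "stock market falls",
    "stock market rises",
    "team wins match",
    "team loses match",
]
found = frequent_term_sets(docs, min_support=2)
# Fix a deterministic feature order: smaller sets first, then alphabetical.
feature_sets = sorted(found, key=lambda s: (len(s), sorted(s)))
vectors = [to_features(d, feature_sets) for d in docs]
```

Each document is then described by the frequent term sets it contains rather than by every word in the vocabulary, and the resulting vectors could be passed to a clusterer or classifier. A real corpus would need proper tokenization, stop-word handling, and a tuned support threshold.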
Pagination: 11.6MB
URI: http://hdl.handle.net/10603/368806
Appears in Departments: Computer Science Engineering

Files in This Item:
File                          Description    Size       Format
01_title.pdf                  Attached File  261.71 kB  Adobe PDF
02_declarations.pdf                          277.3 kB   Adobe PDF
03_certificate.pdf                           190.52 kB  Adobe PDF
04 _acknowledgement.pdf                      1.05 MB    Adobe PDF
05 _ content.pdf                             1.1 MB     Adobe PDF
06 _list of tables.pdf                       258.88 kB  Adobe PDF
07 _ chapter 1.pdf                           1.07 MB    Adobe PDF
08 _chapter 2.pdf                            877.82 kB  Adobe PDF
09 _ chapter 3.pdf                           382.76 kB  Adobe PDF
10 _ a chapter 5.pdf                         1.2 MB     Adobe PDF
10 _ b chapter 6.pdf                         1.26 MB    Adobe PDF
10 _ c chapter 7.pdf                         163.03 kB  Adobe PDF
10 _ chapter 4.pdf                           1.29 MB    Adobe PDF
11_ bibliography.pdf                         851.95 kB  Adobe PDF
11_ list of publicatins.pdf                  279.85 kB  Adobe PDF
80_recommendation.pdf                        468.43 kB  Adobe PDF
_abstract.pdf                                468.43 kB  Adobe PDF
_ list of abbreviations.pdf                  254.71 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
