Efficient Unsupervised Learning Technique Based Automatic Text Categorization

Jain, Deepti

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/368806

Title:	Efficient Unsupervised Learning Technique Based Automatic Text Categorization
Researcher:	Jain, Deepti
Guide(s):	Jain, R.C. and Verma, Bhupendra
Keywords:	Computer Science; Computer Science Information Systems; Engineering and Technology
University:	Rajiv Gandhi Proudyogiki Vishwavidyalaya
Completed Date:	2013
Abstract:	Automatic text categorization can play an important role in a wide variety of more flexible, dynamic and personalized information management tasks such as real-time sorting of email or files into folder hierarchies; topic identification to support topic-specific processing operations; structured search and/or browsing; or finding documents that match long-term standing interests or more dynamic task-based interests. In many contexts, textual information is an important form of communication data on the World Wide Web, and trained professionals are employed to categorize new knowledge. This process is very time consuming and costly, which limits its applicability. Consequently, there is increased interest in developing technologies for automatic text categorization. The main focus of this research work is to study the problem of automatic text categorization and to develop an efficient text categorization mechanism based on unsupervised learning. In this thesis, an attempt is made to overcome the challenges of the various classifiers in terms of learning speed, real-time classification speed, and accuracy. Three new algorithms are implemented, and the results are analyzed to assess their performance using two different types of datasets, DS0 and DS1 (20-Newsgroups and Reuters-21578 WebPages). The proposed algorithms are evaluated on different combinations of classifiers (Naïve Bayes and J48) and datasets (DS0 and DS1). The first algorithm describes a novel unsupervised-learning-based approach that uses frequent item (term) sets for text clustering to drastically reduce the dimensionality of the data. Throughout the performance analysis, it provides better classification accuracy than direct classification.
Pagination:	11.6MB
URI:	http://hdl.handle.net/10603/368806
Appears in Departments:	Computer Science Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	261.71 kB	Adobe PDF
02_declarations.pdf		277.3 kB	Adobe PDF
03_certificate.pdf		190.52 kB	Adobe PDF
04 _acknowledgement.pdf		1.05 MB	Adobe PDF
05 _ content.pdf		1.1 MB	Adobe PDF
06 _list of tables.pdf		258.88 kB	Adobe PDF
07 _ chapter 1.pdf		1.07 MB	Adobe PDF
08 _chapter 2.pdf		877.82 kB	Adobe PDF
09 _ chapter 3.pdf		382.76 kB	Adobe PDF
10 _ a chapter 5.pdf		1.2 MB	Adobe PDF
10 _ b chapter 6.pdf		1.26 MB	Adobe PDF
10 _ c chapter 7.pdf		163.03 kB	Adobe PDF
10 _ chapter 4.pdf		1.29 MB	Adobe PDF
11_ bibliography.pdf		851.95 kB	Adobe PDF
11_ list of publicatins.pdf		279.85 kB	Adobe PDF
80_recommendation.pdf		468.43 kB	Adobe PDF
_abstract.pdf		468.43 kB	Adobe PDF
_ list of abbreviations.pdf		254.71 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET