Document clustering using a fuzzy representation of clusters

Thaoroijam, Kabita

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/69683

Full metadata record

DC Field	Value	Language
dc.coverage.spatial	Computer Science
dc.date.accessioned	2016-01-08T11:00:26Z	-
dc.date.available	2016-01-08T11:00:26Z	-
dc.identifier.uri	http://hdl.handle.net/10603/69683	-
dc.description.abstract	In recent decades, the volume of text databases has grown rapidly due to the increasing amount of information available in electronic form, such as the WWW, emails, newsgroup messages, Internet news feeds, digital libraries, etc. Clustering can help in organizing a text collection for efficient browsing and searching. This has been a driving force in making clustering a highly active research area. Document clustering is a subset of the larger field of data clustering, which borrows concepts from the fields of Information Retrieval (IR), Natural Language Processing (NLP), and Machine Learning (ML), among others. In this thesis, we propose a new document clustering algorithm in which the concepts of fuzzy sets are used. The proposed algorithm is agglomerative: at any given stage there are small clusters, and the decision at the current stage is to merge the incoming document with the cluster that satisfies a user-specified threshold. The clusters obtained are represented as fuzzy sets over a finite universal set, which provides a compact representation of clusters. A similarity measure based on the fuzzy representation of the clusters is defined. The algorithm requires just one pass through the dataset, and only the compact representations of the clusters are kept in memory at any given time. Our algorithm is incremental and can deal with the dynamic nature of real-world data. Arbitrarily large datasets cannot fit in memory. Several clustering algorithms that follow a two-phase approach have been proposed for large datasets. We propose a two-phase approach to the problem of clustering large datasets. In the first phase, a single pass over the database is used to produce an in-memory summary of the dataset. In the second phase, the in-memory summary obtained in the previous phase is merged based on the concepts of neighbors and links.
dc.format.extent
dc.language	English
dc.relation
dc.rights	university
dc.title	Document clustering using a fuzzy representation of clusters
dc.title.alternative
dc.creator.researcher	Thaoroijam, Kabita
dc.subject.keyword	Algorithm
dc.subject.keyword	Clustering
dc.subject.keyword	Complexity
dc.subject.keyword	Document
dc.subject.keyword	Fuzzy
dc.subject.keyword	Pearson
dc.subject.keyword	Similarity
dc.description.note	Data not available
dc.contributor.guide	Mahanta, Anjana Kakoti
dc.publisher.place	Guwahati
dc.publisher.university	Gauhati University
dc.publisher.institution	Department of Computer Science and Application
dc.date.registered	n.d.
dc.date.completed	31/12/2009
dc.date.awarded	n.d.
dc.format.dimensions
dc.format.accompanyingmaterial	None
dc.source.university	University
dc.type.degree	Ph.D.
Appears in Departments:	Department of Computer Science and Application
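
The single-pass idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the record does not state the membership function or the similarity measure, so both are assumptions here. The sketch assumes the universal set is the vocabulary, that a term's membership grade in a cluster is the share of the cluster's documents containing that term, and that document-to-cluster similarity is the mean membership grade of the document's terms. The names FuzzyCluster and single_pass_cluster are hypothetical.

```python
from collections import Counter


class FuzzyCluster:
    """Compact cluster summary: per-term document counts plus cluster size."""

    def __init__(self):
        self.term_counts = Counter()  # term -> number of member documents containing it
        self.size = 0                 # number of documents merged so far

    def add(self, terms):
        """Merge one document (an iterable of terms) into the summary."""
        self.term_counts.update(set(terms))
        self.size += 1

    def membership(self, term):
        # Assumed membership grade: fraction of member documents containing the term.
        return self.term_counts[term] / self.size if self.size else 0.0

    def similarity(self, terms):
        # Assumed similarity: mean membership grade of the document's distinct terms.
        terms = set(terms)
        if not terms:
            return 0.0
        return sum(self.membership(t) for t in terms) / len(terms)


def single_pass_cluster(documents, threshold):
    """Assign each document, in arrival order, to the most similar cluster
    if that similarity reaches the threshold; otherwise start a new cluster.
    Only the cluster summaries are kept, never the documents themselves."""
    clusters = []
    labels = []
    for terms in documents:
        best_index, best_score = None, 0.0
        for i, cluster in enumerate(clusters):
            score = cluster.similarity(terms)
            if score > best_score:
                best_index, best_score = i, score
        if best_index is None or best_score < threshold:
            clusters.append(FuzzyCluster())
            best_index = len(clusters) - 1
        clusters[best_index].add(terms)
        labels.append(best_index)
    return clusters, labels


docs = [
    ["fuzzy", "cluster"],
    ["fuzzy", "cluster", "set"],
    ["email", "news"],
    ["news", "feed"],
]
clusters, labels = single_pass_cluster(docs, threshold=0.5)
print(labels)
```

Because each cluster is stored only as term counts and a size, memory use depends on the number of clusters and the vocabulary rather than on the number of documents, which is the property the abstract emphasizes.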

Files in This Item:

File	Description	Size	Format
01_title page.pdf	Attached File	30.4 kB	Adobe PDF
02_certificate.pdf		23.23 kB	Adobe PDF
03_declaration.pdf		14.83 kB	Adobe PDF
04_content.pdf		66.13 kB	Adobe PDF
05_acknowledgement.pdf		28.32 kB	Adobe PDF
06_abstract.pdf		46.82 kB	Adobe PDF
07_list of tables.pdf		11.84 kB	Adobe PDF
08_list of figures.pdf		9.99 kB	Adobe PDF
09_list of abbreviation.pdf		18.54 kB	Adobe PDF
10_chapter 1.pdf		364.43 kB	Adobe PDF
11_chapter 2.pdf		933.76 kB	Adobe PDF
12_chapter 3.pdf		252.2 kB	Adobe PDF
13_chapter 4.pdf		363.42 kB	Adobe PDF
14_chapter 5.pdf		467.56 kB	Adobe PDF
15_conclusions and further works.pdf		114.1 kB	Adobe PDF
16_appendix a.pdf		99.35 kB	Adobe PDF
17_bibliography.pdf		325.36 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET