Design and Analysis of Similarity Measures for Text Document Clustering

Afzali Maedeh

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/329822

Title:	Design and Analysis of Similarity Measures for Text Document Clustering
Researcher:	Afzali Maedeh
Guide(s):	Suresh Kumar
Keywords:	Computer Science; Computer Science Information Systems; Design and Analysis of Similarity Measures; Document Clustering; Engineering and Technology
University:	Manav Rachna International University
Completed Date:	2020
Abstract:	The advent of computers and mobile devices sparked a revolution in information technology. The digitization of almost everything around us has generated many forms of data, both structured and unstructured. Researchers have shown that a considerable portion of this data is in text form. It is therefore important to explore and analyze this unstructured data to extract relevant information and transform it into high-quality structured information. Efficient organization of such data has become a further necessity for today's vast repositories. One technique that helps here is clustering, whose primary purpose is to automatically assess the similarity between text documents and group them so that documents in the same group have strong internal similarity. Text document clustering is useful in many areas, for instance, optimizing indexing and retrieval operations for systematic browsing, organizing the results returned by search engines, summarizing a corpus of documents, and categorizing customer comments for collaborative recommendations.
Above all, in the text document clustering process, similarity and dissimilarity (distance) measures are considered the heart of every clustering algorithm. The success or failure of a clustering algorithm depends heavily on the measure used to evaluate how similar or how far apart a pair of documents is. Choosing appropriate measures therefore leads to more accurate results.
In this thesis, the work begins with a study and performance evaluation of various similarity and dissimilarity measures suitable for use by clustering algorithms for grouping text documents.
URI:	http://hdl.handle.net/10603/329822
Appears in Departments:	Department of Computer Science Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	18.05 kB	Adobe PDF
02_certificate.pdf		98.34 kB	Adobe PDF
03_acknowledgement.pdf		10.61 kB	Adobe PDF
04_content.pdf		38.73 kB	Adobe PDF
05_list of graph and table.pdf		258.59 kB	Adobe PDF
06_chapter 1.pdf		827.41 kB	Adobe PDF
07_chapter 2.pdf		360.11 kB	Adobe PDF
08_chapter 3.pdf		2.51 MB	Adobe PDF
09_chapter 4.pdf		2 MB	Adobe PDF
10_chapter 5.pdf		2.28 MB	Adobe PDF
11_chapter 6.pdf		308.2 kB	Adobe PDF
12_references.pdf		565.67 kB	Adobe PDF
13_similarity verification report.pdf		2.74 MB	Adobe PDF
80_recommendation.pdf		176.53 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET