Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/329822
Title: Design and Analysis of Similarity Measures for Text Document Clustering
Researcher: Afzali Maedeh
Guide(s): Suresh Kumar
Keywords: Computer Science
Computer Science Information Systems
Design and Analysis of Similarity Measures
Document Clustering
Engineering and Technology
University: Manav Rachna International University
Completed Date: 2020
Abstract: The advent of computers and mobile devices sparked a revolution in information technology. The digitization of almost everything around us has generated data in many forms, both structured and unstructured. Investigations by researchers have shown that a considerable portion of this data is text. It is therefore important to explore and analyze this unstructured data in order to extract relevant information and transform it into high-quality structured information. Organizing such data efficiently has become a further necessity for today's vast repositories. One technique that helps here is clustering, whose primary purpose is to automatically assess the similarity between text documents and group them so that the documents in each group have strong internal similarity. Text document clustering is useful in many areas, for instance, optimizing indexing and retrieval operations for systematic browsing, organizing the results returned by search engines, summarizing a corpus of documents, and categorizing customer comments for collaborative recommendations.
Above all, similarity and dissimilarity (distance) measures are considered the heart of every clustering algorithm in the text document clustering process. The success or failure of a clustering algorithm depends heavily on the capability of the measure used to evaluate how similar or how far apart a pair of documents is. Choosing appropriate measures is therefore essential to achieving accurate outcomes.
In this thesis, the work begins with the study and performance evaluation of various similarity and dissimilarity measures that are suitable for use by clustering algorithms to group text documents.
URI: http://hdl.handle.net/10603/329822
Appears in Departments: Department of Computer Science Engineering

Files in This Item:
File                                      Description    Size       Format
01_title.pdf                              Attached File  18.05 kB   Adobe PDF
02_certificate.pdf                                       98.34 kB   Adobe PDF
03_acknowledgement.pdf                                   10.61 kB   Adobe PDF
04_content.pdf                                           38.73 kB   Adobe PDF
05_list of graph and table.pdf                           258.59 kB  Adobe PDF
06_chapter 1.pdf                                         827.41 kB  Adobe PDF
07_chapter 2.pdf                                         360.11 kB  Adobe PDF
08_chapter 3.pdf                                         2.51 MB    Adobe PDF
09_chapter 4.pdf                                         2 MB       Adobe PDF
10_chapter 5.pdf                                         2.28 MB    Adobe PDF
11_chapter 6.pdf                                         308.2 kB   Adobe PDF
12_references.pdf                                        565.67 kB  Adobe PDF
13_similarity verification report.pdf                    2.74 MB    Adobe PDF
80_recommendation.pdf                                    176.53 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
