A novel approach for duplicate elimination and effective topic modeling for document clustering

Uma R

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/343259

Title:	A novel approach for duplicate elimination and effective topic modeling for document clustering
Researcher:	Uma R
Guide(s):	Latha B
Keywords:	Engineering and Technology; Computer Science; Computer Science Information Systems; Document Clustering; Duplicate Elimination; Information Retrieval; Sub Topic Model
University:	Anna University
Completed Date:	2020
Abstract:	Information available on the web is increasing at a fast pace; within the last two years there has been an explosive growth of information on the internet. A large share of the information available on the web is textual, and textual information plays a vital part in Information Retrieval (IR); it is probably the most useful kind of information. Searching the web has become dominant because of the richness of the information available and the convenience of obtaining the information required. Web search is rooted in IR, the study of methods that help users find the required information in a large corpus of documents. Documents on the web are called web pages. Relevance and efficiency are the central issues in web search. Web pages are semi-structured: the content of a page is organized and presented in multiple structured blocks, some of which contain vital information while others do not. Detecting the main content blocks of a web page is useful for web search because the terms found in those blocks are more important. Users face great difficulty in identifying the relevant information, and existing approaches need to improve their accuracy in terms of relevance. Information retrieval is a way of separating relevant data from irrelevant data. Documents on the web are available in different formats. Conventional information retrieval methods operate on clean text, so noisy data has to be cleaned before it can be retrieved efficiently. This research work takes the initiative to increase retrieval accuracy and relevance and to improve retrieval performance for text documents. To attain these goals, the search space has to be reduced and the underlying semantics need to be identified.
Pagination:	xviii, 151p.
URI:	http://hdl.handle.net/10603/343259
Appears in Departments:	Faculty of Information and Communication Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	24.12 kB	Adobe PDF
02_certificates.pdf		563.93 kB	Adobe PDF
03_abstracts.pdf		14.14 kB	Adobe PDF
04_acknowledgements.pdf		456.88 kB	Adobe PDF
05_contents.pdf		15.32 kB	Adobe PDF
06_listoftables.pdf		10.04 kB	Adobe PDF
07_listoffigures.pdf		16.59 kB	Adobe PDF
08_listofabbreviations.pdf		12.33 kB	Adobe PDF
09_chapter1.pdf		434.78 kB	Adobe PDF
10_chapter2.pdf		466.36 kB	Adobe PDF
11_chapter3.pdf		511.47 kB	Adobe PDF
12_chapter4.pdf		1.73 MB	Adobe PDF
13_chapter5.pdf		1.13 MB	Adobe PDF
14_conclusion.pdf		22.45 kB	Adobe PDF
15_references.pdf		557.88 kB	Adobe PDF
16_listofpublications.pdf		326.01 kB	Adobe PDF
80_recommendation.pdf		82.65 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET