A novel approach for duplicate elimination and effective topic modeling for document clustering

Uma R

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/343259

Full metadata record

DC Field	Value	Language
dc.coverage.spatial	A novel approach for duplicate elimination and effective topic modeling for document clustering
dc.date.accessioned	2021-10-06T05:09:39Z	-
dc.date.available	2021-10-06T05:09:39Z	-
dc.identifier.uri	http://hdl.handle.net/10603/343259	-
dc.description.abstract	Information available on the web is increasing at a fast pace; within the last two years there has been explosive growth in internet information. A great amount of the information available on the web is textual. Textual information plays a vital part in Information Retrieval (IR) and is probably the most useful kind of information. Searching the web is becoming dominant because of the richness of the information available and the convenience of obtaining it. Web search is rooted in IR, the field of study that assists users in finding required information in a large corpus of documents. Documents on the web are called web pages. Relevancy and efficiency are the ultimate issues in web search. Web pages are semi-structured in nature: the content of a page is organized and presented in multiple structured blocks, some of which contain vital information while others do not. Detecting the main content blocks of a web page is useful in web search because terms found in those blocks are more important. Users face great difficulty in identifying relevant information, and existing approaches need to improve their accuracy in terms of relevancy. Information retrieval is a way to separate relevant data from irrelevant data. Documents on the web are available in different formats. Conventional information retrieval methods operate on clean text; if there is noise in the data, it has to be cleaned for efficient retrieval. This research work takes the initiative to increase the accuracy, relevancy and performance of retrieval for text documents. To attain these goals, the search space has to be reduced and the underlying semantics need to be identified.
dc.format.extent	xviii, 151p.
dc.language	English
dc.relation	p.140-150
dc.rights	university
dc.title	A novel approach for duplicate elimination and effective topic modeling for document clustering
dc.title.alternative
dc.creator.researcher	Uma R
dc.subject.keyword	Engineering and Technology
dc.subject.keyword	Computer Science
dc.subject.keyword	Computer Science Information Systems
dc.subject.keyword	Document Clustering
dc.subject.keyword	Duplicate Elimination
dc.subject.keyword	Information Retrieval
dc.subject.keyword	Sub Topic Model
dc.description.note
dc.contributor.guide	Latha B
dc.publisher.place	Chennai
dc.publisher.university	Anna University
dc.publisher.institution	Faculty of Information and Communication Engineering
dc.date.registered
dc.date.completed	2020
dc.date.awarded	2020
dc.format.dimensions	21cm
dc.format.accompanyingmaterial	None
dc.source.university	University
dc.type.degree	Ph.D.
Appears in Departments:	Faculty of Information and Communication Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	24.12 kB	Adobe PDF
02_certificates.pdf		563.93 kB	Adobe PDF
03_abstracts.pdf		14.14 kB	Adobe PDF
04_acknowledgements.pdf		456.88 kB	Adobe PDF
05_contents.pdf		15.32 kB	Adobe PDF
06_listoftables.pdf		10.04 kB	Adobe PDF
07_listoffigures.pdf		16.59 kB	Adobe PDF
08_listofabbreviations.pdf		12.33 kB	Adobe PDF
09_chapter1.pdf		434.78 kB	Adobe PDF
10_chapter2.pdf		466.36 kB	Adobe PDF
11_chapter3.pdf		511.47 kB	Adobe PDF
12_chapter4.pdf		1.73 MB	Adobe PDF
13_chapter5.pdf		1.13 MB	Adobe PDF
14_conclusion.pdf		22.45 kB	Adobe PDF
15_references.pdf		557.88 kB	Adobe PDF
16_listofpublications.pdf		326.01 kB	Adobe PDF
80_recommendation.pdf		82.65 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET