Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/343259
Full metadata record
DC Field | Value | Language
dc.coverage.spatial | A novel approach for duplicate elimination and effective topic modeling for document clustering |
dc.date.accessioned | 2021-10-06T05:09:39Z | -
dc.date.available | 2021-10-06T05:09:39Z | -
dc.identifier.uri | http://hdl.handle.net/10603/343259 | -
dc.description.abstract | Information available on the web is growing at a fast pace; within the last two years there has been explosive growth in internet information. A great amount of the information available on the web is textual. Textual information plays a vital part in information retrieval (IR) and is probably the most useful kind of information. Web search is becoming dominant because of the richness of the information available and the convenience of obtaining the information required. Web search is rooted in IR, a field of study that assists users in finding the required information in a large corpus of documents. The documents on the web are called web pages. Relevancy and efficiency are the ultimate issues in web search. Web pages are semi-structured in nature: the content of a page is organized and presented in multiple structured blocks. Some blocks contain vital information and others do not. Detecting the main content blocks of a web page is useful in web search because the terms found in those blocks are more important. Users face great difficulty in identifying the relevant information, and existing approaches need to improve their accuracy in terms of relevancy. Information retrieval is a way to separate relevant data from irrelevant data. Documents on the web are available in different formats. Conventional information retrieval methods operate on clean text; if there is noise in the data, it has to be cleaned for efficient retrieval. This research work takes the initiative to increase the retrieval accuracy, relevancy, and performance of retrieval for text documents. To attain these goals, the search space has to be reduced and the underlying semantics need to be identified. |
dc.format.extent | xviii, 151p. |
dc.language | English |
dc.relation | p.140-150 |
dc.rights | university |
dc.title | A novel approach for duplicate elimination and effective topic modeling for document clustering |
dc.title.alternative | |
dc.creator.researcher | Uma R |
dc.subject.keyword | Engineering and Technology |
dc.subject.keyword | Computer Science |
dc.subject.keyword | Computer Science Information Systems |
dc.subject.keyword | Document Clustering |
dc.subject.keyword | Duplicate Elimination |
dc.subject.keyword | Information Retrieval |
dc.subject.keyword | Sub Topic Model |
dc.description.note | |
dc.contributor.guide | Latha B |
dc.publisher.place | Chennai |
dc.publisher.university | Anna University |
dc.publisher.institution | Faculty of Information and Communication Engineering |
dc.date.registered | |
dc.date.completed | 2020 |
dc.date.awarded | 2020 |
dc.format.dimensions | 21cm |
dc.format.accompanyingmaterial | None |
dc.source.university | University |
dc.type.degree | Ph.D. |
Appears in Departments: Faculty of Information and Communication Engineering

Files in This Item:
File | Description | Size | Format
01_title.pdf | Attached File | 24.12 kB | Adobe PDF
02_certificates.pdf | | 563.93 kB | Adobe PDF
03_abstracts.pdf | | 14.14 kB | Adobe PDF
04_acknowledgements.pdf | | 456.88 kB | Adobe PDF
05_contents.pdf | | 15.32 kB | Adobe PDF
06_listoftables.pdf | | 10.04 kB | Adobe PDF
07_listoffigures.pdf | | 16.59 kB | Adobe PDF
08_listofabbreviations.pdf | | 12.33 kB | Adobe PDF
09_chapter1.pdf | | 434.78 kB | Adobe PDF
10_chapter2.pdf | | 466.36 kB | Adobe PDF
11_chapter3.pdf | | 511.47 kB | Adobe PDF
12_chapter4.pdf | | 1.73 MB | Adobe PDF
13_chapter5.pdf | | 1.13 MB | Adobe PDF
14_conclusion.pdf | | 22.45 kB | Adobe PDF
15_references.pdf | | 557.88 kB | Adobe PDF
16_listofpublications.pdf | | 326.01 kB | Adobe PDF
80_recommendation.pdf | | 82.65 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
