Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/343259
Title: | A novel approach for duplicate elimination and effective topic modeling for document clustering |
Researcher: | Uma R |
Guide(s): | Latha B |
Keywords: | Engineering and Technology; Computer Science; Computer Science Information Systems; Document Clustering; Duplicate Elimination; Information Retrieval; Sub Topic Model |
University: | Anna University |
Completed Date: | 2020 |
Abstract: | Information available on the web is increasing at a fast pace; within the last two years, there has been explosive growth in internet information. A large share of the information available on the web is textual. Textual information plays a vital part in Information Retrieval (IR) and is probably the most useful kind of information. Searching the web is becoming dominant because of the richness of the information available and the convenience of obtaining the information required. Web search is rooted in IR, a field of study that assists users in finding the required information in a large corpus of documents. Documents on the web are called web pages. Relevancy and efficiency are the ultimate issues in web search. Web pages are semi-structured in nature. The content of a page is organized and presented in multiple structured blocks. Some blocks contain vital information and others do not. Actively detecting the main content blocks of a web page is useful in searching the web because terms found in those blocks are more important. Users face great difficulty in identifying the relevant information. Existing approaches need to improve their accuracy in terms of relevancy. Information retrieval is a way to separate relevant data from irrelevant data. Documents on the web are available in different formats. Conventional information retrieval methods operate on clean text; if there is noise in the data, it has to be cleaned for efficient retrieval. This research work takes the initiative to increase retrieval accuracy and relevancy and to improve retrieval performance for text documents. To attain these goals, the search space has to be reduced and the underlying semantics need to be identified. |
Pagination: | xviii, 151p. |
URI: | http://hdl.handle.net/10603/343259 |
Appears in Departments: | Faculty of Information and Communication Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
01_title.pdf | Attached File | 24.12 kB | Adobe PDF |
02_certificates.pdf | | 563.93 kB | Adobe PDF |
03_abstracts.pdf | | 14.14 kB | Adobe PDF |
04_acknowledgements.pdf | | 456.88 kB | Adobe PDF |
05_contents.pdf | | 15.32 kB | Adobe PDF |
06_listoftables.pdf | | 10.04 kB | Adobe PDF |
07_listoffigures.pdf | | 16.59 kB | Adobe PDF |
08_listofabbreviations.pdf | | 12.33 kB | Adobe PDF |
09_chapter1.pdf | | 434.78 kB | Adobe PDF |
10_chapter2.pdf | | 466.36 kB | Adobe PDF |
11_chapter3.pdf | | 511.47 kB | Adobe PDF |
12_chapter4.pdf | | 1.73 MB | Adobe PDF |
13_chapter5.pdf | | 1.13 MB | Adobe PDF |
14_conclusion.pdf | | 22.45 kB | Adobe PDF |
15_references.pdf | | 557.88 kB | Adobe PDF |
16_listofpublications.pdf | | 326.01 kB | Adobe PDF |
80_recommendation.pdf | | 82.65 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).