Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/286620
Title: | Novel Methods for Summarizing The Contents of Web Page |
Researcher: | Maya John |
Guide(s): | Jayasudha J. S |
Keywords: | Engineering and Technology, Computer Science, Computer Science Interdisciplinary Applications |
University: | Noorul Islam Centre for Higher Education |
Completed Date: | 07/11/2018 |
Abstract: | The web has emerged as the largest repository of information in the world. This has led to an increased demand for systems that enhance the browsing experience of users. The contents of a web page are in the form of text, images, audio and video. This diversity in the information present in web pages poses challenges in summarizing their contents. The non-uniformity in the presentation of web page contents makes web content mining a cumbersome task. Most existing methods extract the important content of a web page by identifying the most significant blocks. The removal of non-significant blocks may lead to the loss of vital information present in those blocks. Hence, a Hyper Text Markup Language (HTML) tag analysis based strategy has been used to summarize the contents of a web page. The contents are summarized by identifying the representative image of the web page, removing the noise present in the web page and reducing the textual content. The functionalities of images are taken into account when extracting the representative image. The steps involved in identifying the most significant image in a web page are extracting the images of the web page, classifying the images, computing image scores, ranking the images and finding the representative image. The use of an enhanced keyword set, a scoring strategy, and grouping and ranking of attributes led to better performance of the newly developed representative image extractor compared to the existing system. Web pages contain information that does not pertain to the main theme of the web page. This type of information is considered local noise and is to be removed to enhance the performance of web mining. A noise pattern matching based method is used to remove web page noise such as advertisements, search panels, unwanted links, background images, plug-ins audio, video and copyright information. The efficiency of image advertisement removal is analysed in terms of precision, recall and F-Score. Hyperlinks present in |
Pagination: | 136 |
URI: | http://hdl.handle.net/10603/286620 |
Appears in Departments: | Department of Computer Science and Engineering |
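The abstract evaluates image advertisement removal in terms of precision, recall and F-Score. As a minimal sketch, assuming the standard definitions of these metrics (this record does not give the thesis's own evaluation protocol), they could be computed from raw counts as below. The function name `precision_recall_f_score` and the sample counts are hypothetical and are not taken from the thesis.

```python
def precision_recall_f_score(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-score) using the standard definitions.

    true_positives:  advertisement images correctly removed
    false_positives: non-advertisement images wrongly removed
    false_negatives: advertisement images left in the page
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or present.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


# Hypothetical counts for illustration only; not results from the thesis.
precision, recall, f_score = precision_recall_f_score(8, 2, 2)
```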
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
acknowledgement.pdf | Attached File | 88.57 kB | Adobe PDF |
certificate.pdf | | 72.19 kB | Adobe PDF |
chapter1.pdf | | 6.19 MB | Adobe PDF |
chapter2.pdf | | 9.35 MB | Adobe PDF |
chapter3.pdf | | 3.99 MB | Adobe PDF |
chapter4.pdf | | 3.4 MB | Adobe PDF |
chapter5.pdf | | 13.73 MB | Adobe PDF |
chapter6.pdf | | 885.07 kB | Adobe PDF |
references.pdf | | 1.43 MB | Adobe PDF |
title page.pdf | | 64.86 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).