Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/286620
Title: Novel Methods for Summarizing The Contents of Web Page
Researcher: Maya John
Guide(s): Jayasudha J. S
Keywords: Engineering and Technology, Computer Science, Computer Science Interdisciplinary Applications
University: Noorul Islam Centre for Higher Education
Completed Date: 07/11/2018
Abstract: The web has emerged as the largest repository of information in the world. This has led to increased demand for systems that enhance the browsing experience of users. Web page content takes the form of text, images, audio and video. This diversity in the information present in web pages poses challenges in summarizing their contents. The non-uniformity in the presentation of web page contents makes web content mining a cumbersome task.
Most existing methods extract the important content of a web page by identifying its most significant blocks. Removing the non-significant blocks may lead to the loss of vital information present in those blocks. Hence, a strategy based on Hyper Text Markup Language (HTML) tag analysis has been used to summarize the contents of a web page. The contents are summarized by identifying the representative image of the web page, removing noise present in the page and reducing the textual content. The functionalities of images are taken into account when extracting the representative image. The steps involved in identifying the most significant image in a web page are extracting the images of the page, classifying the images, computing image scores, ranking the images and finding the representative image. The use of an enhanced keyword set, a scoring strategy, and the grouping and ranking of attributes led to better performance of the newly developed representative image extractor than the existing system.
Web pages contain information that does not pertain to the main theme of the page. Such information is considered local noise and must be removed to enhance the performance of web mining. A method based on noise pattern matching is used to remove web page noise such as advertisements, search panels, unwanted links, background images, plug-ins audio, video and copyright information. The efficiency of image advertisement removal is analysed in terms of precision, recall and F-Score. Hyperlinks present in
Pagination: 136
URI: http://hdl.handle.net/10603/286620
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                 Description    Size       Format
acknowledgement.pdf  Attached File  88.57 kB   Adobe PDF
certificate.pdf                     72.19 kB   Adobe PDF
chapter1.pdf                        6.19 MB    Adobe PDF
chapter2.pdf                        9.35 MB    Adobe PDF
chapter3.pdf                        3.99 MB    Adobe PDF
chapter4.pdf                        3.4 MB     Adobe PDF
chapter5.pdf                        13.73 MB   Adobe PDF
chapter6.pdf                        885.07 kB  Adobe PDF
references.pdf                      1.43 MB    Adobe PDF
title page.pdf                      64.86 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
