Novel Methods for Summarizing The Contents of Web Page

Maya John

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/286620

Title:	Novel Methods for Summarizing The Contents of Web Page
Researcher:	Maya John
Guide(s):	Jayasudha J. S
Keywords:	Engineering and Technology, Computer Science, Computer Science Interdisciplinary Applications
University:	Noorul Islam Centre for Higher Education
Completed Date:	07/11/2018
Abstract:	The web has emerged as the largest repository of information in the world. This has led to an increased demand for systems that enhance the browsing experience of users. The contents of a web page are in the form of text, images, audio and video. This diversity in the information present in web pages poses challenges in summarizing their contents. The non-uniformity in the presentation of web page contents makes web content mining a cumbersome task.
Most existing methods extract the important content of a web page by identifying the most significant blocks. Removing the non-significant blocks may lose vital information present in those blocks. Hence, a strategy based on Hyper Text Markup Language (HTML) tag analysis has been used to summarize the contents of a web page. The contents are summarized by identifying the representative image of the web page, removing noise present in the web page and reducing the textual content. The functionalities of images are taken into account when extracting the representative image. The steps involved in identifying the most significant image in a web page are extracting the images of the web page, classifying the images, computing image scores, ranking the images and finding the representative image. The use of an enhanced keyword set, a scoring strategy, and grouping and ranking of attributes led to better performance of the newly developed representative image extractor compared to the existing system.
Web pages contain information that does not pertain to the main theme of the web page. This type of information is considered local noise and is to be removed to enhance the performance of web mining. A noise pattern matching based method is used to remove web page noise such as advertisements, search panels, unwanted links, background images, plug-ins, audio, video and copyright information. The efficiency of image advertisement removal is analysed in terms of precision, recall and F-Score. Hyperlinks present in
Pagination:	136
URI:	http://hdl.handle.net/10603/286620
Appears in Departments:	Department of Computer Science and Engineering
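
Note on the evaluation metrics: the abstract states that image advertisement removal is analysed in terms of precision, recall and F-Score. The following is a minimal sketch of how these standard metrics are commonly computed, not the thesis's own implementation. The function name `precision_recall_f_score` and the example file names are hypothetical, and the F-Score is assumed to be the balanced F1 variant.

```python
def precision_recall_f_score(predicted, actual):
    """Return (precision, recall, F-score) for two collections of item identifiers.

    predicted: items the system flagged (e.g. images it removed as advertisements)
    actual:    items that truly belong to the class (e.g. real advertisement images)
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)

    # Precision: fraction of flagged items that were correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true items that were flagged.
    recall = true_positives / len(actual) if actual else 0.0

    # Balanced F-score (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: 4 images removed, 5 true advertisements, 3 in common.
removed = ["ad1.gif", "ad2.gif", "banner.jpg", "logo.png"]
true_ads = ["ad1.gif", "ad2.gif", "banner.jpg", "ad3.gif", "ad4.gif"]
p, r, f = precision_recall_f_score(removed, true_ads)
print(f"precision={p:.2f} recall={r:.2f} f_score={f:.2f}")
```

Here precision penalizes legitimate images that were wrongly removed, while recall penalizes advertisements that were missed; the F-Score combines the two into a single figure.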

Files in This Item:

File	Description	Size	Format
acknowledgement.pdf	Attached File	88.57 kB	Adobe PDF
certificate.pdf		72.19 kB	Adobe PDF
chapter1.pdf		6.19 MB	Adobe PDF
chapter2.pdf		9.35 MB	Adobe PDF
chapter3.pdf		3.99 MB	Adobe PDF
chapter4.pdf		3.4 MB	Adobe PDF
chapter5.pdf		13.73 MB	Adobe PDF
chapter6.pdf		885.07 kB	Adobe PDF
references.pdf		1.43 MB	Adobe PDF
title page.pdf		64.86 kB	Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).


Shodhganga : a reservoir of Indian theses @ INFLIBNET