Design and Development of Efficient Approach For Template Detection and Extraction From Heterogeneous Web Pages

Harish Kumar

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/565503

Title:	Design and Development of Efficient Approach For Template Detection and Extraction From Heterogeneous Web Pages
Researcher:	Harish Kumar
Guide(s):	Praveen Kumar
Keywords:	Engineering and Technology
University:	Glocal University
Completed Date:	2023
Abstract:	The World Wide Web is a global data space of billions of pages, reachable over the Internet, and one of the most useful sources of information. Web sites are commonly built from shared templates that are filled with content. To improve publishing productivity, the pages of many web sites are populated automatically by combining common templates with content. A template gives pages a consistent structure, so readers can reach the content easily even when the template is not explicitly declared. However, with current template extraction techniques, the irrelevant terms in templates degrade the performance of web applications such as search engines. In recent years, many researchers have tried to improve template detection and extraction methods in order to raise the performance of web applications, including the search time of web pages. Templates are also used to characterize auxiliary data in various domains, for example web applications, biometrics, computerized systems, and programming languages. Template extraction from heterogeneous web pages can be done by constructing the Document Object Model (DOM) tree of each HTML document and finding the essential paths of the document. Because web documents vary widely, an unknown number of templates must be managed. Template detection and extraction techniques are therefore applied to heterogeneous web pages: the documents are clustered according to the template they use, and the data contained in the pages is extracted. In this way the web pages are studied in full and their contents are compared and extracted. 
However, detecting and extracting templates from heterogeneous web pages takes a long time. Therefore, time and cost
Pagination:	All Pages
URI:	http://hdl.handle.net/10603/565503
Appears in Departments:	Glocal School of Science and Technology
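
The abstract describes building a DOM tree for each HTML document and finding the document's essential paths. The following Python sketch illustrates that idea under assumptions that are not taken from the thesis: a path is the chain of open tags from the root to a node or text fragment, and a path counts as essential when it occurs in at least `min_support` pages. The names `PathCollector`, `document_paths`, `essential_paths`, and `min_support` are hypothetical and chosen only for illustration; the thesis may define essential paths and their threshold differently.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects root-to-node paths (tags and text) from one HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, root first
        self.paths = set()   # distinct paths seen in this document

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.paths.add("/".join(self.stack + [text]))


def document_paths(html):
    """Return the set of paths found in one HTML string."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def essential_paths(pages, min_support):
    """Assumed rule: keep, per page, the paths seen in >= min_support pages."""
    per_page = [document_paths(page) for page in pages]
    support = Counter(path for paths in per_page for path in paths)
    return [{p for p in paths if support[p] >= min_support}
            for paths in per_page]
```

Under these assumptions, pages that end up with the same or a strongly overlapping set of essential paths are candidates for sharing a template, and paths with low support are more likely to be page-specific content. Grouping pages by those sets is one simple way to approximate the clustering step that the abstract mentions.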

Files in This Item:

File	Description	Size	Format
80_recommendation.pdf	Attached File	1.63 MB	Adobe PDF
abstract.pdf		106.82 kB	Adobe PDF
annexures.pdf		1.17 MB	Adobe PDF
chapter 1.pdf		210.44 kB	Adobe PDF
chapter 2.pdf		193.81 kB	Adobe PDF
chapter 3.pdf		200.71 kB	Adobe PDF
chapter 4.pdf		115.82 kB	Adobe PDF
chapter 5.pdf		124.63 kB	Adobe PDF
chapter 6.pdf		1.35 MB	Adobe PDF
content.pdf		107.73 kB	Adobe PDF
pleg.pdf		1.42 MB	Adobe PDF
title.pdf		294.28 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET