Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/565503
Title: | Design and Development of Efficient Approach For Template Detection and Extraction From Heterogeneous Web Pages |
Researcher: | Harish Kumar |
Guide(s): | Praveen Kumar |
Keywords: | Engineering and Technology |
University: | Glocal University |
Completed Date: | 2023 |
Abstract: | In today's world, the Web is a worldwide data space with billions of web pages accessible online, and the World Wide Web is the most useful source of information. Websites are designed with common templates filled with content. Templates give readers easy access to content through consistent structures, even when the templates are not explicitly announced. To improve publishing productivity, many websites populate their pages automatically from common templates and content. However, current template extraction techniques degrade the performance of web applications such as search engines because of irrelevant terms in templates. In recent years, many researchers have tried to improve template detection and extraction methods in order to increase the performance of web applications. Templates are created for many websites to improve productivity and the search time of web pages, and their consistent structure lets users easily access the information on those sites. Templates are also used to characterize auxiliary data in various domains, for example web applications, biometrics, computerized systems, and programming languages. Template extraction from heterogeneous web pages can be done by constructing the Document Object Model (DOM) tree of each HTML document and finding the essential paths of the document. Because web documents vary widely, however, an unknown number of templates must be managed. Template detection and extraction techniques are therefore applied to heterogeneous web pages: the documents are clustered by the template they use, and the data in the pages is extracted. In this way, web pages are fully analyzed, and their contents are compared and extracted.
Extracting templates from heterogeneous web pages, however, takes considerable time to detect and extract. Therefore, time and cost |
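The abstract describes building a DOM tree, finding its paths, and clustering pages by the template they use. However, it does not define "essential paths" or the clustering criterion. As an illustration only, the Python sketch below makes several assumptions in place of those definitions. It uses the standard-library `html.parser` to collect root-to-node tag paths, with text nodes appended after a colon. It then groups pages greedily by the Jaccard similarity of those path sets and treats the paths shared by every page in a cluster as the template. The names `PathCollector`, `dom_paths`, `cluster_by_template`, and `extract_template`, the 0.5 threshold, and the use of Jaccard similarity are all hypothetical and are not the thesis's method.

```python
# Hypothetical illustration of DOM-path-based template clustering;
# NOT the algorithm described in the thesis.
from html.parser import HTMLParser

# HTML void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collect root-to-node tag paths ('html/body/p') and text paths
    ('html/body/p:some text') from one HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, root first
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag; ignore stray end tags in sloppy HTML.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.paths.add("/".join(self.stack) + ":" + text)


def dom_paths(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_template(docs, threshold=0.5):
    """Greedy single pass: each page joins the first cluster whose first
    member has Jaccard similarity >= threshold over DOM path sets."""
    path_sets = [dom_paths(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, paths in enumerate(path_sets):
        for cluster in clusters:
            if jaccard(paths, path_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters, path_sets


def extract_template(cluster, path_sets):
    """Template = paths shared by every page in the cluster; the rest is content."""
    template = set.intersection(*(path_sets[i] for i in cluster))
    content = {i: path_sets[i] - template for i in cluster}
    return template, content


if __name__ == "__main__":
    pages = [
        "<html><body><div class='nav'><a>Home</a></div><p>Story one</p></body></html>",
        "<html><body><div class='nav'><a>Home</a></div><p>Story two</p></body></html>",
        "<html><body><table><tr><td>x</td></tr></table></body></html>",
    ]
    clusters, path_sets = cluster_by_template(pages)
    print(clusters)  # [[0, 1], [2]]
    template, content = extract_template(clusters[0], path_sets)
    print(content[0])  # {'html/body/p:Story one'}
```

A single greedy pass keeps the sketch short and does not require the number of templates in advance, which echoes the abstract's point about managing an unknown number of templates. A practical system would likely need a more robust clustering method and a tuned threshold.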
Pagination: | All Pages |
URI: | http://hdl.handle.net/10603/565503 |
Appears in Departments: | Glocal School of Science and Technology |
Files in This Item:
File | Description | Size | Format
---|---|---|---
80_recommendation.pdf | Attached File | 1.63 MB | Adobe PDF
abstract.pdf | | 106.82 kB | Adobe PDF
annexures.pdf | | 1.17 MB | Adobe PDF
chapter 1.pdf | | 210.44 kB | Adobe PDF
chapter 2.pdf | | 193.81 kB | Adobe PDF
chapter 3.pdf | | 200.71 kB | Adobe PDF
chapter 4.pdf | | 115.82 kB | Adobe PDF
chapter 5.pdf | | 124.63 kB | Adobe PDF
chapter 6.pdf | | 1.35 MB | Adobe PDF
content.pdf | | 107.73 kB | Adobe PDF
pleg.pdf | | 1.42 MB | Adobe PDF
title.pdf | | 294.28 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).