Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/423819
Title: | An Efficient Spammer Classification for Ranking of Web Pages |
Researcher: | Makkar, Aaisha |
Guide(s): | Kumar, Neeraj |
Keywords: | Computer Science; Computer Science Software Engineering; Engineering and Technology; Web page ranking |
University: | Thapar Institute of Engineering and Technology |
Completed Date: | 2020 |
Abstract: | Inaccurate search engine result pages (SERPs) are one of the significant drawbacks of search engine ranking algorithms, and web spam is one of their primary causes. Many techniques detect web spam by analyzing the content features and link features of a web page, but they primarily focus on revising the rank score of a page after it has been included in the SERP. None of them aims to prevent web spam before the ranking algorithm assigns a rank. For SERPs to be reliable, web pages should be completely spam-free before entering the ranking module. For this purpose, spam pages should be demoted by the ranking algorithm itself, reducing their rank score, and this mechanism should be implemented in such a way that authoritative websites are promoted. This work presents a complete analysis and study of Google's ranking methodology, PageRank, and investigates the various measures that affect rank score computation in PageRank. The analysis finds that the primary cause of the injection of spam pages into the web is the presence of dangling web pages, i.e., web pages that have no outgoing hyperlinks. The proportion of dangling pages is increasing because of documents such as PDFs and technical reports from research communities. Spammers create artificial in-links to boost the rank of web pages but pay no attention to outgoing links, which results in dangling pages. Although a lot of work has been done on handling dangling pages and improving the ranking algorithm, none of it has handled dangling pages with respect to user behavior analysis. Only user surfing activity can reveal the real picture of a web page. Evaluating the importance of a dangling page can significantly help in refining the SERPs. This task has been accomplished in this research work with two different approaches.
The first approach detects spam dangling web pages by considering user behavior attributes, i.e., dwell time and click count. |
Pagination: | xiv, 133p. |
URI: | http://hdl.handle.net/10603/423819 |
Appears in Departments: | Department of Computer Science and Engineering |
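The abstract's description of dangling pages (pages with in-links but no outgoing hyperlinks) can be made concrete with a minimal sketch of textbook PageRank power iteration. This is not the thesis's method: the function name `pagerank`, the damping value 0.85, and the uniform redistribution of dangling-page rank are assumptions chosen for illustration, and the user behavior attributes mentioned in the abstract (dwell time, click count) are not modeled here.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links: dict mapping a page to the list of pages it links to.
    Returns (rank, dangling), where rank maps page -> score and
    dangling lists the pages that have no outgoing links.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    # A dangling page has no outgoing hyperlinks.
    dangling = [p for p in pages if not links.get(p)]

    for _ in range(max_iter):
        # Assumption: rank held by dangling pages is spread uniformly
        # over all pages so the total rank stays equal to 1.
        dangling_mass = sum(rank[p] for p in dangling)
        new = {p: (1.0 - damping) / n + damping * dangling_mass / n
               for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank, dangling


# Hypothetical three-page web: "report.pdf" receives a link but has none.
example = {"a": ["b", "report.pdf"], "b": ["a"], "report.pdf": []}
scores, dangling_pages = pagerank(example)
print(dangling_pages)
print(scores)
```

In this sketch the dangling page still accumulates rank from its in-links, which is the kind of behavior the abstract argues a ranking algorithm needs to assess before assigning a rank.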
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 1.36 MB | Adobe PDF | View/Open |
02_prelim pages.pdf | | 1.5 MB | Adobe PDF | View/Open |
03_content.pdf | | 27.94 kB | Adobe PDF | View/Open |
04_abstract.pdf | | 28.2 kB | Adobe PDF | View/Open |
05_chapter 1.pdf | | 211.04 kB | Adobe PDF | View/Open |
06_chapter 2.pdf | | 2.49 MB | Adobe PDF | View/Open |
07_chapter 3.pdf | | 1.14 MB | Adobe PDF | View/Open |
08_chapter 4.pdf | | 612.46 kB | Adobe PDF | View/Open |
09_chapter 5.pdf | | 1.97 MB | Adobe PDF | View/Open |
10_chapter 6.pdf | | 3.73 MB | Adobe PDF | View/Open |
11_chapter 7.pdf | | 29.74 kB | Adobe PDF | View/Open |
12_annexures.pdf | | 93.37 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 1.36 MB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).