Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/423819
Title: An Efficient Spammer Classification for Ranking of Web Pages
Researcher: Makkar, Aaisha
Guide(s): Kumar, Neeraj
Keywords: Computer Science; Computer Science Software Engineering; Engineering and Technology; Web page ranking
University: Thapar Institute of Engineering and Technology
Completed Date: 2020
Abstract: Inaccurate search engine result pages (SERPs) are one of the significant drawbacks of search engine ranking algorithms, and web spam is one of their primary causes. Many techniques detect web spam by analyzing the content features and link features of a web page, but they primarily focus on revising the rank score of a page after it has been included in the SERP. None of these techniques aims to prevent web spam before the ranking algorithm assigns a rank. For SERPs to be successful, web pages should be completely spam-free before entering the ranking module. For this purpose, spam web pages should be demoted by the ranking algorithm itself to reduce their rank score, and this mechanism should be implemented in such a way that authoritative web sites are promoted. This work presents a complete analysis and study of Google's ranking methodology, i.e., PageRank, and investigates the various measures affecting rank score computation in PageRank. The analysis finds that the primary cause of the injection of spam web pages on the web is the presence of dangling web pages. Dangling pages are web pages that do not have outgoing hyperlinks. The ratio of dangling pages is increasing due to documents such as PDFs and technical reports from research communities. Spammers create artificial in-links to boost the rank of web pages but neglect outgoing links, which results in dangling pages. Although a lot of work has been done on handling dangling pages and improving the ranking algorithm, none of it has handled dangling pages with respect to user behavior analysis. Only user surfing activities can reveal the real picture of a web page. Evaluating the importance of a dangling page can significantly help in refining the SERPs. This task has been accomplished in this research work with two different approaches.
The first approach detects the spam dangling web pages by considering the user behavior attributes, i.e., dwell time and click count.
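As a rough illustration of the terms used in the abstract, the following minimal sketch identifies dangling pages in a small link graph and runs textbook PageRank, redistributing the rank held by dangling pages uniformly. It is not the thesis's method: the function names, the toy graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration, and the user-behavior attributes (dwell time, click count) are not modeled here.

```python
def find_dangling(links):
    """Return the set of pages that have no outgoing hyperlinks."""
    return {page for page, outs in links.items() if not outs}


def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    `links` maps every page to the list of pages it links to; each link
    target is assumed to also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    dangling = find_dangling(links)
    for _ in range(iterations):
        # Rank held by dangling pages is spread uniformly over all pages,
        # the standard fix that keeps the total rank equal to 1.
        dangling_mass = sum(rank[p] for p in dangling)
        new_rank = {
            p: (1 - damping) / n + damping * dangling_mass / n for p in pages
        }
        # Every non-dangling page shares its rank equally among its targets.
        for page, outs in links.items():
            for target in outs:
                new_rank[target] += damping * rank[page] / len(outs)
        rank = new_rank
    return rank


# Hypothetical toy graph: a PDF report has in-links but no out-links.
links = {
    "home": ["about", "report.pdf"],
    "about": ["home"],
    "report.pdf": [],
}
dangling_pages = find_dangling(links)
ranks = pagerank(links)
```

In this toy graph only `report.pdf` is dangling; it still accumulates rank from its in-link, which is the behavior the abstract describes spammers exploiting.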
Pagination: xiv, 133p.
URI: http://hdl.handle.net/10603/423819
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                    Description     Size        Format
01_title.pdf            Attached File   1.36 MB     Adobe PDF
02_prelim pages.pdf                     1.5 MB      Adobe PDF
03_content.pdf                          27.94 kB    Adobe PDF
04_abstract.pdf                         28.2 kB     Adobe PDF
05_chapter 1.pdf                        211.04 kB   Adobe PDF
06_chapter 2.pdf                        2.49 MB     Adobe PDF
07_chapter 3.pdf                        1.14 MB     Adobe PDF
08_chapter 4.pdf                        612.46 kB   Adobe PDF
09_chapter 5.pdf                        1.97 MB     Adobe PDF
10_chapter 6.pdf                        3.73 MB     Adobe PDF
11_chapter 7.pdf                        29.74 kB    Adobe PDF
12_annexures.pdf                        93.37 kB    Adobe PDF
80_recommendation.pdf                   1.36 MB     Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
