Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/454049
Title: Web activity analysis using sequential pattern mining
Researcher: Bhuvaneswari M S
Guide(s): Muneeswaran K
Keywords: Sequential Pattern Mining; Clustering; Web Log Mining
University: Anna University
Completed Date: 2021
Abstract: The sequence of web pages visited by a client over a particular timeframe is called a session or pageset. Web log mining analyzes the behavior of users through their web access patterns. Sessions are a significant part of constructing a recommendation model. The novel part of the work makes use of the backward moves made by the user, considering both the referrer URL and the requested URL extracted from the extended web log for session identification. The length of the sessions is maximized using a split-and-merge technique, and the time taken for session identification is reduced using thread parallelization. A hash map data structure is used for efficient storage and retrieval of information. The proposed approach outperforms the existing approach in terms of standard error and correlation coefficient.

Two different users may have sessions with a similar set of pages visited, but the interest with which they visited those pages may differ. By augmenting the pages with the interest of the users, the clustered sessions can be used to recommend pages that are of actual interest to the users. The initial number of clusters is identified using the Discounted Fuzzy Relational Clustering (DFRC) algorithm, which reduces the overhead of setting the number of clusters. A non-Euclidean distance metric is used to determine how close the sessions are and to cluster them.

An approach for identifying frequent pagesets from the sequence database without candidate generation is proposed. The sequence hashmap is used to find the support of sequences without scanning the entire database. The ordered sequence position hashmap is used to construct length-(k+1) sequences from length-k sequences faster than other approaches.

A model-based collaborative filtering approach is proposed to recommend pages of interest to the user.
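The abstract's idea of growing length-(k+1) sequences from length-k sequences with a position hash map can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function name `mine_frequent_sequences`, the choice to store only the earliest end position per session, and the toy sessions are all assumptions made for illustration. The database is read once to build the length-1 index; after that, support is read from the size of each hash map entry, and longer patterns are built by looking only at the pages after the stored positions.

```python
from collections import defaultdict


def mine_frequent_sequences(sessions, min_support):
    """Return {pattern: support} for page subsequences found in >= min_support sessions.

    Hypothetical sketch: each pattern maps to {session id: earliest end position},
    so support is the number of session ids stored for that pattern.
    """
    # Length-1 patterns: record the first position of every page in every session.
    level = defaultdict(dict)
    for sid, pages in enumerate(sessions):
        for pos, page in enumerate(pages):
            level[(page,)].setdefault(sid, pos)
    level = {p: occ for p, occ in level.items() if len(occ) >= min_support}

    frequent = {}
    while level:
        frequent.update({p: len(occ) for p, occ in level.items()})
        # Grow each frequent length-k pattern by one page that occurs after
        # its stored end position; setdefault keeps the earliest such position.
        next_level = defaultdict(dict)
        for pattern, occ in level.items():
            for sid, end in occ.items():
                pages = sessions[sid]
                for pos in range(end + 1, len(pages)):
                    next_level[pattern + (pages[pos],)].setdefault(sid, pos)
        level = {p: occ for p, occ in next_level.items() if len(occ) >= min_support}
    return frequent


# Toy sessions (assumed data, not taken from the thesis).
sessions = [["A", "B", "C"], ["A", "C"], ["B", "C"], ["A", "B"]]
result = mine_frequent_sequences(sessions, min_support=2)
```

Only patterns that already meet the support threshold are extended, so no separate candidate set is generated and tested against the full database.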
Pagination: xiv,136p.
URI: http://hdl.handle.net/10603/454049
Appears in Departments: Faculty of Information and Communication Engineering

Files in This Item:
File | Description | Size | Format
01_title.pdf | Attached File | 28.55 kB | Adobe PDF
02_prelim pages.pdf | | 1.39 MB | Adobe PDF
03_content.pdf | | 10.01 kB | Adobe PDF
04_abstract.pdf | | 9.67 kB | Adobe PDF
05_chapter 1.pdf | | 127.09 kB | Adobe PDF
06_chapter 2.pdf | | 451.17 kB | Adobe PDF
07_chapter 3.pdf | | 203.03 kB | Adobe PDF
08_chapter 4.pdf | | 304.92 kB | Adobe PDF
09_chapter 5.pdf | | 311.2 kB | Adobe PDF
10_annexures.pdf | | 76.42 kB | Adobe PDF
80_recommendation.pdf | | 125.99 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
