Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/254348
Title: Improved weightage approaches for data dependency and classification in web documents
Researcher: Santhanakumar M
Guide(s): Christopher Columbus C
Keywords: Data Dependency
Engineering and Technology, Computer Science, Computer Science Interdisciplinary Applications
Web Documents
Weightage Approaches
University: Anna University
Completed Date: 2018
Abstract: In recent years, the World Wide Web (WWW) has become the largest and most important source for all kinds of information. Web mining is an application of data mining that is widely used in fields such as science, business, medicine, engineering, and banking. Hence, many researchers are developing algorithms and techniques to deal with different issues in web mining. However, extracting optimal and valuable information from the web remains a challenging task. Term weighting plays an important role in retrieving documents from the web, so several term weighting methods have been proposed for assigning weights to documents based on the occurrences of their terms. Term Frequency-Inverse Document Frequency (TF-IDF) is the most frequently used term weighting method in the field of Information Retrieval. The main objective of this thesis is to implement an efficient and effective term weighting method, based on the classical TF-IDF method, for text classification. To improve the retrieval accuracy of web documents, this thesis investigates two term weighting methods, namely Improved TF-IDF (ImpTF-IDF) and Co-Term Frequency (CTF). ImpTF-IDF is a single-term-based term weighting method that extends the classical TF formula. The idea behind it is that the average frequency of a term in a collection of documents is considered when assigning a weight to that term. The ratio of the length of the document to the total number of distinct terms in the corpus is used for normalization.
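For readers unfamiliar with the classical TF-IDF baseline that the abstract refers to, the following is a minimal sketch of one common textbook variant (length-normalized term frequency multiplied by the natural log of inverse document frequency). It is not the thesis's ImpTF-IDF or CTF method, whose exact formulas are not given in this record; the function name `tf_idf` and the sample documents are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed textbook variant: tf = count / document length,
    idf = ln(N / df). Not the thesis's ImpTF-IDF formula.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical three-document corpus, tokenized on whitespace.
docs = [
    "web mining extracts information from the web".split(),
    "term weighting ranks web documents".split(),
    "term frequency counts term occurrences".split(),
]
weights = tf_idf(docs)
```

In this variant a term that occurs in every document receives a weight of zero, which is one reason many alternative weighting schemes have been proposed.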
Pagination: xxi, 187p.
URI: http://hdl.handle.net/10603/254348
Appears in Departments: Faculty of Information and Communication Engineering

Files in This Item:
File                                      Description    Size       Format
01_title.pdf                              Attached File  24.75 kB   Adobe PDF
02_certificates.pdf                                      476.27 kB  Adobe PDF
03_abstract.pdf                                          6.81 kB    Adobe PDF
04_acknowledgement.pdf                                   5.34 kB    Adobe PDF
05_table of contents.pdf                                 107.72 kB  Adobe PDF
06_list_of_symbols and abbreviations.pdf                 257.59 kB  Adobe PDF
07_chapter1.pdf                                          278.93 kB  Adobe PDF
08_chapter2.pdf                                          199.64 kB  Adobe PDF
09_chapter3.pdf                                          415.44 kB  Adobe PDF
10_chapter4.pdf                                          516.76 kB  Adobe PDF
11_chapter5.pdf                                          549.3 kB   Adobe PDF
12_chapter6.pdf                                          659 kB     Adobe PDF
13_chapter7.pdf                                          325.68 kB  Adobe PDF
14_conclusion.pdf                                        438.13 kB  Adobe PDF
15_references.pdf                                        249.06 kB  Adobe PDF
16_list_of_publications.pdf                              127.32 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
