Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/254348
Title: | Improved weightage approaches for data dependency and classification in web documents |
Researcher: | Santhanakumar M |
Guide(s): | Christopher Columbus C |
Keywords: | Data Dependency, Engineering and Technology, Computer Science, Computer Science Interdisciplinary Applications, Web Documents, Weightage Approaches |
University: | Anna University |
Completed Date: | 2018 |
Abstract: | In recent years, the World Wide Web (WWW) has become the largest and most important source of information of all kinds. Web mining, an application of data mining, is used widely in fields such as science, business, medicine, engineering and banking. Hence, many researchers are developing algorithms and techniques to address the different issues in web mining. However, extracting optimal and valuable information from the web remains a challenging task. Term weighting plays an important role in retrieving documents from the web, and several term weighting methods have been proposed that assign weights to the terms of a document based on how often those terms occur. Term Frequency-Inverse Document Frequency (TF-IDF) is the most frequently used term weighting method in the field of Information Retrieval. The main objective of this thesis is to implement an efficient and effective term weighting method, based on the classical TF-IDF method, for text classification. To improve the retrieval accuracy of web documents, this thesis investigates two term weighting methods, namely Improved TF-IDF (ImpTF-IDF) and the Co-Term Frequency (CTF) method. ImpTF-IDF is a single-term-based weighting method that extends the classical TF formula. The idea behind it is that the average frequency of a term across a collection of documents is taken into account when assigning a weight to that term. For normalization, the ratio of the document length to the total number of distinct terms in the corpus is computed and used. |
Pagination: | xxi, 187p. |
URI: | http://hdl.handle.net/10603/254348 |
Appears in Departments: | Faculty of Information and Communication Engineering |
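The abstract names classical TF-IDF as the baseline and mentions two quantities used by ImpTF-IDF. The following is a minimal Python sketch, assuming raw term counts for TF and the unsmoothed idf = log(N / df), which is one common variant; the thesis's exact formulas are not given in this record. The helpers `average_term_frequency` and `length_to_vocabulary_ratio` are hypothetical illustrations of the two quantities the abstract describes. Because the record does not say how ImpTF-IDF combines them, they are deliberately not merged into an ImpTF-IDF score.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classical TF-IDF: raw term count in a document times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def average_term_frequency(term, docs):
    """Hypothetical helper: mean raw count of `term` over all documents."""
    return sum(doc.count(term) for doc in docs) / len(docs)


def length_to_vocabulary_ratio(doc, docs):
    """Hypothetical helper: document length / distinct terms in the corpus."""
    vocabulary = {t for d in docs for t in d}
    return len(doc) / len(vocabulary)


docs = [
    ["web", "mining", "web", "data"],
    ["term", "weighting", "web"],
    ["data", "mining", "classification"],
]
w = tf_idf(docs)
print(round(w[0]["web"], 4))  # → 0.8109  (2 * ln(3/2))
print(average_term_frequency("web", docs))  # → 1.0
print(round(length_to_vocabulary_ratio(docs[0], docs), 4))  # → 0.6667
```

Under this unsmoothed idf, a term that appears in every document receives a weight of zero; smoothed variants (for example, adding 1 inside or outside the logarithm) avoid that, and the choice affects how a document-length normalization interacts with the final weights.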
Files in This Item:
File | Description | Size | Format
---|---|---|---
01_title.pdf | Attached File | 24.75 kB | Adobe PDF
02_certificates.pdf | | 476.27 kB | Adobe PDF
03_abstract.pdf | | 6.81 kB | Adobe PDF
04_acknowledgement.pdf | | 5.34 kB | Adobe PDF
05_table of contents.pdf | | 107.72 kB | Adobe PDF
06_list_of_symbols and abbreviations.pdf | | 257.59 kB | Adobe PDF
07_chapter1.pdf | | 278.93 kB | Adobe PDF
08_chapter2.pdf | | 199.64 kB | Adobe PDF
09_chapter3.pdf | | 415.44 kB | Adobe PDF
10_chapter4.pdf | | 516.76 kB | Adobe PDF
11_chapter5.pdf | | 549.3 kB | Adobe PDF
12_chapter6.pdf | | 659 kB | Adobe PDF
13_chapter7.pdf | | 325.68 kB | Adobe PDF
14_conclusion.pdf | | 438.13 kB | Adobe PDF
15_references.pdf | | 249.06 kB | Adobe PDF
16_list_of_publications.pdf | | 127.32 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).