Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/22920
Title: Duplicate Record Detection Using Soft Computing Approaches
Researcher: Deepa K
Guide(s): Rangarajan R
Keywords: ANFIS; Duplicate Record detection; K-means algorithm; Soft Computing
Upload Date: 19-Aug-2014
University: Anna University
Completed Date: n.d.
Abstract: The abundance of data produced and the need to merge data from different sources have made the efficient detection of duplicate records in databases a significant challenge. Since the data sources are independent, they may adopt their own conventions, and integrating data from different sources often leads to erroneous duplication of data. To ensure high-quality data, the database must validate and cleanse the incoming data from external sources. In this regard, data cleaning has become essential to ensure the quality of the data stored in real-world databases. The process of identifying the record pairs that represent the same entity is commonly known as duplicate record detection, and it is one of the important tasks of data cleaning.

The proposed work suggests several new approaches to improve the accuracy of the duplicate record detection process along with other well-known measures. The first part of the work adopts Adaptive Neuro-Fuzzy Inference Systems (ANFIS) for duplicate record detection by means of similarity functions, not only to reduce the time taken to decide whether records are duplicates but also to reduce the time needed to hard-code the matching rules that ANFIS uses. To attain accurate similarity computations, it is necessary to adapt a similarity measure for each field of the database with respect to its particular data domain. Consequently, the proposed approach combines the similarity values obtained from different similarity measures to compute the distance between any two records.

In the traditional approach, each record is selected and compared with the rest of the tuples one by one, making it a time-consuming process. To reduce the computational time, the cleaned records are clustered by the K-means algorithm, which groups the records most likely to be duplicates into one cluster. All possible pairs are then selected from each cluster, so that only the records within each cluster are compared. ANFIS uses the similarities for comparing a pair of records and detecting the duplicates.
Pagination: xix, 175p
URI: http://hdl.handle.net/10603/22920
Appears in Departments: Faculty of Information and Communication Engineering
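The abstract above describes two ideas that lend themselves to a short illustration: combining per-field similarity measures into a single record-level score, and clustering records with K-means so that only records within the same cluster are compared. The Python sketch below is illustrative only and is not the thesis implementation: the sample records, the numeric encoding used for clustering, the equal field weights, and the 0.8 decision threshold are all assumptions, and in the thesis the combined similarities are fed to an ANFIS model rather than a fixed threshold.

```python
from difflib import SequenceMatcher
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans

# Toy records; two pairs are near-duplicates (hypothetical example data).
records = [
    {"name": "John A. Smith", "city": "Chennai", "phone": "044-2357-1100"},
    {"name": "Jon Smith",     "city": "Chennai", "phone": "04423571100"},
    {"name": "Priya Raman",   "city": "Madurai", "phone": "0452-998-2211"},
    {"name": "Priya R.",      "city": "Madurai", "phone": "04529982211"},
]

def field_similarity(a: str, b: str) -> float:
    """Generic string similarity; a real system would adapt the measure to
    each field's data domain (names, addresses, phone numbers, ...)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_similarity(r1: dict, r2: dict) -> float:
    """Combine per-field similarities into one record-level score.
    Equal weights here; the thesis learns the combination with ANFIS."""
    sims = [field_similarity(r1[f], r2[f]) for f in r1]
    return sum(sims) / len(sims)

def encode(r: dict) -> list:
    """Crude numeric features so K-means can group likely duplicates together
    (an assumed encoding; the abstract does not specify the representation)."""
    return [float(ord(r["name"][0].upper())),
            sum(map(ord, r["city"].lower())) / 10.0]

X = np.array([encode(r) for r in records])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Compare pairs only within each cluster instead of all O(n^2) record pairs.
for cluster in set(labels):
    members = [i for i, c in enumerate(labels) if c == cluster]
    for i, j in combinations(members, 2):
        score = record_similarity(records[i], records[j])
        if score > 0.8:  # illustrative threshold standing in for ANFIS
            print(f"Likely duplicates: record {i} and record {j} "
                  f"(score = {score:.2f})")
```

Clustering first turns the quadratic all-pairs comparison into a much smaller set of within-cluster comparisons, which is the computational gain the abstract attributes to the K-means step.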
Files in This Item:
File | Description | Size | Format
---|---|---|---
01_title.pdf | Attached File | 251.01 kB | Adobe PDF
02_certificate.pdf | | 3.76 MB | Adobe PDF
03_abstract.pdf | | 64.94 kB | Adobe PDF
04_acknowledgement.pdf | | 57.24 kB | Adobe PDF
05_contents.pdf | | 134.55 kB | Adobe PDF
06_chapter 1.pdf | | 301.81 kB | Adobe PDF
07_chapter 2.pdf | | 2.7 MB | Adobe PDF
08_chapter 3.pdf | | 1.19 MB | Adobe PDF
09_chapter 4.pdf | | 1.06 MB | Adobe PDF
10_chapter 5.pdf | | 3.35 MB | Adobe PDF
11_chapter 6.pdf | | 1.96 MB | Adobe PDF
12_chapter 7.pdf | | 65.78 kB | Adobe PDF
13_appendix.pdf | | 83.91 kB | Adobe PDF
14_references.pdf | | 131.24 kB | Adobe PDF
15_publications.pdf | | 59.77 kB | Adobe PDF
16_vitae.pdf | | 53.37 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).