Title: Duplicate Record Detection Using Soft Computing Approaches
Researcher: Deepa K
Guide(s): Rangarajan R
Keywords: ANFIS; Duplicate Record; K-means algorithm; Soft Computing
Upload Date: 19-Aug-2014
University: Anna University
Completed Date: n.d.
Abstract: The abundance of data being produced, together with the need to merge data from different sources, has made the efficient detection of duplicate records in databases a challenge. Since the data sources are independent, they may adopt their own conventions, and integrating data from different sources invariably leads to erroneous duplication of data. To ensure high-quality data, the database must validate and cleanse the incoming data from external sources. In this regard, data cleaning has become essential to ensure the quality of the data stored in real-world databases. The process of identifying the record pairs that represent the same entity is commonly known as duplicate record detection, and it is one of the important tasks of data cleaning.
The proposed work suggests several new approaches to improve the accuracy of the duplicate record detection process, along with other well-known measures. The first part of the work adopts Adaptive Neuro-Fuzzy Inference Systems (ANFIS) for duplicate record detection by means of similarity functions. The aim is not only to reduce the time taken to make the duplicate-detection decision but also to reduce the time spent hard-coding the matching rules used by ANFIS. To attain accurate similarity computations, a similarity measure must be adapted to each field of the database with respect to its particular data domain. Consequently, the proposed approach combines the similarity values obtained from the different similarity measures to compute the distance between any two records.
In the traditional approach, each record is selected and compared with the rest of the tuples one by one, which makes it a time-consuming process. To reduce the computational time, the cleaned records are clustered by the K-means algorithm, which groups the records most likely to be duplicates into the same cluster. All possible pairs are then selected from each cluster, so that only records within the same cluster are compared. ANFIS uses the similarities for comparing a pair of records and detecting the dupl…
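The abstract describes two computational steps: combining per-field similarity values into a single distance between two records, and clustering records with K-means so that only pairs inside the same cluster are compared. The Python sketch below illustrates both under stated assumptions. The field names, weights, similarity functions (difflib's `ratio()` and exact match), and the letter-frequency vectors fed to K-means are all hypothetical choices for illustration. The thesis's actual per-field measures and its ANFIS matcher are not reproduced here.

```python
from difflib import SequenceMatcher
from itertools import combinations
import random

# --- Per-field similarity measures (hypothetical choices, not the thesis's) ---
def string_sim(a: str, b: str) -> float:
    """Edit-based similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def exact_sim(a: str, b: str) -> float:
    """1.0 if the normalized values are identical, otherwise 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0

# Hypothetical schema: field -> (similarity measure adapted to that field, weight).
FIELD_MEASURES = {
    "name": (string_sim, 0.5),
    "city": (string_sim, 0.3),
    "zip": (exact_sim, 0.2),
}

def record_distance(r1: dict, r2: dict) -> float:
    """Combine per-field similarities into one distance: 1 - weighted mean similarity."""
    total_w = sum(w for _, w in FIELD_MEASURES.values())
    sim = sum(w * f(r1[k], r2[k]) for k, (f, w) in FIELD_MEASURES.items())
    return 1.0 - sim / total_w

# --- K-means used to limit which pairs get compared ---
def features(r: dict) -> list[float]:
    """Hypothetical numeric vector for K-means: letter frequencies of the name field."""
    text = r["name"].lower()
    return [text.count(c) / max(len(text), 1) for c in "abcdefghijklmnopqrstuvwxyz"]

def kmeans(points: list[list[float]], k: int, iters: int = 20, seed: int = 0) -> list[int]:
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def candidate_pairs(records: list[dict], k: int):
    """Yield index pairs (i, j) with i < j, only for records that share a cluster."""
    labels = kmeans([features(r) for r in records], k)
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        yield from combinations(members, 2)

if __name__ == "__main__":
    records = [
        {"name": "John Smith", "city": "Chennai", "zip": "600025"},
        {"name": "Jon Smith", "city": "Chenai", "zip": "600025"},
        {"name": "Priya Raman", "city": "Madurai", "zip": "625001"},
        {"name": "Priya Ramann", "city": "Madurai", "zip": "625001"},
    ]
    for i, j in candidate_pairs(records, k=2):
        print(i, j, round(record_distance(records[i], records[j]), 3))
```

With n records, the exhaustive approach compares all n(n-1)/2 pairs, whereas `candidate_pairs` only yields pairs inside each cluster. In the thesis the final duplicate decision for each candidate pair is made by ANFIS, which this sketch does not model; the printed distances are simply the combined similarity scores that such a classifier would consume.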
Pagination: xix, 175 p.
Appears in Departments: Faculty of Information and Communication Engineering

Files in This Item:
File                    Size       Format
01_title.pdf            251.01 kB  Adobe PDF
02_certificate.pdf      3.76 MB    Adobe PDF
03_abstract.pdf         64.94 kB   Adobe PDF
04_acknowledgement.pdf  57.24 kB   Adobe PDF
05_contents.pdf         134.55 kB  Adobe PDF
06_chapter 1.pdf        301.81 kB  Adobe PDF
07_chapter 2.pdf        2.7 MB     Adobe PDF
08_chapter 3.pdf        1.19 MB    Adobe PDF
09_chapter 4.pdf        1.06 MB    Adobe PDF
10_chapter 5.pdf        3.35 MB    Adobe PDF
11_chapter 6.pdf        1.96 MB    Adobe PDF
12_chapter 7.pdf        65.78 kB   Adobe PDF
13_appendix.pdf         83.91 kB   Adobe PDF
14_references.pdf       131.24 kB  Adobe PDF
15_publications.pdf     59.77 kB   Adobe PDF
16_vitae.pdf            53.37 kB   Adobe PDF

Items in Shodhganga are protected by copyright, with all rights reserved, unless otherwise indicated.
