A study on de duplication

Venkatesh kumar A

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/39834

Full metadata record

DC Field	Value	Language
dc.coverage.spatial	A study on de duplication	en_US
dc.date.accessioned	2015-04-28T07:13:45Z	-
dc.date.available	2015-04-28T07:13:45Z	-
dc.date.issued	2015-04-28	-
dc.identifier.uri	http://hdl.handle.net/10603/39834	-
dc.description.abstract	We present two algorithms for calculating the string dissimilarity percentage in a de-duplication system. Our algorithms use multiple levels of clustering to incorporate constraints that reduce the volume of data, and Information Gain (IG) to calculate dissimilarity. In the proposed system, we first separate the records into block-sized subsets using a clustering algorithm and then apply IG to the subset values. Most existing systems depend on generic or manually tuned distance metrics for estimating similarity. We ran extensive experiments on large volumes of data, compared the results with various versions of existing algorithms, and found that the new system reduces the time spent on string comparison and achieves higher average accuracy than the existing systems. None of the existing systems produces the dissimilarity percentage between a pair of strings in a given data set. Here we present an efficient solution for calculating the string dissimilarity percentage in a de-duplication system using Multi Level Clustering (MLC) and Information Gain. Our algorithms work in two phases: Multi Level Clustering construction and text dissimilarity calculation. Our methods reduce the time needed to find a duplicate record and use less memory than the existing method.	en_US
dc.format.extent	xvi, 160p.	en_US
dc.language	English	en_US
dc.relation	p152-159.	en_US
dc.rights	university	en_US
dc.title	A study on de duplication	en_US
dc.title.alternative		en_US
dc.creator.researcher	Venkatesh kumar A	en_US
dc.subject.keyword	Information Gain	en_US
dc.subject.keyword	Multi Level Clustering	en_US
dc.description.note	appendix p137-151, reference p152-159.	en_US
dc.contributor.guide	Kuppuswami S	en_US
dc.publisher.place	Chennai	en_US
dc.publisher.university	Anna University	en_US
dc.publisher.institution	Faculty of Science and Humanities	en_US
dc.date.registered	n.d.	en_US
dc.date.completed	01/04/2014	en_US
dc.date.awarded	30/04/2014	en_US
dc.format.dimensions	23cm.	en_US
dc.format.accompanyingmaterial	None	en_US
dc.source.university	University	en_US
dc.type.degree	Ph.D.	en_US
Appears in Departments:	Faculty of Science and Humanities
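
The two-phase approach described in the abstract (group records into blocks, then score string pairs within each block) can be illustrated with a minimal Python sketch. This is not the thesis's method: the record does not say how the Multi Level Clustering or the Information Gain calculation is defined, so the sketch swaps in a hypothetical two-level blocking key (first letter, then a length bucket) and the standard library's `difflib.SequenceMatcher` ratio as a stand-in dissimilarity measure. The function names and the 20% threshold are assumptions made for illustration only.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record: str) -> tuple:
    """Hypothetical two-level blocking key: first letter, then a length bucket."""
    text = record.strip().lower()
    return (text[:1], len(text) // 5)


def build_blocks(records):
    """Phase 1 (stand-in): group records so only same-block pairs are compared."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    return blocks


def dissimilarity_percent(a: str, b: str) -> float:
    """Phase 2 (stand-in): 0.0 for identical strings, 100.0 for nothing in common.

    Uses difflib's matching ratio, not Information Gain.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return round((1.0 - ratio) * 100.0, 2)


def find_duplicates(records, threshold=20.0):
    """Return (a, b, dissimilarity %) for same-block pairs at or below the threshold."""
    matches = []
    for block in build_blocks(records).values():
        for a, b in combinations(block, 2):
            score = dissimilarity_percent(a, b)
            if score <= threshold:
                matches.append((a, b, score))
    return matches
```

Blocking is what saves comparisons here: pairs in different blocks are never scored. The trade-off in this simplified key is that near-duplicates whose lengths fall on either side of a bucket boundary are missed, which is one reason a real system would likely use a more careful clustering step.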

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	28.82 kB	Adobe PDF	View/Open
02_certificate.pdf		182.56 kB	Adobe PDF	View/Open
03_abstract.pdf		18.98 kB	Adobe PDF	View/Open
04_acknowledgement.pdf		22.03 kB	Adobe PDF	View/Open
05_content.pdf		44.58 kB	Adobe PDF	View/Open
06_chapter1.pdf		179.69 kB	Adobe PDF	View/Open
07_chapter2.pdf		193.92 kB	Adobe PDF	View/Open
08_chapter3.pdf		32.66 kB	Adobe PDF	View/Open
09_chapter4.pdf		368.8 kB	Adobe PDF	View/Open
10_chapter5.pdf		359.07 kB	Adobe PDF	View/Open
11_chapter6.pdf		227.44 kB	Adobe PDF	View/Open
12_chapter7.pdf		27.32 kB	Adobe PDF	View/Open
13_appendix.pdf		852.74 kB	Adobe PDF	View/Open
14_reference.pdf		44.59 kB	Adobe PDF	View/Open
15_publication.pdf		22.88 kB	Adobe PDF	View/Open

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET