Content Based Multi Language Plagiarism Detection Tool Using Bayesian Classifier

SRIRAM, M

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/314508

Title:	Content Based Multi Language Plagiarism Detection Tool Using Bayesian Classifier
Researcher:	SRIRAM, M
Guide(s):	SURESH, R M
Keywords:	Computer Science; Computer Science Information Systems; Engineering and Technology
University:	Bharath University
Completed Date:	2018
Abstract:	Plagiarism is a serious concern in the research world and a major concern for academics engaged in research. Students tend to copy text or programming assignments from one another because it eases their task. Copying reduces the time and effort involved in thinking through and formulating the logic of programs, and in coding and debugging. Plagiarism, however, increases the effort of evaluators considerably. In this research, a two-phase architecture is proposed that combines two categories of approach, multi-language plagiarism detection and content-based plagiarism detection, to incorporate their advantages and overcome the limitations each has when applied independently.
Plagiarism, the practice of taking somebody else's work or ideas and passing them off as one's own without suitable acknowledgement of the original author, has become a serious issue worldwide. Plagiarism can occur in various forms and in various fields. It can be defined in many ways by identifying the causes, sources, and types of plagiarism in written text as well as in programming languages. Plagiarism in text can occur by directly copying text from another source without referring to the original author, by replacing words with their synonyms, by paraphrasing, by changing the voice of a statement, and so on.
Cross-language plagiarism detection is used to automatically discover and extract plagiarism among documents in different languages. The fundamental task in cross-language plagiarism detection is handling the difference between text languages: the original source may be translated and analysed, and plagiarism may be detected automatically by comparing the suspected text with the original text.
This report proposes a Bayesian score-based probability detection between an n-gram corpus and an n-gram suspicious document to automatically detect the semantic relatedness between the words of two suspect target documents.
The proposed technique comprises four modules. The first module translates the target file, the second performs pre-processing, the third forms n-grams from the target file, and the fourth applies Bayesian scoring to the target file against the corpus text. The Bayesian scoring works well for multi-language plagiarism detection and is also capable of content-based plagiarism detection. The experiments were run with unigram, bigram, and trigram models. The best results are obtained for low-order word n-grams (n = {2,3}).
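The four modules described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the scoring formula, so the sketch assumes a two-class Bayes rule that compares a source n-gram model with a background model, using add-one smoothing. The function names (`translate`, `preprocess`, `ngrams`, `log_likelihood`, `plagiarism_posterior`), the `prior` and `alpha` parameters, and the identity translation stub are all hypothetical.

```python
import math
import re
from collections import Counter


def translate(text, target_lang="en"):
    # Module 1 (assumption): placeholder only. A real system would call
    # a translation service; this sketch returns the text unchanged.
    return text


def preprocess(text):
    # Module 2: lowercase and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # Module 3: contiguous word n-grams, each represented as a tuple.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(grams, counts, vocab_size, alpha=1.0):
    # Sum of log P(gram | model) with add-alpha smoothing (assumed).
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + alpha) / (total + alpha * vocab_size))
        for g in grams
    )


def plagiarism_posterior(suspicious, source, background, n=2, prior=0.5):
    # Module 4 (assumed form): posterior probability that the suspicious
    # n-grams come from the source model rather than the background model.
    sus = ngrams(preprocess(translate(suspicious)), n)
    src = Counter(ngrams(preprocess(translate(source)), n))
    bg = Counter(ngrams(preprocess(translate(background)), n))
    if not sus:
        return prior  # no evidence, so fall back to the prior
    vocab = len(set(sus) | set(src) | set(bg))
    log_src = math.log(prior) + log_likelihood(sus, src, vocab)
    log_bg = math.log(1.0 - prior) + log_likelihood(sus, bg, vocab)
    # Stable evaluation of e^a / (e^a + e^b); clamp to avoid overflow.
    diff = min(log_bg - log_src, 700.0)
    return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    background = "students submit programming assignments every week for grading"
    print(plagiarism_posterior(source, source, background, n=2))
    print(plagiarism_posterior("students submit assignments every week",
                               source, background, n=2))
```

A verbatim copy of the source should score close to 1 and text resembling the background should score well below 0.5. Setting `n` to 1, 2, or 3 corresponds to the unigram, bigram, and trigram models mentioned in the abstract.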
Pagination:
URI:	http://hdl.handle.net/10603/314508
Appears in Departments:	Department of Information Technology

Files in This Item:

File	Description	Size	Format
80_recommendation.pdf	Attached File	270.11 kB	Adobe PDF
certificate.pdf		267.06 kB	Adobe PDF
chapter 1.pdf		217.91 kB	Adobe PDF
chapter 2.pdf		446.61 kB	Adobe PDF
chapter 3.pdf		237.18 kB	Adobe PDF
chapter 4.pdf		547.93 kB	Adobe PDF
chapter 5.pdf		893.72 kB	Adobe PDF
chapter 6.pdf		173.23 kB	Adobe PDF
peliminary pages.pdf		616.02 kB	Adobe PDF
references.pdf		340.73 kB	Adobe PDF
title page.pdf		108.38 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET