Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/314508
Title: Content Based Multi Language Plagiarism Detection Tool Using Bayesian Classifier
Researcher: SRIRAM, M
Guide(s): SURESH, R M
Keywords: Computer Science
Computer Science Information Systems
Engineering and Technology
University: Bharath University
Completed Date: 2018
Abstract: Plagiarism is a serious concern in the research world and a major focus for academics working in research. Students tend to copy text or programming assignments from one another because it eases their task. Copying reduces the time and effort involved in thinking through and formulating the logic of programs, and in coding and debugging them. Plagiarism, however, considerably increases the effort required of evaluators. In this research, a two-phase architecture is proposed which combines methods from both categories to incorporate their advantages and overcome their limitations when applied independently. The two approaches are multi-language plagiarism detection and content-based plagiarism detection.
Plagiarism, the practice of taking somebody else's work or thoughts and passing them off as one's own without suitable acknowledgement of the original author, has become a serious issue worldwide. Plagiarism can occur in various forms and in various fields. It can be defined in many ways by identifying the causes, sources, and types of plagiarism in written text as well as in programming languages. Plagiarism in text can occur by directly copying text from another source without referring to the original author, by replacing words with their synonyms, by paraphrasing, by changing the voice of a statement, and so on.
Cross-language plagiarism detection is used to automatically discover and extract plagiarism among documents in different languages. The fundamental task of cross-language plagiarism detection is to handle the difference in text languages: the original source may be translated and analysed, and plagiarism may then be detected automatically by comparing the suspected text with the original text.
This report proposes Bayesian score-based probability detection between an n-gram corpus and an n-gram suspicious document to automatically detect the semantic relatedness between the words of two suspected target documents.
The proposed technique comprises four modules. The first module translates the target file, the second performs pre-processing, the third forms n-grams from the target file, and the fourth applies Bayesian scoring of the target file against the corpus text. Bayesian scoring works well for multi-language plagiarism detection and is also capable of content-based plagiarism detection. The experiments were run with unigram, bigram, and trigram models. The best results were obtained for low-order word n-grams (n = {2,3}).
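The abstract does not give the scoring formula, so the following Python sketch is only one plausible reading of the pre-processing, n-gram, and Bayesian scoring modules. It assumes the translation module has already produced text in a common language, and it swaps in a naive Bayes style comparison with Laplace smoothing: the suspicious document's n-grams are scored under a model built from the candidate source and under a model built from unrelated background text. All function names, the `background` argument, and the `prior` and `alpha` parameters are hypothetical and do not come from the thesis.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(grams, counts, vocab_size, alpha=1.0):
    """Log-probability of the n-grams under a Laplace-smoothed count model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + alpha) / (total + alpha * vocab_size))
        for g in grams
    )


def plagiarism_posterior(suspicious, source, background, n=2, prior=0.5):
    """Posterior probability that `suspicious` was drawn from `source`
    rather than from `background` (an assumed two-class formulation)."""
    susp = ngrams(preprocess(suspicious), n)
    if not susp:
        return prior  # no evidence: fall back to the prior
    src = Counter(ngrams(preprocess(source), n))
    bg = Counter(ngrams(preprocess(background), n))
    vocab_size = len(set(susp) | set(src) | set(bg))
    log_odds = (
        math.log(prior) - math.log(1.0 - prior)
        + log_likelihood(susp, src, vocab_size)
        - log_likelihood(susp, bg, vocab_size)
    )
    # Clamp to keep math.exp within floating-point range.
    log_odds = max(-700.0, min(700.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


source = "the quick brown fox jumps over the lazy dog"
background = "stock markets fell sharply as investors worried about inflation"
copied = plagiarism_posterior(source, source, background, n=2)
unrelated = plagiarism_posterior(
    "investors worried about inflation", source, background, n=2
)
```

In this toy run, `copied` comes out close to 1 and `unrelated` falls well below 0.5. Changing `n` to 1 or 3 gives the unigram and trigram variants the abstract mentions.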
Pagination: 
URI: http://hdl.handle.net/10603/314508
Appears in Departments:Department of Information Technology

Files in This Item:
File                     Description      Size        Format
80_recommendation.pdf    Attached File    270.11 kB   Adobe PDF
certificate.pdf                           267.06 kB   Adobe PDF
chapter 1.pdf                             217.91 kB   Adobe PDF
chapter 2.pdf                             446.61 kB   Adobe PDF
chapter 3.pdf                             237.18 kB   Adobe PDF
chapter 4.pdf                             547.93 kB   Adobe PDF
chapter 5.pdf                             893.72 kB   Adobe PDF
chapter 6.pdf                             173.23 kB   Adobe PDF
peliminary pages.pdf                      616.02 kB   Adobe PDF
references.pdf                            340.73 kB   Adobe PDF
title page.pdf                            108.38 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
