Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/451025
Title: Searching And Counting Of K Mers Of High Throughput Sequencing Data
Researcher: Manekar, Swati C.
Guide(s): Sathe, Shailesh R.
Keywords: Computer Science
Computer Science Information Systems
Engineering and Technology
University: Visvesvaraya National Institute of Technology
Completed Date: 2019
Abstract: The capability to obtain the genetic code of any species has caused a revolution in the biological sciences. The fact that a DNA molecule is written as a string has profound implications for its analysis. DNA sequencing technologies sequence many short stretches (pieces/fragments) of DNA rather than the entire genomic DNA. These short stretches of DNA describe the sequence of bases in the genome of a species. This information is key to understanding many aspects of how life functions. Computer science is applied to analyze DNA sequencing data [1]. The rapid development of High-Throughput Sequencing (HTS) technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k (k-mers) in DNA/RNA sequencing reads for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Although k-mer counting is simple and straightforward in principle, it becomes challenging when billions of reads generated by Next Generation Sequencing (NGS) techniques must be processed using reasonable amounts of memory and in minimal time. Many k-mer counting programs are available for DNA sequencing data, so it is important for biologists, beginners, and consultants to know which program best suits their sequencing application and available computing resources. We present a comprehensive survey of exact k-mer counting programs. We assess several k-mer counting programs and evaluate their relative performance, primarily on the basis of runtime and memory usage. We also consider additional parameters, such as disk usage, accuracy, parallelism, the impact of compressed input, performance when counting large k values, and the scalability of the application to larger datasets. We also present various shell scripts and a program to evaluate their performance. We make specific recommendations for the set-up of a current state-of-the-art
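To make the counting task in the abstract concrete, the following is a minimal in-memory sketch of exact k-mer counting in Python. It is not the thesis's program, nor any of the surveyed tools; the function name `count_kmers`, the choice to skip windows containing the ambiguous base `N`, and the sample reads are all assumptions made for illustration. It does not merge a k-mer with its reverse complement, which many counting tools do.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across an iterable of reads.

    Illustrative assumption: windows containing the ambiguous base 'N'
    are skipped rather than counted.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # A read of length n contains n - k + 1 overlapping k-mers;
        # the range is empty when the read is shorter than k.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Hypothetical sample reads, not data from the thesis.
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A dictionary of this kind holds every distinct k-mer in memory at once, which is why the approach stops being practical at the scale of billions of reads that the abstract describes, and why runtime and memory usage are the primary evaluation criteria.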
Pagination: 204
URI: http://hdl.handle.net/10603/451025
Appears in Departments: Computer Science

Files in This Item:
File                    Description    Size       Format
80_recommendation.pdf   Attached File  115.48 kB  Adobe PDF
abstract.pdf                           65.27 kB   Adobe PDF
annexure.pdf                           548.74 kB  Adobe PDF
chapter-1.pdf                          1.85 MB    Adobe PDF
chapter-2.pdf                          1.78 MB    Adobe PDF
chapter-3.pdf                          681.81 kB  Adobe PDF
chapter-4.pdf                          322.6 kB   Adobe PDF
chapter-5.pdf                          83.78 kB   Adobe PDF
contents.pdf                           80.24 kB   Adobe PDF
prelim page.pdf                        147.54 kB  Adobe PDF
title page.pdf                         106.9 kB   Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
