Searching And Counting Of K Mers Of High Throughput Sequencing Data

Manekar, Swati C.

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/451025

Title:	Searching And Counting Of K Mers Of High Throughput Sequencing Data
Researcher:	Manekar, Swati C.
Guide(s):	Sathe, Shailesh R.
Keywords:	Computer Science; Computer Science Information Systems; Engineering and Technology
University:	Visvesvaraya National Institute of Technology
Completed Date:	2019
Abstract:	The capability to obtain the genetic code of any species has caused a revolution in the biological sciences. The fact that a DNA molecule can be written as a string has profound implications for its analysis. DNA sequencing technologies sequence many short stretches (pieces/fragments) of DNA rather than the entire genomic DNA. These short stretches describe the sequence of bases in the genome of a species, and this information is key to understanding many aspects of how life functions. Computer science is applied to analyze DNA sequencing data [1].
The rapid development of High-Throughput Sequencing (HTS) technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k (k-mers) in DNA/RNA sequencing reads, for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Although k-mer counting is simple and straightforward in principle, it becomes challenging when billions of reads generated by Next Generation Sequencing (NGS) techniques must be processed using reasonable amounts of memory and in minimal time. Many k-mer counting programs are available for DNA sequencing data, so biologists, beginners, and consultants need to know which program best suits their sequencing application and their available computing resources. We present a comprehensive survey of exact k-mer counting programs. We assess several k-mer counting programs and evaluate their relative performance, primarily on the basis of runtime and memory usage. We also consider additional parameters, such as disk usage, accuracy, parallelism, the impact of compressed input, performance when counting large k values, and the scalability of the application to larger datasets. We also present various shell scripts and a program to evaluate their performance. We make specific recommendations for the set-up of a current state-of-the-art
Pagination:	204
URI:	http://hdl.handle.net/10603/451025
Appears in Departments:	Computer Science
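
For readers unfamiliar with the task named in the abstract, the following is a minimal sketch of exact k-mer counting, not the thesis's own method or any of the surveyed programs. The function name `count_kmers`, the sample reads, and the choice to skip k-mers containing characters other than A, C, G, T are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a collection of reads.

    Illustrative assumption: k-mers containing any character other than
    A, C, G, T (for example the ambiguous base N) are skipped.
    """
    valid_bases = set("ACGT")
    counts = Counter()
    for read in reads:
        read = read.upper()
        # A read of length n contains n - k + 1 overlapping k-mers;
        # the range is empty when the read is shorter than k.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid_bases:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy reads; real inputs would come from FASTA/FASTQ files.
    toy_reads = ["ACGTACGT", "CGTAN"]
    for kmer, n in sorted(count_kmers(toy_reads, 3).items()):
        print(kmer, n)
```

Holding every distinct k-mer in an in-memory dictionary like this is workable only for small inputs; with billions of reads the table can outgrow available memory, which is the difficulty the abstract points to.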

Files in This Item:

File	Description	Size	Format
80_recommendation.pdf	Attached File	115.48 kB	Adobe PDF
abstract.pdf		65.27 kB	Adobe PDF
annexure.pdf		548.74 kB	Adobe PDF
chapter-1.pdf		1.85 MB	Adobe PDF
chapter-2.pdf		1.78 MB	Adobe PDF
chapter-3.pdf		681.81 kB	Adobe PDF
chapter-4.pdf		322.6 kB	Adobe PDF
chapter-5.pdf		83.78 kB	Adobe PDF
contents.pdf		80.24 kB	Adobe PDF
prelim page.pdf		147.54 kB	Adobe PDF
title page.pdf		106.9 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET