Design of Countermeasures for Spoofed Speech Detection System

Patel, Tanvina Bhupendrabhai

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/161107

Title:	Design of Countermeasures for Spoofed Speech Detection System
Researcher:	Patel, Tanvina Bhupendrabhai
Guide(s):	Patil, Hemant Arjun
Keywords:	Automatic Speaker Verification systems; Spoofed Speech Detection; Gaussian Mixture Model; Cochlear Filter Cepstral Coefficients and Instantaneous Frequency; Subband Autoencoder; Linear Prediction; Long-Term Prediction; Non-Linear Prediction; Equal Error Rate
University:	Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)
Completed Date:	2017
Abstract:	Automatic Speaker Verification (ASV) systems are vulnerable to spoofing attacks that use speech synthesis and voice conversion techniques. Recently, to encourage the development of anti-spoofing measures, or countermeasures, for the Spoofed Speech Detection (SSD) task, a standardized dataset was provided at the ASVspoof 2015 challenge held at INTERSPEECH 2015. In the present work, using a traditional Gaussian Mixture Model (GMM)-based classification system, novel countermeasures are proposed considering three vital aspects of the speech production mechanism, i.e., the excitation source, the vocal tract system (i.e., the filter) and the Source-Filter (S-F) interaction.
Considering our relatively best performance at the ASVspoof challenge, we first discuss system-based features, which include the proposed Cochlear Filter Cepstral Coefficients and Instantaneous Frequency (CFCCIF) features. These use the envelope and average IF of each subband along with the transient information. The transient variations estimated by the symmetric difference (CFCCIFS) gave better discrimination. Within the framework of system-based features, the Subband Autoencoder (SBAE) feature set, which embeds subband processing in the autoencoder architecture, is used. For source-based features, knowing that actual vocal fold movement is absent in machine-generated speech, the fundamental frequency (F0) contour and the Strength of Excitation (SoE) are used as features. Next, since spoofed speech is either easily predicted (if generated by a simplified model) or difficult to predict (due to artifacts), we propose the use of prediction-based methods. These include the Linear Prediction (LP), Long-Term Prediction (LTP) and Non-Linear Prediction (NLP) techniques. Lastly, the Fujisaki model is used to analyze the prosodic differences, in terms of accent and phrase, between natural and spoofed speech.
In addition to independently using source or system features, the time-varying dependencies or the S-F interaction features are considered.
Pagination:	xxx, 225 p.
URI:	http://hdl.handle.net/10603/161107
Appears in Departments:	Department of Information and Communication Technology

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	115.71 kB	Adobe PDF
02_declaration and certificate.pdf		92.18 kB	Adobe PDF
03_acknowledgements.pdf		135.74 kB	Adobe PDF
04_table of content.pdf		116.45 kB	Adobe PDF
05_abstract.pdf		99.09 kB	Adobe PDF
06_list of principal symbols and acronyms.pdf		120.11 kB	Adobe PDF
07_list of figures.pdf		148.05 kB	Adobe PDF
08_list of tables.pdf		144.29 kB	Adobe PDF
09_chapter 1.pdf		573.81 kB	Adobe PDF
10_chapter 2.pdf		446.72 kB	Adobe PDF
11_chapter 3.pdf		1.41 MB	Adobe PDF
12_chapter 4.pdf		2.63 MB	Adobe PDF
13_chapter 5.pdf		2.62 MB	Adobe PDF
14_chapter 6.pdf		1.33 MB	Adobe PDF
15_chapter 7.pdf		166.17 kB	Adobe PDF
16_reference.pdf		218.56 kB	Adobe PDF
17_publication.pdf		107.84 kB	Adobe PDF
18_biography.pdf		157.13 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET