Algorithms for Stochastic Optimization Statistical Estimation and Markov Decision Processes

Kamanchi, Chandramouli

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/426451

Title:	Algorithms for Stochastic Optimization Statistical Estimation and Markov Decision Processes
Researcher:	Kamanchi, Chandramouli
Guide(s):	Bhatnagar, Shalabh
Keywords:	Automation and Control Systems Computer Science Engineering and Technology
University:	Indian Institute of Science Bangalore
Completed Date:	2020
Abstract:	Stochastic approximation deals with the problem of finding zeros of a function expressed as an expectation of a random variable. In this thesis we propose convergent algorithms for problems in optimization, statistical estimation and reinforcement learning. We first formulate these problems in the stochastic approximation setting. Subsequently we utilize the ordinary differential equations based analysis of stochastic approximation algorithms to prove the convergence of our proposed algorithms. Additionally we explore second order methods in the context of Markov Decision Processes. Through experimental evaluations, we validate our theoretical results. First, we consider a Stochastic Optimization (SO) problem where the aim is to optimize a given objective function in the presence of noise. Most of the solution techniques in SO estimate gradients from the noise corrupted observations of the objective and adjust parameters of the objective along the direction of the estimated gradients to obtain locally optimal solutions. Two prominent algorithms in SO namely Random Direction Kiefer-Wolfowitz (RDKW) and Simultaneous Perturbation Stochastic Approximation (SPSA) obtain noisy gradient estimates by randomly perturbing all the parameters simultaneously. This forces the search direction to be random in these algorithms and causes them to suffer additional noise on top of the noise incurred from the samples of the objective. Owing to this additional noise, the idea of using deterministic perturbations instead of random perturbations for gradient estimation has also been studied in the past. Specifically, two constructions of the deterministic perturbation sequence using lexicographical ordering and Hadamard matrices have been explored in the context of SPSA algorithms and encouraging results have been reported in the literature. In our work, we characterize the class of deterministic perturbation sequences that can be utilized in the RDKW algorithm...
Pagination:	x, 78
URI:	http://hdl.handle.net/10603/426451
Appears in Departments:	Computer Science and Automation

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	68.71 kB	Adobe PDF	View/Open
02_prelim pages.pdf		155.38 kB	Adobe PDF	View/Open
03_table of content.pdf		41.27 kB	Adobe PDF	View/Open
04_abstract.pdf		42.77 kB	Adobe PDF	View/Open
05_chapter 1.pdf		83.19 kB	Adobe PDF	View/Open
06_chapter 2.pdf		351.51 kB	Adobe PDF	View/Open
07_chapter 3.pdf		492.49 kB	Adobe PDF	View/Open
08_chapter 4.pdf		260.16 kB	Adobe PDF	View/Open
09_chapter 5.pdf		239.07 kB	Adobe PDF	View/Open
10_annexure.pdf		168.68 kB	Adobe PDF	View/Open
80_recommendation.pdf		119.1 kB	Adobe PDF	View/Open

Show full item record

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Altmetric Badge:

Shodhganga : a reservoir of Indian theses @ INFLIBNET