Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/426451
Title: | Algorithms for Stochastic Optimization Statistical Estimation and Markov Decision Processes |
Researcher: | Kamanchi, Chandramouli |
Guide(s): | Bhatnagar, Shalabh |
Keywords: | Automation and Control Systems Computer Science Engineering and Technology |
University: | Indian Institute of Science Bangalore |
Completed Date: | 2020 |
Abstract: | Stochastic approximation deals with the problem of finding zeros of a function expressed as an expectation of a random variable. In this thesis we propose convergent algorithms for problems in optimization, statistical estimation and reinforcement learning. We first formulate these problems in the stochastic approximation setting. Subsequently we utilize the ordinary differential equations based analysis of stochastic approximation algorithms to prove the convergence of our proposed algorithms. Additionally we explore second order methods in the context of Markov Decision Processes. Through experimental evaluations, we validate our theoretical results. First, we consider a Stochastic Optimization (SO) problem where the aim is to optimize a given objective function in the presence of noise. Most of the solution techniques in SO estimate gradients from the noise corrupted observations of the objective and adjust parameters of the objective along the direction of the estimated gradients to obtain locally optimal solutions. Two prominent algorithms in SO namely Random Direction Kiefer-Wolfowitz (RDKW) and Simultaneous Perturbation Stochastic Approximation (SPSA) obtain noisy gradient estimates by randomly perturbing all the parameters simultaneously. This forces the search direction to be random in these algorithms and causes them to suffer additional noise on top of the noise incurred from the samples of the objective. Owing to this additional noise, the idea of using deterministic perturbations instead of random perturbations for gradient estimation has also been studied in the past. Specifically, two constructions of the deterministic perturbation sequence using lexicographical ordering and Hadamard matrices have been explored in the context of SPSA algorithms and encouraging results have been reported in the literature. In our work, we characterize the class of deterministic perturbation sequences that can be utilized in the RDKW algorithm... |
Pagination: | x, 78 |
URI: | http://hdl.handle.net/10603/426451 |
Appears in Departments: | Computer Science and Automation |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 68.71 kB | Adobe PDF | View/Open |
02_prelim pages.pdf | 155.38 kB | Adobe PDF | View/Open | |
03_table of content.pdf | 41.27 kB | Adobe PDF | View/Open | |
04_abstract.pdf | 42.77 kB | Adobe PDF | View/Open | |
05_chapter 1.pdf | 83.19 kB | Adobe PDF | View/Open | |
06_chapter 2.pdf | 351.51 kB | Adobe PDF | View/Open | |
07_chapter 3.pdf | 492.49 kB | Adobe PDF | View/Open | |
08_chapter 4.pdf | 260.16 kB | Adobe PDF | View/Open | |
09_chapter 5.pdf | 239.07 kB | Adobe PDF | View/Open | |
10_annexure.pdf | 168.68 kB | Adobe PDF | View/Open | |
80_recommendation.pdf | 119.1 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Altmetric Badge: