Algorithms for Challenges to Practical Reinforcement Learning

Sindhu, P R

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/426487

Title:	Algorithms for Challenges to Practical Reinforcement Learning
Researcher:	Sindhu, P R
Guide(s):	Bhatnagar, Shalabh
Keywords:	Automation and Control Systems; Computer Science; Engineering and Technology
University:	Indian Institute of Science Bangalore
Completed Date:	2020
Abstract:	Reinforcement learning (RL) in real-world applications faces major hurdles, foremost among them the safety of the physical system controlled by the learning agent and the varying environmental conditions in which the autonomous agent operates. An RL agent learns to control a system by exploring the available actions. In some operating states, an exploratory action can drive the system into unsafe operation, creating hazards both for the system and for the humans supervising it. RL algorithms therefore need to respect safety constraints, and must do so with limited available information. Additionally, RL agents learn optimal decisions under the assumption of a stationary environment, which is a very restrictive assumption. In many real-world problems, such as traffic signal control and robotic applications, the environment is non-stationary, and in these scenarios RL algorithms yield sub-optimal decisions. The first part of this thesis develops algorithmic solutions to the challenges of safety and non-stationary environmental conditions. To handle safety restrictions and facilitate safe exploration during learning, the thesis proposes a sample-efficient learning algorithm based on the cross-entropy method. This algorithm is built on a constrained optimization framework and uses very limited information to learn feasible policies. During the learning iterations, exploration is guided in a manner that minimizes safety violations. The first part also describes an algorithm for the second challenge. Its goal is to maximize the long-term discounted reward accrued when the latent model of the environment changes with time. To achieve this, the algorithm leverages a change point detection algorithm to detect changes in the statistics of the environment. The results from this statistical algorithm are used to reset learning of policies...
Pagination:	xv, 125
URI:	http://hdl.handle.net/10603/426487
Appears in Departments:	Computer Science and Automation
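
The abstract describes using change point detection on the environment's statistics to reset policy learning. The record does not say which detector or learner the thesis uses, so the following is only a minimal sketch under stated assumptions: a two-sided CUSUM test stands in for the change point detector, a running mean of rewards stands in for the learned policy, and the names `CusumDetector` and `run_with_resets` as well as the `drift` and `threshold` values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    """Two-sided CUSUM on a scalar stream (illustrative stand-in, not the thesis's detector)."""
    target: float      # assumed pre-change mean of the monitored signal
    drift: float       # slack: deviations smaller than this are ignored
    threshold: float   # alarm level for the cumulative sums
    pos: float = 0.0   # accumulated evidence of an upward shift
    neg: float = 0.0   # accumulated evidence of a downward shift

    def update(self, x: float) -> bool:
        """Feed one observation; return True if a change is flagged."""
        self.pos = max(0.0, self.pos + (x - self.target - self.drift))
        self.neg = max(0.0, self.neg + (self.target - x - self.drift))
        return self.pos > self.threshold or self.neg > self.threshold


def run_with_resets(rewards, target, drift=0.5, threshold=3.0):
    """Track a running mean reward, restarting it whenever CUSUM raises an alarm."""
    detector = CusumDetector(target, drift, threshold)
    estimate, count, change_points = 0.0, 0, []
    for t, r in enumerate(rewards):
        if detector.update(r):
            change_points.append(t)
            # Assumption: on an alarm, discard what was learned so far and
            # re-centre the detector on the observation that triggered it.
            detector = CusumDetector(r, drift, threshold)
            estimate, count = 0.0, 0
        count += 1
        estimate += (r - estimate) / count  # incremental mean update
    return estimate, change_points


# Toy stream: the mean reward jumps from 1.0 to 5.0 at step 20.
stream = [1.0] * 20 + [5.0] * 20
final_estimate, detected = run_with_resets(stream, target=1.0)
```

In this toy run the estimate is restarted at the step where the reward level shifts, so it reflects only post-change data rather than an average over both regimes.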

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	168.53 kB	Adobe PDF
02_prelim pages.pdf		211.51 kB	Adobe PDF
03_table of content.pdf		54.92 kB	Adobe PDF
04_abstract.pdf		73.33 kB	Adobe PDF
05_chapter 1.pdf		139.44 kB	Adobe PDF
06_chapter 2.pdf		737.54 kB	Adobe PDF
07_chapter 3.pdf		522.86 kB	Adobe PDF
08_chapter 4.pdf		483.26 kB	Adobe PDF
09_chapter 5.pdf		386.76 kB	Adobe PDF
10_chapter 6.pdf		3.11 MB	Adobe PDF
11_annexure.pdf		228.26 kB	Adobe PDF
80_recommendation.pdf		259.09 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET