Imitation Learning Techniques for Robot Manipulation

Gubbi Venkatesh, Sagar

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/429539

Title:	Imitation Learning Techniques for Robot Manipulation
Researcher:	Gubbi Venkatesh, Sagar
Guide(s):	Amrutur, Bharadwaj
Keywords:	Engineering; Engineering and Technology; Engineering Electrical and Electronic
University:	Indian Institute of Science Bangalore
Completed Date:	2021
Abstract:	Robots that can operate in unstructured environments and collaborate with humans play a major role in raising productivity and living standards as societies age. Unlike the robots currently used in industrial settings for repetitive tasks, they will have to be capable of perceiving the novel environments they come across, dealing with the ambiguities of natural and intuitive communication with non-expert human operators, and manipulating the objects in the environment in complex ways. This problem may be broadly divided into two areas. One is specifying to the robot what the task is, and the other is how to execute the specified task. In the first part of this thesis, a Siamese neural network with a modified spatial attention layer is proposed for specifying, through visual cues, novel objects that the robot has not seen during the training phase. Although Siamese networks have been used for detecting novel objects, the prevalent architectures require a cropped image of the object and cannot support the use of natural and intuitive visual cues for specifying which object in the scene is of interest. The proposed network enables non-expert human operators to specify new objects by using a laser pointer, by pointing with a finger, or by giving a video demonstration of the task. This is a weakly supervised learning problem in which the proposed architecture learns the visual cue implicitly as part of the training process, without additional labels for the visual cue. In the second part of the thesis, instructions in natural language are interpreted in the context of the visual scene so that the robot can understand which object to manipulate. A U-Net structure along with an LSTM for language processing is proposed for processing spatial relationships specified in the instruction in the context of the scene. 
Although the U-Net architecture has been successfully applied to several computer vision problems, we show that it is useful not only for object detection but also in the stages after object d...
URI:	http://hdl.handle.net/10603/429539
Appears in Departments:	Electrical Communication Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	171.75 kB	Adobe PDF
02_prelim pages.pdf		567.63 kB	Adobe PDF
03_contents.pdf		77.47 kB	Adobe PDF
04_abstract.pdf		92.33 kB	Adobe PDF
05_chapter 1.pdf		5.65 MB	Adobe PDF
06_chapter 2.pdf		1.64 MB	Adobe PDF
07_chapter 3.pdf		2.94 MB	Adobe PDF
08_chapter 4.pdf		4.37 MB	Adobe PDF
09_chapter 5.pdf		390.85 kB	Adobe PDF
10_chapter 6.pdf		1.35 MB	Adobe PDF
11_chapter 7.pdf		1.13 MB	Adobe PDF
12_chapter 8.pdf		3.17 MB	Adobe PDF
13_chapter 9.pdf		959.89 kB	Adobe PDF
14_chapter 10.pdf		4.05 MB	Adobe PDF
15_annexure.pdf		209.68 kB	Adobe PDF
80_recommendation.pdf		265.96 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET