Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/429539
Title: | Imitation Learning Techniques for Robot Manipulation |
Researcher: | Gubbi Venkatesh, Sagar |
Guide(s): | Amrutur, Bharadwaj |
Keywords: | Engineering; Engineering and Technology; Engineering Electrical and Electronic |
University: | Indian Institute of Science Bangalore |
Completed Date: | 2021 |
Abstract: | Robots that can operate in unstructured environments and collaborate with humans play a major role in raising productivity and living standards as societies age. Unlike the robots currently used in industrial settings for repetitive tasks, they will have to be capable of perceiving the novel environments they come across, dealing with the ambiguities of natural and intuitive communication with non-expert human operators, and manipulating the objects in the environment in complex ways. This problem may be broadly divided into two areas: specifying the task to the robot, and executing the specified task. In the first part of this thesis, a Siamese neural network with a modified spatial attention layer is proposed for specifying, through visual cues, novel objects that the robot has not seen during the training phase. Although Siamese networks have been used for detecting novel objects, the prevalent architectures require a cropped image of the object and cannot support the use of natural and intuitive visual cues for specifying which object in the scene is of interest. The proposed network enables non-expert human operators to specify new objects by using a laser pointer, by pointing with a finger, or by demonstrating the task on video. This is a weakly supervised learning problem: the proposed architecture learns the visual cue implicitly as part of the training process, without additional labels for the visual cue. In the second part of the thesis, instructions in natural language are interpreted in the context of the visual scene so that the robot can understand which object to manipulate. A U-Net structure, combined with an LSTM for language processing, is proposed for interpreting the spatial relationships specified in the instruction in the context of the scene.
Although the U-Net architecture has been successfully applied to several computer vision problems, we show that it is useful not only for object detection but also in the stages after object d... |
URI: | http://hdl.handle.net/10603/429539 |
Appears in Departments: | Electrical Communication Engineering |
Files in This Item:
File | Description | Size | Format
---|---|---|---
01_title.pdf | Attached File | 171.75 kB | Adobe PDF
02_prelim pages.pdf | | 567.63 kB | Adobe PDF
03_contents.pdf | | 77.47 kB | Adobe PDF
04_abstract.pdf | | 92.33 kB | Adobe PDF
05_chapter 1.pdf | | 5.65 MB | Adobe PDF
06_chapter 2.pdf | | 1.64 MB | Adobe PDF
07_chapter 3.pdf | | 2.94 MB | Adobe PDF
08_chapter 4.pdf | | 4.37 MB | Adobe PDF
09_chapter 5.pdf | | 390.85 kB | Adobe PDF
10_chapter 6.pdf | | 1.35 MB | Adobe PDF
11_chapter 7.pdf | | 1.13 MB | Adobe PDF
12_chapter 8.pdf | | 3.17 MB | Adobe PDF
13_chapter 9.pdf | | 959.89 kB | Adobe PDF
14_chapter 10.pdf | | 4.05 MB | Adobe PDF
15_annexure.pdf | | 209.68 kB | Adobe PDF
80_recommendation.pdf | | 265.96 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).