Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/429539
Title: Imitation Learning Techniques for Robot Manipulation
Researcher: Gubbi Venkatesh, Sagar
Guide(s): Amrutur, Bharadwaj
Keywords: Engineering; Engineering and Technology; Engineering Electrical and Electronic
University: Indian Institute of Science Bangalore
Completed Date: 2021
Abstract: Robots that can operate in unstructured environments and collaborate with humans play a major role in raising productivity and living standards as societies age. Unlike the robots currently used in industrial settings for repetitive tasks, they will have to be capable of perceiving the novel environments they come across, dealing with the ambiguities of natural and intuitive communication with non-expert human operators, and manipulating the objects in the environment in complex ways. This problem may be broadly divided into two areas: one is specifying to the robot what the task is, and the other is executing the specified task. In the first part of this thesis, a Siamese neural network with a modified spatial attention layer is proposed for specifying, through visual cues, novel objects that the robot has not seen during the training phase. Although Siamese networks have been used for detecting novel objects, the prevalent architectures require a cropped image of the object and cannot support natural and intuitive visual cues for indicating which object in the scene is the object of interest. The proposed network enables non-expert human operators to specify new objects by using a laser pointer, by pointing with a finger, or through a video demonstration of the task. This is a weakly supervised learning problem in which the proposed architecture learns the visual cue implicitly as part of the training process, without additional labels for the visual cue. In the second part of the thesis, instructions in natural language are interpreted in the context of the visual scene so that the robot can understand which object to manipulate. A U-Net structure combined with an LSTM for language processing is proposed to process the spatial relationships specified in the instruction in the context of the scene. Although the U-Net architecture has been successfully applied to several computer vision problems, we show that it is useful not only for object detection but also in the stages after object d...
URI: http://hdl.handle.net/10603/429539
Appears in Departments: Electrical Communication Engineering

Files in This Item:
File                    Description     Size        Format
01_title.pdf            Attached File   171.75 kB   Adobe PDF
02_prelim pages.pdf                     567.63 kB   Adobe PDF
03_contents.pdf                         77.47 kB    Adobe PDF
04_abstract.pdf                         92.33 kB    Adobe PDF
05_chapter 1.pdf                        5.65 MB     Adobe PDF
06_chapter 2.pdf                        1.64 MB     Adobe PDF
07_chapter 3.pdf                        2.94 MB     Adobe PDF
08_chapter 4.pdf                        4.37 MB     Adobe PDF
09_chapter 5.pdf                        390.85 kB   Adobe PDF
10_chapter 6.pdf                        1.35 MB     Adobe PDF
11_chapter 7.pdf                        1.13 MB     Adobe PDF
12_chapter 8.pdf                        3.17 MB     Adobe PDF
13_chapter 9.pdf                        959.89 kB   Adobe PDF
14_chapter 10.pdf                       4.05 MB     Adobe PDF
15_annexure.pdf                         209.68 kB   Adobe PDF
80_recommendation.pdf                   265.96 kB   Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
