Towards Hardware Accelerated Reinforcement Learning For Application-Specific Robotic Control

Shengjia Shao, Jason Tsai, Michal Mysior, Wayne Luk, Thomas C. P. Chau, Alexander Warren, Ben Jeppesen

2018 IEEE 29TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP) (2018)

Abstract

Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with its environment by making sequential decisions. The agent receives a reward from the environment based on how good its decisions are, and tries to find an optimal decision-making policy that maximises its long-term cumulative reward. This paper presents a novel approach that shows promise in applying accelerated simulation of RL policy training to automating the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance the performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO). This results in a 43% speed improvement over a previous FPGA implementation, a 4.65 times speedup over deep learning libraries running on a GPU, and a 19.29 times speedup over a CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach and pick up predefined objects, demonstrating the feasibility of our approach.
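The three speed figures in the abstract are all stated relative to the new FPGA design, so they also imply how the baselines compare with one another. The short Python sketch below works through that arithmetic. It rests on two assumptions that the abstract does not confirm: that the "43% speed improvement" means 1.43 times the speed of the previous FPGA implementation, and that all three comparisons were measured on the same workload. The function name `implied_speedup` is made up for this illustration.

```python
# Speedups of the new FPGA design, as reported in the abstract.
NEW_VS_GPU = 4.65
NEW_VS_CPU = 19.29
# Assumption: a "43% speed improvement" is read as a 1.43x speedup
# over the previous FPGA implementation.
NEW_VS_PREV_FPGA = 1.43


def implied_speedup(new_vs_slow: float, new_vs_fast: float) -> float:
    """Speedup of the faster baseline over the slower baseline.

    Both arguments are speedups of the same new design over each baseline,
    assumed to be measured on the same workload.
    """
    if new_vs_slow <= 0 or new_vs_fast <= 0:
        raise ValueError("speedups must be positive")
    return new_vs_slow / new_vs_fast


# Implied ratios between the baselines (illustrative, not reported results).
gpu_vs_cpu = implied_speedup(NEW_VS_CPU, NEW_VS_GPU)
prev_fpga_vs_cpu = implied_speedup(NEW_VS_CPU, NEW_VS_PREV_FPGA)
prev_fpga_vs_gpu = implied_speedup(NEW_VS_GPU, NEW_VS_PREV_FPGA)

if __name__ == "__main__":
    print(f"GPU over CPU:           {gpu_vs_cpu:.2f}x")
    print(f"Previous FPGA over CPU: {prev_fpga_vs_cpu:.2f}x")
    print(f"Previous FPGA over GPU: {prev_fpga_vs_gpu:.2f}x")
```

Under these assumptions the GPU baseline would be roughly 4 times faster than the CPU baseline, and the previous FPGA implementation would already have been faster than both. These are derived estimates only, not figures from the paper.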

Keywords

trained RL policy, robot arm, trained arm, application-specific robotic control, machine learning, sequential decisions, optimal decision-making policy, long-term cumulative reward, accelerated simulation, RL policy training, design space exploration techniques, FPGA accelerator, Trust Region Policy Optimisation, deep learning libraries, speed improvement, FPGA implementation, hardware accelerated reinforcement learning
