Google DeepMind: Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning (Movies)

Published by 智能佳机器人 on 2024-12-05

Original article: OP3 Soccer

Take a look at the OP3 Powered by DYNAMIXEL


We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner—well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website: OP3 Soccer


Soccer players can tackle, get up, kick and chase a ball in one seamless motion. How could robots master these agile motor skills?


Movie S1: Training in simulation


We first trained individual skills in isolation, in simulation, and then composed those skills end-to-end in a self-play setting. We found that a combination of sufficiently high-frequency control and targeted dynamics randomization and perturbations during training in simulation enabled good-quality transfer to the robot.

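The combination described above, of per-episode dynamics randomization plus random perturbations during training, can be sketched as follows. This is our own illustrative outline, not the paper's implementation; the parameter names, ranges, and push probability are assumptions.

```python
import random

def sample_dynamics():
    """Resample simulated physics parameters at the start of each episode.

    The specific parameters and ranges here are illustrative assumptions,
    chosen to show the shape of targeted dynamics randomization.
    """
    return {
        "joint_friction_scale": random.uniform(0.8, 1.2),
        "link_mass_scale": random.uniform(0.9, 1.1),
        "actuator_delay_ms": random.uniform(0.0, 20.0),
    }

def maybe_perturb(push_prob=0.01):
    """Occasionally return a random push force (N) to apply to the torso."""
    if random.random() < push_prob:
        return (random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0), 0.0)
    return None

def run_episode(num_steps=1000):
    """One training episode: fix the randomized dynamics, then step with
    occasional perturbations. The simulator and policy calls are elided."""
    dynamics = sample_dynamics()
    num_pushes = 0
    for _ in range(num_steps):
        force = maybe_perturb()
        if force is not None:
            num_pushes += 1
        # ... apply `dynamics` and `force` in the simulator, query the
        # policy at a sufficiently high control frequency, step physics ...
    return dynamics, num_pushes
```

Randomizing dynamics once per episode (rather than per step) forces the policy to identify and adapt to a consistent but unknown robot, which is one common way such randomization is structured.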

Movie S2: 1v1 matches


Five one-versus-one matches, representative of the typical behavior and gameplay of the fully trained soccer agent.


Movie S3: Set pieces in simulation and in the real environment


We analysed the agent's performance in two set pieces to gauge the reliability of the get-up and shooting behaviors and to measure the performance gap between simulation and the real environment. We also compared these behaviors with scripted baseline skills: in experiments, the learned agents walked 156% faster, took 63% less time to get up, and kicked 24% faster than the scripted baseline.

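For clarity, figures like "156% faster" and "63% less time" are relative changes against the scripted baseline. The measurement values below are made-up placeholders chosen only to reproduce the stated percentages; they are not from the paper.

```python
def percent_faster(learned, baseline):
    """Relative increase of a rate-like metric (e.g. walking speed, m/s)."""
    return 100.0 * (learned - baseline) / baseline

def percent_less_time(learned, baseline):
    """Relative reduction of a time-like metric (e.g. get-up time, s)."""
    return 100.0 * (baseline - learned) / baseline

# Illustrative numbers only: a baseline walking speed of 0.200 m/s versus
# a learned-policy speed of 0.512 m/s gives a 156% speedup.
print(round(percent_faster(0.512, 0.200)))    # 156
# A baseline get-up time of 1.00 s versus 0.37 s gives 63% less time.
print(round(percent_less_time(0.37, 1.00)))   # 63
```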

Movie S4: Robustness and recovery from pushes


Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to move safely and effectively while still performing in a dynamic and agile way.


Preliminary Results: Learning from vision


We conducted a preliminary investigation of whether deep RL agents can learn directly from raw egocentric vision. In this context the agent must learn to control its camera and integrate information over a window of egocentric viewpoints to predict various game aspects. Our initial analysis indicates that deep RL is a promising approach to this challenging problem. We conducted a simpler set-piece using fixed walker and ball positions and found our agent scored 10 goals in simulation and 6 goals on the real robot over 10 trials.

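One simple way to integrate information over a window of egocentric viewpoints is to stack the most recent camera frames into a single observation for the policy. The following is a minimal sketch under that assumption; the frame shape, window length, and class interface are ours, not the paper's architecture.

```python
from collections import deque

import numpy as np

class FrameStack:
    """Maintain the last `num_frames` egocentric frames as one observation."""

    def __init__(self, num_frames=4):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        """At episode start, fill the window by repeating the first frame."""
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(first_frame)

    def push(self, frame):
        """Append the newest frame; the oldest is dropped automatically."""
        self.frames.append(frame)

    def observation(self):
        """Stacked window with shape (num_frames, H, W), oldest first."""
        return np.stack(self.frames, axis=0)

# Usage with a hypothetical 40x30 grayscale camera stream:
stack = FrameStack(num_frames=4)
stack.reset(np.zeros((40, 30)))
stack.push(np.ones((40, 30)))
obs = stack.observation()
print(obs.shape)  # (4, 40, 30)
```

A recurrent network over single frames is the other common choice; frame stacking trades memory for a simpler feed-forward policy.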

We hope the challenge of integrating the get-up skill and learning vision-guided exploration and multi-agent strategies will be tackled by future work.


Movie S5: Preliminary vision based agents

