"The Alberta Plan for AI Research" - "Research Vision" from Richard Sutton

Published by WrRan on 2024-10-12

The Alberta Plan characterizes the problem of AI as the online maximization of reward via continual sensing and acting, with limited computation, and potentially in the presence of other agents.


Original text

We seek to understand and create long-lived computational agents that interact with a vastly more complex world and come to predict and control their sensory input signals, particularly a distinguished scalar signal called reward. The overall setting we consider is familiar from the field of reinforcement learning. An agent and an environment exchange signals on a fine time scale. The agent sends actions to the environment and receives sensory signals back from it. The larger sensory signal, the observation, is explicitly not expected to provide complete information about the state of the environment. The second sensory signal, the reward, is a scalar and defines the ultimate goal of the agent -- to maximize the total reward summed over time. These three time series -- observation, action, and reward -- constitute the experience of the agent. We expect all learning to be grounded in these three signals and not in variables internal to the environment. Only experience is available to the agent, and the environment is known only as a source and sink for these signals.
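As a concrete illustration of this interface, the sketch below runs the experience stream described above: actions go out, observations and rewards come back, and the agent's goal is the sum of rewards over time. The ToyEnvironment and ToyAgent classes and their method names are illustrative assumptions, not anything specified by the Alberta Plan; only the interface follows the text.

```python
import random

class ToyEnvironment:
    """Emits a partial observation and a scalar reward on every time step."""

    def __init__(self):
        self.state = 0.0                              # internal state, never exposed to the agent

    def step(self, action):
        self.state += random.gauss(0.0, 0.1)          # a drifting, non-stationary world
        reward = 1.0 if action == (self.state > 0.0) else 0.0
        observation = self.state + random.gauss(0.0, 1.0)   # noisy, incomplete view of the state
        return observation, reward


class ToyAgent:
    """Learns only from the experience stream: observation, action, reward."""

    def act(self, observation, reward):
        return observation > 0.0                      # a trivial reactive policy


def run(agent, env, steps=1000):
    observation, reward, total = 0.0, 0.0, 0.0
    for _ in range(steps):                            # in the full vision this loop never ends
        action = agent.act(observation, reward)       # action out ...
        observation, reward = env.step(action)        # ... observation and reward back in
        total += reward                               # the agent's goal: maximize this sum over time
    return total


print(run(ToyAgent(), ToyEnvironment()))
```

Note that the agent never reads `env.state`; everything it knows must come through the three experience signals.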

The first distinguishing feature of the Alberta Plan's research vision is its emphasis on ordinary experience, as described above, as opposed to special training sets, human assistance, or access to the internal structure of the world. Although there are many ways human input and domain knowledge can be used to improve the performance of an AI, such methods typically do not scale with computational resources and as such are not a research priority for us.

The second distinguishing feature of the Alberta Plan's research vision can be summarized in the phrase temporal uniformity. Temporal uniformity means that all times are the same with respect to the algorithms running on the agent. There are no special training periods during which training information is available, and no rewards that count more or less than others. If training information is provided, as it is via the reward signal, then it is provided on every time step. If the agent learns or plans, then it learns or plans on every time step. If the agent constructs its own representations or subtasks, then the meta-algorithms for constructing them operate on every time step. If the agent can reduce its speed of learning about parts of the environment when they appear stable, then it can also increase its speed of learning when they start to change. Our focus on temporally uniform problems and algorithms leads us to an interest in non-stationary, continuing environments and in algorithms for continual learning and meta-learning.
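To make temporal uniformity concrete for one small piece of an agent, the sketch below tracks a drifting scalar target, updating both its estimate and its own step size on every time step, with no separate training phase. The step-size adaptation rule is a simple heuristic chosen for illustration (an assumption here), not an algorithm from the Alberta Plan.

```python
import random

class UniformTracker:
    """Tracks a drifting scalar target, learning and meta-learning on every step."""

    def __init__(self, step_size=0.1):
        self.estimate = 0.0
        self.step_size = step_size
        self.error_trace = 0.0                        # slow trace of recent absolute error

    def update(self, target):
        error = target - self.estimate
        # Learning happens on every time step -- there is no separate training phase.
        self.estimate += self.step_size * error
        # Meta-learning also happens on every time step: speed up when errors grow
        # (the world is changing), slow down when they shrink (it appears stable).
        self.error_trace += 0.01 * (abs(error) - self.error_trace)
        if abs(error) > 2.0 * self.error_trace:
            self.step_size = min(1.0, self.step_size * 1.1)
        else:
            self.step_size = max(0.001, self.step_size * 0.999)
        return self.estimate


tracker = UniformTracker()
target = 0.0
for t in range(10_000):
    if t == 5_000:
        target = 5.0                                  # the world changes; learning should speed up again
    tracker.update(target + random.gauss(0.0, 0.5))
print(round(tracker.estimate, 2), round(tracker.step_size, 3))
```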

Temporal uniformity is partly a constraint on what we research and partly a discipline that we impose on ourselves. Keeping everything temporally uniform reduces degrees of freedom and shrinks the agent-design space. Why not keep everything temporally uniform? Having posed that rhetorical question, we acknowledge that there may be situations in which it is preferable to depart from absolute temporal uniformity. But when we do so, we are aware that we are stepping outside this discipline.

The third distinguishing feature of the Alberta Plan research vision is its cognizance of computational considerations. Moore's law and its generalizations bring steady exponential increases in computer power, and we must prioritize methods that scale proportionally to that computer power. Computer power, though exponentially more plentiful, is never infinite. The more we have, the more important it is to use it efficiently, because it is a greater and greater determinant of our agents' performance. We must heed the bitter lesson of AI's past and prioritize methods, such as learning and search, that scale extensively with computer power, while de-emphasizing methods that do not, such as human insight into the problem domain and human-labeled training sets.

Beyond these large-scale implications, computational considerations enter into every aspect of an intelligent agent's design. For example, it is generally important for an intelligent agent to be able to react quickly to a change in its observation. But, given the computational limitations, there is always a tradeoff between reaction time and the quality of the decision. The time steps should be of uniform length. If we want the agent to respond quickly, then the time step must be small -- smaller than would be needed to identify the best action. A better action might be available from planning, but planning, and even learning, takes time; sometimes it is better to act fast than to act well.

Giving priority to reactive action in this way does not preclude an important role for planning. The reactive policy may recommend a temporizing action until planning has improved the policy, before a more committal action is taken, just as a chess player may wait until she is sure of her move before making it. Planning is an essential part of intelligence and of our research vision.
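One standard way to arrange this tradeoff is Dyna-style background planning: the agent always acts immediately from its current reactive value estimates, while a fixed per-step budget of model-based updates improves those estimates for later, more committal decisions. The sketch below uses textbook tabular Q-learning with a hypothetical state and action interface; it is one illustration of the idea, not the Alberta Plan's prescribed design.

```python
import random
from collections import defaultdict

class ReactivePlanningAgent:
    """Acts reactively from current estimates; plans within a fixed per-step budget."""

    def __init__(self, actions, step_size=0.1, gamma=0.95,
                 epsilon=0.1, planning_steps=5):
        self.actions = actions
        self.q = defaultdict(float)                   # value estimates behind the reactive policy
        self.model = {}                               # learned model: (state, action) -> (reward, next state)
        self.step_size = step_size
        self.gamma = gamma
        self.epsilon = epsilon
        self.planning_steps = planning_steps          # the per-step compute budget for planning

    def act(self, state):
        # React immediately from the current estimates -- fast, possibly imperfect.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn_and_plan(self, state, action, reward, next_state):
        # Learn from this step's real experience.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.step_size * (target - self.q[(state, action)])
        self.model[(state, action)] = (reward, next_state)
        # A bounded amount of planning: replay a few modeled transitions,
        # improving the reactive policy for later, more committal decisions.
        for _ in range(self.planning_steps):
            (s, a), (r, s2) = random.choice(list(self.model.items()))
            best = max(self.q[(s2, b)] for b in self.actions)
            self.q[(s, a)] += self.step_size * (r + self.gamma * best - self.q[(s, a)])
```

The `planning_steps` parameter is where the computational budget enters: with more computer power per time step, more planning updates fit between actions and the reactive policy improves faster, without ever delaying the action itself.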

The fourth distinguishing feature of the Alberta Plan research vision is that it includes a focus on the special case in which the environment includes other intelligent agents. In this case the primary agent may learn to communicate, cooperate, and compete with those other agents, and it should be cognizant that the environment may behave differently in response to its actions. AI research into game playing must often deal with these issues. The case of two or more cooperating agents also includes cognitive assistants and prostheses. This case is studied as Intelligence Amplification (IA), a subfield of human-machine interaction. There are general principles by which one agent may use what it learns to amplify and enhance the action, perception, and cognition of another agent, and this amplification is an important part of attaining the full potential of AI.

The Alberta Plan characterizes the problem of AI as the online maximization of reward via continual sensing and acting, with limited computation, and potentially in the presence of other agents. This characterization might seem natural, even obvious, but it is also contrary to current practice, which is often focused on offline learning, prepared training sets, human assistance, and unlimited computation. The Alberta Plan research vision is both classical and contrarian, and radical in the sense of going to the root.
