ARS Reinforcement Learning using Gymnasium

Posted by 人生苦短6 on 2024-11-20

Original URL: https://www.cnblogs.com/comp9021T2/p/18558450

ARS - Coursework Guide – 24/25

Version History

1.0 (29/09/24): First version.
1.1: Fleshed out marking criteria for task 2 report.

Summary

Title: Reinforcement Learning using Gymnasium environments

Hand-in: Programs AND a written report will need to be submitted online via Moodle. Check the module’s Moodle page for the precise deadline.

Late policy: The coursework deadlines (task 1 and task 2) are absolute. Late submissions are subject to a 5% deduction of the overall coursework mark per day.

Informal Description

The coursework consists of two tasks, as described below. Your aim is to build several reinforcement learning agents and to design, implement and run several basic research-based experiments. You will hand in software and a report that discusses your work on these tasks. Briefly, task 1 is about implementing some basic RL prototypes (with noise injection and basic modularity) for your chosen environment(s) and identifying key literature, gaps, and research questions, whereas task 2 is about designing, developing and running experiments based on the research questions identified in task 1.

Aims and Outcomes

If you take the labs seriously, at the end of the semester you should be:

o comfortable with implementing and modifying reinforcement learning agents,
o capable of adapting your RL solutions to different kinds of robotic problems with well-defined states, actions and rewards,
o comfortable with neural network approaches for mapping complex high-dimensional states to actions (if you choose to use neural-network-based RL solutions),
o comfortable with setting up experiments pertaining to noise, and with studying and mitigating its impact,
o comfortable with designing modular AI solutions,
o capable of scanning the literature in order to understand modern RL techniques, and of incorporating/extending these in your own solutions,
o capable of identifying gaps and/or weaknesses/limitations in state-of-the-art research, and using these to define research questions that guide your research,
o capable of studying and evaluating algorithm performance objectively,
o capable of designing innovative algorithms and experiments, and reporting their results in a clear and well-structured manner.

Rough Timetable

Laboratory notes

You will work individually.

We need to start working hard from the very first day to make the most of the lab sessions. In the first week you will learn the basics of Gymnasium, experiment with several environments, and even try some small heuristics on simple control problems (e.g. cart pole).

Rough time estimation:
o Total hours: 20 credits ≈ 200 hours.
o Subtract lectures (22 hours) and labs (20 hours): 200 − 42 = 158 hours.
o Divide the remainder by 12 weeks: 158 / 12 ≈ 13 hours per week for everything else, e.g. studying, researching, reading, thinking, coding, testing, analyzing, and writing.

Getting Started

Preliminary steps

Check the following three main Gymnasium resources:
o Farama’s general documentation page for Gymnasium.
o The Basic Usage page in the above documentation.
o The Gymnasium GitHub page, which includes installation instructions.

Install Gymnasium. For the purpose of the coursework it is sufficient to work with the “classic control” set of environments; however, do feel free to install and use other categories of environments (e.g. MuJoCo and Atari) if you wish.

Go through the Basic Usage page.
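The Basic Usage page revolves around a reset/step interaction loop. A minimal sketch of that loop, assuming Gymnasium 0.26 or later (where `reset` returns `(observation, info)` and `step` returns a five-tuple). The helper name `run_random_episode` is hypothetical, and the environment is passed in as a parameter so the same helper can be reused for both of your chosen environments:

```python
def run_random_episode(env, seed=0, max_steps=1000):
    """Play one episode with uniformly random actions and return the summed reward.

    `env` is assumed to follow the Gymnasium API, e.g. gym.make("CartPole-v1").
    """
    observation, info = env.reset(seed=seed)  # seeds the environment's RNG
    env.action_space.seed(seed)  # also makes the sampled actions reproducible
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # terminated: the task itself ended (e.g. the pole fell over);
        # truncated: the episode was cut short, e.g. by a time limit.
        if terminated or truncated:
            break
    return total_reward
```

With Gymnasium installed, `env = gym.make("CartPole-v1")`, then `run_random_episode(env)` and `env.close()`, runs one episode without any rendering; passing `render_mode="human"` to `gym.make` turns on-screen rendering on when you need it.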

You can install Gym on your own machine, in your local directory on UNM’s HPC, or you can use Google Colaboratory. Please note that in the past there were ways to render environments properly in Colab (e.g. have a look at this tutorial); however, this may change from time to time. For an example of a Jupyter notebook for the cart pole example, refer to the module’s Moodle page. I suggest not bothering with rendering, except for some exercises, since performance metrics are the key concern.

As mentioned, if you want to use any of the MuJoCo environments, you can. DeepMind recently bought MuJoCo and made it open source, which means there are no more licensing issues. You are not required to use MuJoCo, but if you really want to, you are … and get the environments set up.

To see what environments are available, use:

import gymnasium as gym
print(gym.envs.registry.keys())

To better understand some Gymnasium environments, consult this Wiki or scroll to “environments” on Gymnasium’s GitHub page and search for your environment. For example, for the cart pole environment, have a look at this page.

Try to come up with some heuristic solutions for Cart Pole

Try to come up with some simple heuristics to keep the pole up based on your understanding of the environment. You can start from and modify the (failing) heuristic example provided on the Moodle page (i.e. sol-H1-cart-pole-v0).
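One possible starting point for the heuristic exercise, as a sketch rather than the Moodle example: per the Gymnasium CartPole documentation, the observation is [cart position, cart velocity, pole angle, pole angular velocity] and action 1 pushes the cart to the right, which counteracts a positive pole angle. The rule below pushes towards the predicted lean; the function names and the `velocity_weight` value are hypothetical, hand-picked assumptions.

```python
def heuristic_policy(observation, velocity_weight=0.5):
    """Push the cart towards the side the pole is about to fall to.

    Assumed CartPole layout: [cart position, cart velocity, pole angle,
    pole angular velocity]; actions: 0 = push left, 1 = push right.
    velocity_weight is an arbitrary, untuned assumption.
    """
    _, _, angle, angular_velocity = observation
    # Predicted lean = current angle plus a look-ahead from the angular velocity.
    lean = angle + velocity_weight * angular_velocity
    # Pushing right (action 1) counteracts a positive angle, so follow the sign.
    return 1 if lean > 0 else 0


def run_policy(env, policy, seed=0, max_steps=1000):
    """Run one episode of `policy` on a Gymnasium-style env; return the summed reward."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        observation, reward, terminated, truncated, info = env.step(policy(observation))
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```

Looking at the angular velocity, not just the angle, is what lets the rule react before the pole has already tipped; a rule based on the angle alone typically oscillates and fails sooner.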

Difficult? Let's see whether reinforcement learning helps. Have a look at a Q-learning solution (example: s1cart-pole-v0-sol1). Try to run the code.
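For orientation, here is an independent minimal sketch of tabular Q-learning on CartPole; it is not the Moodle example. It discretizes the continuous observation into bins and applies the standard update Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′,a′) − Q(s,a)] with ε-greedy exploration. The bin edges, hyperparameters, and function names are hypothetical, untuned assumptions.

```python
import random
from collections import defaultdict

# Hypothetical bin edges for CartPole's four observation components:
# cart position, cart velocity, pole angle, pole angular velocity.
BIN_EDGES = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-0.1, -0.03, 0.0, 0.03, 0.1],
    [-1.0, -0.3, 0.0, 0.3, 1.0],
]


def discretize(observation, bin_edges=BIN_EDGES):
    """Map a continuous observation to a hashable tuple of bin indices."""
    return tuple(sum(value > edge for edge in edges)  # count edges below the value
                 for value, edges in zip(observation, bin_edges))


def epsilon_greedy(q_table, state, n_actions, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = q_table[state]
    return max(range(n_actions), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, terminated, alpha, gamma):
    """One tabular Q-learning step towards the bootstrapped target."""
    best_next = 0.0 if terminated else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])


def train(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Train on a Gymnasium-style env with a Discrete action space."""
    rng = random.Random(seed)
    n_actions = env.action_space.n
    q_table = defaultdict(lambda: [0.0] * n_actions)
    returns = []
    for episode in range(episodes):
        # Seed only the first reset so later episodes start from different states.
        observation, info = env.reset(seed=seed if episode == 0 else None)
        state, total, done = discretize(observation), 0.0, False
        while not done:
            action = epsilon_greedy(q_table, state, n_actions, epsilon, rng)
            observation, reward, terminated, truncated, info = env.step(action)
            next_state = discretize(observation)
            q_update(q_table, state, action, reward, next_state, terminated, alpha, gamma)
            state, total = next_state, total + reward
            done = terminated or truncated
        returns.append(total)
    return q_table, returns
```

Only `terminated` zeroes the bootstrap term: a time-limit truncation does not mean the state has no future value. The per-episode `returns` list is what you would feed into a learning-speed measure.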

Read the code. Try to understand it as much as possible, although note that it will only fully make sense once we have covered Q-learning in the lectures.

Task Description

Requirements for Task 1:

o Title. Prototypes, literature, gaps, and research questions.
o Prototypes:
▪ Environment selection. Select two environments to work on throughout the whole assignment: one environment from within the control category (e.g. CartPole-v1) and one environment from any category (including the control one). Please recall that different environments may impose significant changes to your reinforcement learning solutions since, for example, they may involve continuous action spaces or other representational differences. To simplify matters, you might want to constrain yourself to environments with discrete action spaces.

▪ Core method required: reinforcement learning. If you want to use other methods for other integrated modules, that is fine.
▪ Additional requirements: (1) noise injection at the inputs and/or outputs, (2) some modularity (e.g. an RL component and a denoising component).
▪ Aim: for each environment, develop at least one viable proof of concept based on RL.
o Literature:
▪ Steps: Explore the recent RL literature in relation to the topic of noise … papers will be your “core/seed” papers, you should still study the literature more broadly (i.e. your report should cite other papers apart from the core papers). Select your gaps for further investigation. Justify your choices.

Design at least 2 research questions based on your selected gaps.
▪ Aim: clearly outline 1-3 selected papers, overall gaps, selected gaps, and research questions. Note that it is crucial for the papers, gaps and research questions to be 100% credible, i.e.: (1) the papers must be recent and good, (2) the gaps must be genuine open problems, and (3) the research questions must sit squarely in the gaps and must point in useful directions.
▪ Constraint 1: Every student must have a different set of core papers and/or a different set of gaps and/or a different set of research questions (RQs). Once a student has defined their selected papers, gaps, and RQs, they must email them to me so that I can check and approve them. Please note that this process will operate on a “first come, first served” basis. Please also note that if two students share the same papers, they can still differ in terms of the chosen gaps or RQs; however, it is preferable if all elements are distinct.
▪ Constraint 2: The selected research questions must include, or focus on, …
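The two additional prototype requirements (noise injection at the inputs and/or outputs, and modularity such as separate RL and denoising components) can be sketched as small, swappable modules. This is one possible design under stated assumptions, not a prescribed one: Gaussian observation noise and random action replacement are just two example noise models, the exponential moving average is a deliberately simple denoiser, and every class and function name here is hypothetical. Gymnasium's `ObservationWrapper` and `ActionWrapper` base classes are an alternative place to put the same noise models.

```python
import random


class GaussianObservationNoise:
    """Input-side noise: adds N(0, sigma^2) to every observation component."""

    def __init__(self, sigma, seed=None):
        self.sigma = sigma
        self.rng = random.Random(seed)

    def __call__(self, observation):
        return [x + self.rng.gauss(0.0, self.sigma) for x in observation]


class RandomActionNoise:
    """Output-side noise: with probability p, the chosen action is replaced at random."""

    def __init__(self, p, n_actions, seed=None):
        self.p = p
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def __call__(self, action):
        if self.rng.random() < self.p:
            return self.rng.randrange(self.n_actions)
        return action


class ExponentialSmoother:
    """Denoising module: exponential moving average over successive observations."""

    def __init__(self, beta):
        self.beta = beta  # weight on the previous estimate, 0 <= beta < 1
        self.estimate = None

    def reset(self):
        self.estimate = None

    def __call__(self, observation):
        if self.estimate is None:
            self.estimate = list(observation)
        else:
            self.estimate = [self.beta * e + (1 - self.beta) * x
                             for e, x in zip(self.estimate, observation)]
        return list(self.estimate)


class ModularAgent:
    """Chains independent modules (denoiser -> policy); either can be swapped out."""

    def __init__(self, policy, denoiser=None):
        self.policy = policy
        self.denoiser = denoiser

    def act(self, observation):
        if self.denoiser is not None:
            observation = self.denoiser(observation)
        return self.policy(observation)


def run_noisy_episode(env, agent, observation_noise, action_noise, seed=0, max_steps=1000):
    """One episode where the agent sees noisy inputs and its outputs are corrupted."""
    observation, info = env.reset(seed=seed)
    if agent.denoiser is not None:
        agent.denoiser.reset()  # do not carry filter state across episodes
    total_reward = 0.0
    for _ in range(max_steps):
        intended = agent.act(observation_noise(observation))
        observation, reward, terminated, truncated, info = env.step(action_noise(intended))
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```

Because noise, denoiser, and policy are separate objects, an experimental condition becomes a choice of components (e.g. sigma, p, with or without the smoother) while the RL code stays unchanged.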

Requirements for Task 2:

o Title. Research questions and experiments.
o Environment selection. You must use the same two environments you selected for task 1.

o Core method required: reinforcement learning. As before, if you want to use other methods for other integrated modules, that is fine.

o Goals. Keywords: novel experiments and insights. The aim of this task is for you to design, develop, run, and analyze experiments that address the research questions you listed in task 1. The main tasks would be: (1) design experiments … assess … answered the research questions, (6) either proceed back to step 1 with adjustments to the experiments/solutions, or proceed with additional experiments (depending on time and completion status). Document your findings.

Requirements for all tasks (i.e. tasks 1 and 2):

o Performance. Define one or more valid performance measures, apart from the default/compulsory one, i.e. the average number of episodes needed before solving a problem (see below for more information).

o Evaluation. Run your experiments and report your results for both of your chosen environments consistently.

o Four I’s. Try to maximize your work along the following dimensions: (1) informedness (i.e. it is based on a solid understanding of the literature), (2) innovativeness (i.e. novel), (3) inventiveness (i.e. not technically trivial), (4) impactfulness (e.g. generates new knowledge).
o Core themes. The core themes for both tasks are: (1) reinforcement learning, (2) noise, (3) modularity. Please note that the research questions can be exclusively about noise, or modularity, or both; however, the models must always include elements of noise and modularity.

o Demo. Show and explain the performance of your solutions and the results of your experiments.

Performance Evaluation
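The compulsory measure in the requirements, the number of episodes needed before solving a problem, is only comparable across conditions and environments once “solved” is pinned down. A minimal sketch under stated assumptions: “solved” means the mean return over a moving window first reaches a threshold, runs that never solve are counted rather than dropped, and conditions are compared over the same seeds. For reference, Gymnasium registers CartPole-v1 with a reward threshold of 475, and the old Gym leaderboard treated CartPole-v0 as solved at an average of 195 over 100 consecutive episodes; the window, thresholds, function names, and the `train_fn` callback are hypothetical choices you would state explicitly in your report.

```python
import statistics


def episodes_to_solve(episode_returns, threshold, window):
    """1-based episode at which the mean of the last `window` returns first
    reaches `threshold` (the end of that window); None if never solved."""
    for end in range(window, len(episode_returns) + 1):
        if sum(episode_returns[end - window:end]) / window >= threshold:
            return end
    return None


def summarize(runs, threshold, window):
    """Aggregate episodes-to-solve over independent runs (e.g. seeds) of one condition."""
    solved = [e for e in (episodes_to_solve(r, threshold, window) for r in runs)
              if e is not None]
    return {
        "solved_runs": len(solved),
        "total_runs": len(runs),
        "mean_episodes": statistics.mean(solved) if solved else None,
        "stdev_episodes": statistics.stdev(solved) if len(solved) > 1 else None,
    }


def run_grid(train_fn, env_ids, noise_levels, seeds, threshold_for, window):
    """Evaluate every (environment, noise level) condition over the same seeds.

    train_fn(env_id, noise_level, seed) -> list of episode returns (hypothetical).
    threshold_for: dict mapping env_id -> that environment's "solved" threshold.
    """
    results = {}
    for env_id in env_ids:
        for noise in noise_levels:
            runs = [train_fn(env_id, noise, seed) for seed in seeds]
            results[(env_id, noise)] = summarize(runs, threshold_for[env_id], window)
    return results
```

Reporting `solved_runs` next to the mean matters under noise: a condition that solves in few episodes but only in half of its seeds is not better than one that always solves a little later.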

Since you will be injecting noise into your sensor data and/or actions, your results are not directly comparable to solutions on external leaderboards (e.g. https://github.com/openai/gym/wiki/Leaderboard). Your focus will be on internal comparisons (i.e. between your own experimental conditions) and innovation. One key performance measure that you should recall is the number of episodes required before solving the problem. In other words, here you are interested in the speed of learning. Care must be taken to be explicit and consistent regarding what constitutes having solved the problem.

Assessment – Overall

Components, with marks out of 100:

o Task 1 - demo (5). Description: Demo of work so … Main criteria: …
o Task 1 - report (…). Description: … pages, summarizing task 1. Main criteria: Are the core papers (1-3) well explained? Are the overall gaps well identified and explained? Are the selected gaps justified properly? Are the research questions grounded in the gaps, and are they clear, concrete, and heading in the right direction?
o Task 2 - demo (5). Description: Demo of work so far. Main criteria: Evidence of understanding of the base code. Good explanation of gaps, questions, experimental design, results, analyses, and conclusions. Solid argumentation vis-à-vis the 4 I’s. Strong justifications and arguments. Clear communication.
o Task 2 - paper (50). Description: Mini-conference paper (4 pages) summarizing all of the work done on both tasks. Main criteria: Are the structure, grammar and argumentation of the paper/report good? Are the introduction, background, methods, results and analyses clear, comprehensive and insightful? Does the paper show critical and creative thinking?
o Task 2 - software (20). Description: Multiple files organized with a clear structure. Main criteria: Is the code complete? Is the code well-designed, clean, elegant, and well commented? Is the code complex/challenging enough?

Assessment Criteria for the Report (task 1) and Paper (task 2)

1st: an excellent, well-written report/paper demonstrating extensive understanding and good insight.
2:1: a comprehensive, well-written report/paper demonstrating thorough understanding and some insight.
2:2: a competent report/paper demonstrating good understanding of the implementation.
3rd: an adequate report/paper covering all specified topics at a basic level of understanding.
F: an inadequate report/paper failing to cover the specified topics.

Report guide (task 1)

The report for task 1 has no fixed format, as long as it is well structured and well organized. The only constraint is that it should be 1-2 pages long. No appendices are allowed, and, to be fair to all, no material on page 3 onwards (if you exceed 2 pages) will be included in the assessment. The font size of the main text should not be smaller than 11.

This report will focus exclusively on: (1) a very brief summary of your prototypes, (2) brief summaries of your selected core papers and why they were chosen, (3) lengthier explanations of the weaknesses/gaps of the papers, (4) an explanation and justification of your selected gaps, and (5) an explanation and justification of your research questions, and of how they are grounded in the gaps.

Paper Guide (task 2)

You should design your final report as a conference paper. The paper should contain:

[8 marks] Introduction (about 1 page). Brief explanation of the motivation and main concepts, a problem statement, an extremely brief overview of the key papers and their gaps, the research questions, and a brief summary of your main contributions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

[8 marks] Background (about 0.5 pages). Brief overview of the field and the key papers closely related to your work (this will include the core 1-3 papers and other relevant papers). The core selected papers, their gaps, and why they were selected must be clearly explained. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

[8 marks] Methods (about 1 page). A detailed and concise description of how you implemented task 2 (e.g. algorithms and experimental design). Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation.

[10 marks] Results (about 1 page). An overview of your key results, encompassing performance measures and other results that lead to insights about the problem and/or your solutions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness.

[10 marks] Discussion (about 0.5 pages). Your interpretation of the results, your conclusions, and proposed future work. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

[6 marks] References & Appendices (not included in the word count). Key marking criteria: (1) Consistency of references, (2) Comprehensiveness of references, (3) Structure and clarity of appendices, (4) Insightfulness of appendices.

Note: Writing a concise report/paper is a core part of the assignment. The total number of pages for the paper (i.e. main sections, excluding references and appendices) cannot exceed 4 pages (with a minimum page margin of 2.5 cm on each side), using single line spacing, a two-column format, and a minimum font size of 11.
