Bandits with switching cost
Recently I have been studying the bandit problem with switching cost. Recall that a bandit problem is an online game played against an adversary: on every round t, you choose one of K arms to play and then suffer a loss determined by the adversary.
In a stochastic bandit, every loss/reward from the same arm is drawn from the same distribution. To control the regret, we need to estimate each arm's mean from its sampled rewards and keep track of how many times each arm has been pulled. An adversarial bandit has no such condition. On every round the adversary picks a loss vector, the player chooses one arm, and the player suffers the corresponding loss. To control the regret here, one usually maintains a probability distribution over the arms, samples an arm from that distribution, and then updates the distribution according to the received loss.
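The "maintain a distribution, sample, update" loop for the adversarial case can be sketched with an Exp3-style exponential-weights update. This is only a minimal illustration, not taken from any particular paper's pseudocode: the names `exp3`, `loss_fn`, `eta` and the seeded `rng` are assumptions made for the example, and losses are assumed to lie in [0, 1].

```python
import math
import random


def exp3(loss_fn, K, T, eta, rng=None):
    """Exp3-style sketch: loss_fn(t, arm) returns the loss in [0, 1] of the pulled arm."""
    rng = rng or random.Random(0)  # fixed seed only so the example is reproducible
    weights = [1.0] * K
    pulls = []
    total_loss = 0.0
    for t in range(T):
        # Turn the weights into a probability distribution over arms.
        s = sum(weights)
        probs = [w / s for w in weights]
        # Sample an arm from that distribution.
        arm = rng.choices(range(K), weights=probs)[0]
        # Bandit feedback: only the loss of the chosen arm is observed.
        loss = loss_fn(t, arm)
        total_loss += loss
        # Importance-weighted loss estimate (unbiased for the pulled arm).
        estimate = loss / probs[arm]
        weights[arm] *= math.exp(-eta * estimate)
        # Rescale so the weights do not underflow over long horizons.
        m = max(weights)
        weights = [w / m for w in weights]
        pulls.append(arm)
    return pulls, total_loss


if __name__ == "__main__":
    K, T = 3, 2000
    eta = math.sqrt(2 * math.log(K) / (T * K))
    # Toy oblivious adversary: arm 1 always has loss 0, the others loss 1.
    pulls, total = exp3(lambda t, a: 0.0 if a == 1 else 1.0, K, T, eta)
    print(pulls.count(1), total)
```

Note that nothing in this update discourages changing arms from one round to the next, which is exactly what the switching-cost variant penalizes.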
There is a variant of the bandit problem in which the player suffers an additional switching cost whenever the arm chosen on this round differs from the one chosen on the previous round. This pushes the player to strongly prefer staying on the same arm. The problem becomes hard if the adversary is non-oblivious, that is, if it remembers the player's past choices. If the player has stuck to one arm so far, the adversary will probably put more loss on that arm in the next round. Even though sticking to one arm may then perform very badly, it can still serve as a baseline: we can compare the performance of a designed strategy against that of a constant arm.
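To make the accounting concrete, here is a small sketch of how the total loss with switching cost, and the regret against the best constant arm, could be computed once the loss vectors are known. It assumes an oblivious adversary (the loss table is fixed in advance) and a unit switching cost by default; the function names and the `switch_cost` parameter are hypothetical, chosen for the illustration.

```python
def loss_with_switching(losses, arms, switch_cost=1.0):
    """Sum of per-round losses plus switch_cost for every change of arm.

    losses[t][k] is the loss of arm k on round t; arms[t] is the arm played.
    """
    total = 0.0
    for t, arm in enumerate(arms):
        total += losses[t][arm]
        if t > 0 and arm != arms[t - 1]:
            total += switch_cost  # pay for moving away from the previous arm
    return total


def regret_vs_best_constant_arm(losses, arms, switch_cost=1.0):
    """Regret against the best single arm, which never pays a switching cost."""
    K = len(losses[0])
    best_fixed = min(sum(row[k] for row in losses) for k in range(K))
    return loss_with_switching(losses, arms, switch_cost) - best_fixed


if __name__ == "__main__":
    losses = [[0, 1], [1, 0], [0, 1]]
    # Chasing the zero-loss arm each round costs two switches.
    print(regret_vs_best_constant_arm(losses, [0, 1, 0]))
    # Staying on arm 0 pays one unit of loss and no switches.
    print(regret_vs_best_constant_arm(losses, [0, 0, 0]))
```

In this toy table the player who always picks the zero-loss arm ends up worse than the constant player, because the two switches cost more than the loss they avoid.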
For the adversarial bandit problem with switching cost against an oblivious adversary, the upper bound on policy regret, measured against the best constant strategy, is O(T^{2/3}). Last year Dekel proved that the minimax regret has a matching lower bound. So for adversarial bandits with switching cost the lower bound meets the upper bound, and there is therefore not much interest left in exploring this setting. For the stochastic case, where every loss/reward of an arm comes from one fixed distribution and the arms differ only in their parameters, the upper bound has been proved to be O(log T).
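One common way to see where the T^{2/3} rate comes from is a blocking argument: play the same arm for B consecutive rounds and run a standard adversarial bandit algorithm over the T/B blocks. The sketch below only evaluates that heuristic trade-off numerically. It assumes a unit switching cost, a per-block regret of order sqrt(n K log K) over n blocks, and drops all constants; the function names are made up for the example, and this is not the proof of the bound.

```python
import math


def blocked_bound(T, K, B):
    """Heuristic bound for blocks of length B (constants dropped)."""
    n_blocks = math.ceil(T / B)
    switch_term = n_blocks  # at most one switch per block, unit cost each
    # Block losses are up to B, so the usual sqrt(n K log K) regret scales by B.
    regret_term = B * math.sqrt(n_blocks * K * math.log(K))
    return switch_term + regret_term


def best_block_length(T, K):
    """Brute-force search for the block length minimizing the heuristic bound."""
    return min(range(1, T + 1), key=lambda B: blocked_bound(T, K, B))


if __name__ == "__main__":
    T, K = 10**5, 2
    B = best_block_length(T, K)
    print(B, round(T ** (1 / 3), 1), round(blocked_bound(T, K, B)))
```

Short blocks pay too much for switching, long blocks pay too much regret per block, and the balance lands at a block length of order T^{1/3}, which gives a total of order T^{2/3}.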
A natural next step is to extend the research to m-memory-bounded adversaries, which are not restricted to switching cost only.