[Active Learning] Multi-Criteria-based Active Learning

Posted by wuliytTaotao on 2019-04-22

Original URL: https://www.cnblogs.com/wuliytTaotao/p/10748942.html

Query criteria in Active Learning (AL) fall roughly into three categories: informativeness, representativeness, and diversity.

Below I introduce each of the three criteria in turn, then describe how paper [1] combines them. (The NER part of the paper is not covered here.)

1 Informativeness

This is probably the most widely used criterion; it includes uncertainty sampling, the simplest and most common strategy.

Paper [1] likewise measures a sample's informativeness by its distance to the decision boundary: the closer a sample is to the boundary, the more informative it is.
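A minimal sketch of this idea, assuming a binary linear classifier. The weights `w`, the bias `b`, and the function names are hypothetical and chosen only for illustration; paper [1] uses an SVM, but its exact scoring is not reproduced here. Samples are ranked by geometric distance to the hyperplane, nearest first.

```python
import math

def distance_to_boundary(x, w, b):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def rank_by_informativeness(samples, w, b):
    """Return sample indices, closest to the boundary (most informative) first."""
    return sorted(range(len(samples)),
                  key=lambda i: distance_to_boundary(samples[i], w, b))

# Toy data with a hypothetical boundary x0 - x1 = 0.
w, b = [1.0, -1.0], 0.0
unlabeled = [[3.0, 0.0], [1.0, 0.9], [0.0, 2.0]]
order = rank_by_informativeness(unlabeled, w, b)
print(order)  # sample 1 lies almost on the boundary, so it comes first
```

With a non-linear or multi-class model, the distance would have to be replaced by another margin or uncertainty measure.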

A strategy based on informativeness alone may end up selecting outliers, which is why representativeness also needs to be considered.

Fig. 1 [2] Outlier sample A is more informative than sample B and will be selected by informativeness query strategy.

2 Representativeness

Measuring representativeness requires comparing the similarity of two samples; paper [1] uses cosine similarity for this.

A sample's representativeness can be quantified by its density, defined as the mean similarity between that sample and all other samples in the unlabeled set:
\[ Density(\boldsymbol x_i) = \frac{\sum_{j \neq i} Sim(\boldsymbol x_i, \boldsymbol x_j)}{N-1} \]

where $N$ is the size of the unlabeled set.
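The density formula can be sketched in a few lines of plain Python. The function names and the toy vectors are illustrative assumptions; the sketch also assumes every feature vector is non-zero, since cosine similarity is undefined otherwise.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def density(i, samples):
    """Mean similarity between samples[i] and every other unlabeled sample."""
    n = len(samples)
    total = sum(cosine_similarity(samples[i], samples[j])
                for j in range(n) if j != i)
    return total / (n - 1)

# Toy unlabeled set (hypothetical feature vectors).
unlabeled = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
densities = [density(i, unlabeled) for i in range(len(unlabeled))]
most_representative = max(range(len(unlabeled)), key=lambda i: densities[i])
```

Here the middle vector is similar to both of the others, so it has the highest density. Computing every density this way costs O(N^2) similarity evaluations.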

If a sample $\boldsymbol x^*$ has the highest density, then $\boldsymbol x^*$ can be regarded as the centroid of the unlabeled set.

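Paper [1]'s definition is not the only way to measure representativeness. Paper [3], for example, computes density from a sample's similarity to a subset of its neighbors rather than to the entire unlabeled set.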
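One plain reading of a neighbor-based density is the mean similarity to the k most similar samples. The sketch below illustrates only that reading; it does not reproduce the exact formulation of paper [3]. The function name `knn_density`, the parameter `k`, and the similarity matrix are hypothetical.

```python
def knn_density(sim, i, k):
    """Mean similarity between sample i and its k most similar neighbours.

    sim is a precomputed symmetric similarity matrix (list of lists).
    """
    others = sorted((sim[i][j] for j in range(len(sim)) if j != i), reverse=True)
    top = others[:k]
    return sum(top) / len(top)

# Hypothetical similarity matrix for four samples.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.4, 0.8, 1.0],
]
local = [knn_density(sim, i, k=2) for i in range(len(sim))]
```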

3 Diversity

Diversity only applies to batch-mode active learning. When several samples are selected at once, ignoring diversity makes it likely that points from the same region are picked repeatedly, wasting labeling effort.

Paper [1] proposes two ways of using diversity: global and local.

3.1 Global consideration

This approach partitions the unlabeled set into K regions with K-means; in each round, the points in a batch must each come from a different one of the K regions.

In practice, K-means may be run on only a subset of the unlabeled set rather than on all of it, to improve efficiency.
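As a sketch of the global method, suppose the K-means assignments are already available. The text does not say how a point is chosen inside each region, so taking the highest-scoring point per cluster is an assumption, as are the names and toy values below.

```python
def pick_one_per_cluster(labels, scores):
    """For each cluster id, keep the index of its highest-scoring sample."""
    best = {}
    for idx, (label, score) in enumerate(zip(labels, scores)):
        if label not in best or score > scores[best[label]]:
            best[label] = idx
    return sorted(best.values())

labels = [0, 0, 1, 2, 1, 2]               # hypothetical K-means assignments, K = 3
scores = [0.2, 0.7, 0.5, 0.1, 0.9, 0.4]   # hypothetical per-sample query scores
batch = pick_one_per_cluster(labels, scores)
```

The batch then contains exactly one point from each region, so no two selected points come from the same cluster.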

3.2 Local consideration

This approach largely ignores the unlabeled set as a whole and focuses on the batch being assembled.

In each query round, a selected sample $\boldsymbol x_{new}$ can be added to the current batch only if it differs enough from every sample already in the batch, i.e. $Similarity(\boldsymbol x_{new}, \boldsymbol x_{old}) < \beta$ for each $\boldsymbol x_{old}$ in the batch. $\beta$ can be set to the mean pairwise similarity over the whole unlabeled set.

With the local method, selected samples are filtered and added to the batch one at a time. How are the selected samples chosen in the first place? They can be drawn at random, or ranked by informativeness and representativeness.
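The local filter can be sketched as a greedy loop. A candidate is accepted only when its similarity to every sample already in the batch stays below $\beta$, meaning it is different enough from all of them. The function names, the candidate order, and the similarity matrix are hypothetical.

```python
def mean_pairwise_similarity(sim):
    """Mean of the off-diagonal entries of a symmetric similarity matrix."""
    n = len(sim)
    total = sum(sim[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def fill_batch(ranked, sim, batch_size, beta):
    """Greedily add candidates that are dissimilar to everything already chosen."""
    batch = []
    for i in ranked:
        if all(sim[i][j] < beta for j in batch):
            batch.append(i)
            if len(batch) == batch_size:
                break
    return batch

# Hypothetical similarity matrix; samples 0 and 1 are near-duplicates.
sim = [
    [1.0, 0.9, 0.1, 0.6],
    [0.9, 1.0, 0.2, 0.7],
    [0.1, 0.2, 1.0, 0.5],
    [0.6, 0.7, 0.5, 1.0],
]
beta = mean_pairwise_similarity(sim)
batch = fill_batch([0, 1, 2, 3], sim, batch_size=2, beta=beta)
```

In this toy case sample 1 is rejected because it is too similar to sample 0, and sample 2 takes its place. If too few candidates pass the filter, the batch ends up smaller than `batch_size`.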

4 Combinations of three criteria

A single-criterion query strategy is often outperformed by a multi-criteria one; papers [1] and [3] both reach similar conclusions.

The following describes the two ways paper [1] proposes for combining informativeness, representativeness, and diversity.

4.1 Strategy 1

Procedure:

1. Use informativeness as a single criterion to pick the top M most informative samples, forming a set called interSet;
2. Run K-means on interSet to obtain K clusters, and add the centroid of each cluster to the batch as a selected sample. (The batch size is therefore also K.)

The K-means centroids both represent interSet and are diverse. This strategy uses the global method for diversity.
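The two steps of Strategy 1 can be sketched as follows. Several details are assumptions made for illustration and are not taken from paper [1]: K-means is a plain Lloyd's iteration with Euclidean distance, initialised from the first K points, and because a centroid need not coincide with an actual sample, the member of interSet nearest to each centroid is selected instead.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def strategy_1(samples, info_scores, m, k):
    # Step 1: interSet = indices of the top-M most informative samples.
    inter_set = sorted(range(len(samples)),
                       key=lambda i: info_scores[i], reverse=True)[:m]
    # Step 2: cluster interSet, then take the unused member nearest each centroid.
    centroids = kmeans([samples[i] for i in inter_set], k)
    batch = []
    for c in centroids:
        free = [i for i in inter_set if i not in batch]
        batch.append(min(free, key=lambda i: euclidean(samples[i], c)))
    return batch

# Toy data: two tight groups plus one low-informativeness point (index 6).
samples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5]]
info_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.1]
batch = strategy_1(samples, info_scores, m=6, k=2)
```

On this toy data the batch contains one sample from each group, and the low-informativeness point never enters interSet.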

4.2 Strategy 2

Procedure:

1. Combine informativeness and representativeness as $\lambda \operatorname{Info}\left(\boldsymbol x_{i}\right)+(1-\lambda) \text {Density}\left(\boldsymbol x_{i}\right)$, then choose selected samples in descending order of score;
2. A selected sample is added to the batch only if its similarity to every point already in the batch is below a threshold $\beta$; that is, the local method is applied as a further diversity filter on the selected samples.

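$\lambda$ is a hyperparameter that must be set manually; it controls the relative weight of informativeness and representativeness. Paper [3] studies the choice of $\lambda$ in more detail and shows that it can be set dynamically.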
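To illustrate how $\lambda$ shifts the ranking in the scoring step of Strategy 2, here is a small worked example. It assumes that `info` and `dens` are already on comparable scales in [0, 1] and that a higher `info` value means more informative; both are assumptions, as are the toy numbers.

```python
def combined_scores(info, dens, lam):
    """lam * Info(x_i) + (1 - lam) * Density(x_i) for every sample."""
    return [lam * i + (1 - lam) * d for i, d in zip(info, dens)]

def rank(scores):
    """Sample indices in descending order of score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

info = [0.9, 0.4, 0.6]   # hypothetical informativeness scores
dens = [0.1, 0.8, 0.6]   # hypothetical density scores

ranking_info_heavy = rank(combined_scores(info, dens, lam=0.8))
ranking_dens_heavy = rank(combined_scores(info, dens, lam=0.2))
```

With $\lambda = 0.8$ the informative but isolated sample 0 ranks first. With $\lambda = 0.2$ the dense sample 1 overtakes it. The resulting ranking is what the diversity filter then consumes.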

4.3 Strategy 1 vs. Strategy 2

In the experiments of paper [1], Strategy 2 performs better than Strategy 1.

References

[1] Shen, D., Zhang, J., Su, J., Zhou, G., & Tan, C.-L. (2004). Multi-criteria-based active learning for named entity recognition. (ACL) https://doi.org/10.3115/1218955.1219030
[2] Settles, B. (2009). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison.
[3] Ebert, S., Fritz, M., & Schiele, B. (2012). RALF: A reinforced active learning formulation for object class recognition. (CVPR) https://doi.org/10.1109/CVPR.2012.6248108
