Reading Notes: "End-to-End Adversarial Memory Network for Cross-domain Sentiment Classification"

Posted by weixin_33872660 on 2017-11-15

Paper link: https://www.ijcai.org/proceedings/2017/0311.pdf

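Cross-domain sentiment classification is also a domain adaptation task. Because of the discrepancy between domains, a classifier trained on one domain may not work directly on another. Traditional methods select pivots by hand. Deep learning methods can learn representations shared across domains, but they lack the interpretability needed to identify pivots directly. To address this, the paper proposes an end-to-end Adversarial Memory Network (AMN) for cross-domain sentiment classification, which uses an attention mechanism to capture pivots automatically. The framework consists of two parameter-shared memory networks, one for sentiment classification and one for domain classification. The two networks are trained jointly, so that the selected features minimize the sentiment classification error while leaving the domain classifier unable to discriminate between samples from the source and target domains.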

Introduction：

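Traditional sentiment classification relies on support vector machines (SVMs) with hand-crafted features such as bag-of-n-grams.

Deep learning approaches depend on large amounts of labeled data, which require time-consuming and expensive manual labeling.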

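Contributions of the paper:

1. Pivots are identified automatically through the attention mechanism, with no manual pivot selection.

2. The AMN model can be visualized to show which words are pivots, which makes the representation shared by domains more interpretable.

3. The model achieves better performance.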

Problem definition and notation:

In the source domain there is a set of labeled data and a set of unlabeled data; in the target domain there is only a set of unlabeled data. The task of cross-domain sentiment classification is to learn a robust classifier trained on labeled data in the source domain to predict the polarity of unlabeled examples from the target domain.

An overview of the AMN model：

A memory network can capture the important, relevant words through its attention mechanism.

The goal of the AMN model is to automatically capture the pivots.

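The memory networks are used to extract pivots, which have two properties: 1) they are sentiment words that are important for sentiment classification, and 2) they are shared across domains.

Two parameter-shared deep memory networks are designed: MN-sentiment for sentiment classification, and MN-domain for domain classification, which aims to predict the domain labels of the samples.

Given a document d = {w1, w2, ..., wn}, each word is first mapped to an embedding vector ei = Awi, so the document gets the vector representation e = {e1, e2, ..., en}. These word vectors are stacked and fed into an external memory.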

[Equation image: definition of the external memory m]

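m is the memory size, which is larger than the maximum document length; free memory slots are padded with zeros. Each memory network consists of multiple hops, each containing an attention layer and a linear layer. In the first hop, a query vector qw is fed through the attention layer to capture the important words in memory m. The query vector qw is randomly initialized and updated during training, so it can be learned as a high-level representation according to the task of interest.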
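The construction of the external memory described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the function name `build_memory`, the toy embedding table, and the memory size of 5 are all assumptions made for the example, and the embedding matrix A is represented as a simple lookup table.

```python
from typing import Dict, List


def build_memory(tokens: List[str],
                 embeddings: Dict[str, List[float]],
                 memory_size: int) -> List[List[float]]:
    """Stack word vectors into a fixed-size memory and zero-pad the free slots."""
    if len(tokens) > memory_size:
        raise ValueError("memory_size must not be smaller than the document length")
    dim = len(next(iter(embeddings.values())))
    # e_i = A w_i: looking up a word in the table plays the role of multiplying by A.
    occupied = [list(embeddings[token]) for token in tokens]
    # Free memory slots are filled with zero vectors.
    free = [[0.0] * dim for _ in range(memory_size - len(tokens))]
    return occupied + free


# Toy 2-dimensional embeddings (made-up values, for illustration only).
toy_embeddings = {"great": [0.9, 0.1], "battery": [0.2, 0.7], "life": [0.3, 0.4]}
memory = build_memory(["great", "battery", "life"], toy_embeddings, memory_size=5)
```

Here the first three rows of `memory` hold the word vectors and the last two rows are the zero-padded free slots.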

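After the query vector is fed in, the outputs of the attention layer and the linear transformation layer are combined and used as the input to the next hop. The output of the last hop serves as the representation of the whole document, which is then used for sentiment classification and domain classification.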
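The hop structure can be sketched as a loop. The following is a simplified illustration under stated assumptions, not the paper's exact formulation: the attention here is a plain dot-product softmax over the occupied memory rows, "combined" is assumed to mean an element-wise sum, and the names `attend`, `run_hops`, and the linear matrix `H` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    top = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def attend(memory: Matrix, query: Vector) -> Vector:
    """Simplified attention: dot-product scores, softmax, weighted sum of memory rows."""
    weights = softmax([sum(a * b for a, b in zip(m_i, query)) for m_i in memory])
    dim = len(memory[0])
    return [sum(w * m_i[k] for w, m_i in zip(weights, memory)) for k in range(dim)]


def run_hops(memory: Matrix, query: Vector, H: Matrix, num_hops: int) -> Vector:
    """Run several hops; the output of each hop is the query of the next one."""
    for _ in range(num_hops):
        attended = attend(memory, query)      # attention layer output
        transformed = matvec(H, query)        # linear layer applied to the query
        # Assumption: the two outputs are combined by element-wise addition.
        query = [a + t for a, t in zip(attended, transformed)]
    return query  # output of the last hop = document representation


occupied_memory = [[0.9, 0.1], [0.2, 0.7], [0.3, 0.4]]
identity = [[1.0, 0.0], [0.0, 1.0]]
doc_repr = run_hops(occupied_memory, [1.0, 0.0], identity, num_hops=3)
```

The resulting `doc_repr` is what would be passed on to the sentiment classifier and the domain classifier.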

For MN-sentiment, the query vector qw can be viewed as a high-level representation of a fixed query "what is the sentiment word" over the words.

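In MN-domain, a Gradient Reversal Layer (GRL) is inserted between the last hop and the domain classifier to reverse the gradient direction of the MN-domain network. This yields a representation from which the domain classifier cannot predict the domain, maximizing domain confusion. The query vector used in MN-domain can be viewed as "what is the domain-shared word".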
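The behavior of a gradient reversal layer can be illustrated without any deep learning framework: it is the identity in the forward pass and negates the gradient in the backward pass. The class below is a minimal sketch with manually written forward and backward methods; the class name and the scaling factor `lam` are assumptions for illustration and are not taken from these notes.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; multiplies the gradient by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # hypothetical scaling factor for the reversed gradient

    def forward(self, features: List[float]) -> List[float]:
        # The document representation passes through unchanged.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The gradient coming from the domain classifier is flipped, so the layers
        # below are updated to make domain prediction harder rather than easier.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=1.0)
passed_forward = grl.forward([0.5, -1.0])
reversed_grad = grl.backward([0.2, -0.4])
```

In a framework with automatic differentiation, the same idea is usually written as a custom operation whose backward rule returns the negated gradient.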

Under joint training, the query vector can be viewed as "what is the pivot".

Components：

1. Word Attention

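MN-sentiment network: the external memory ms is updated using the labeled data from the source domain.

MN-domain network: the external memory md is updated using all the data from both the source and target domains.

Each memory mi is first passed through a one-layer neural network to obtain a hidden representation hi. The importance of a word is then measured by the similarity between hi and the query vector qw, which gives a normalized importance weight: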

[Equation image: importance weight]

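Here n is the size of the occupied part of the memory, and hi = tanh(Ws mi + bs). The full memory m is not used because the authors found that the attention model sometimes assigns large weights to free memories and small weights to the occupied part, which degrades the quality of the document representation. The weights of free memories are therefore all set to 0.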
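The masked attention weights can be sketched as follows. The hidden representation hi = tanh(Ws mi + bs) is taken from the notes; the similarity measure is assumed here to be a dot product between hi and qw followed by a softmax over the n occupied slots, since the original formula is not reproduced in the text. The function name and the toy values are illustrative only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def attention_weights(memory: Matrix, n_occupied: int,
                      W_s: Matrix, b_s: Vector, q_w: Vector) -> Vector:
    """Normalized importance weights over the occupied slots; free slots get 0."""
    if n_occupied < 1:
        raise ValueError("at least one memory slot must be occupied")
    scores = []
    for m_i in memory[:n_occupied]:
        # h_i = tanh(W_s m_i + b_s)
        h_i = [math.tanh(sum(w * x for w, x in zip(row, m_i)) + b)
               for row, b in zip(W_s, b_s)]
        # Assumption: similarity between h_i and q_w is a dot product.
        scores.append(sum(h * q for h, q in zip(h_i, q_w)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [0.0] * len(memory)  # weights of free memories stay at 0
    for i, e in enumerate(exps):
        weights[i] = e / total
    return weights


toy_memory = [[0.9, 0.1], [0.2, 0.7], [0.3, 0.4], [0.0, 0.0], [0.0, 0.0]]
W_s = [[1.0, 0.0], [0.0, 1.0]]
b_s = [0.0, 0.0]
q_w = [1.0, 0.0]
alphas = attention_weights(toy_memory, 3, W_s, b_s, q_w)
```

Because the softmax runs only over the first n slots, the padded rows cannot absorb any attention mass, which is the issue the masking is meant to avoid.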

The output of the attention layer is: