NNLM
NNLM: Neural Network Language Model. It comes from the paper "A Neural Probabilistic Language Model" by Bengio et al., published at NIPS in 2001.
Theory
Model Structure
Task
Predict \(w_t\) from \(w_{t-n+1} \dots w_{t-1}\), i.e., use the preceding \(n-1\) words to predict the \(n\)-th word.
Notation
- \(V\): the total number of words, i.e., the vocabulary size
- \(m\): the length of a word vector (embedding dimension)
- \(C\): a matrix with \(V\) rows and \(m\) columns that stores the word vectors (the embedding table)
- \(C(w)\): the word vector of word \(w\)
- \(d\): the bias of the hidden layer
- \(H\): the weights from the input layer to the hidden layer
- \(U\): the weights from the hidden layer to the output layer
- \(b\): the bias of the output layer
- \(W\): the weights from the input layer directly to the output layer
- \(h\): the number of neurons in the hidden layer
Data Flow
- Look up the word vectors of the \(n-1\) context words; each word vector has length \(m\)
- Concatenate these \(n-1\) word vectors into a single vector of length \((n-1) \times m\), denoted \(X\)
- Feed \(X\) into the hidden layer and compute its output \(hidden_{out} = \tanh(XH + d)\)
- Feed both the hidden-layer output and the input word vectors into the output layer and compute \(y = XW + hidden_{out}U + b\), which gives \(|V|\) output nodes; after a softmax, the \(i\)-th node gives the probability that the next word is the \(i\)-th word in the vocabulary, and the word with the highest probability is the prediction (see the summary equation below)
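Putting the two steps together, the complete forward computation can be written as follows; note that in the code below the softmax is not part of the network's output, because nn.CrossEntropyLoss applies it internally:
\[
y = XW + \tanh(XH + d)\,U + b,\qquad
P(w_t = i \mid w_{t-n+1},\dots,w_{t-1}) = \frac{e^{y_i}}{\sum_{j=1}^{|V|} e^{y_j}}
\]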
Code
Import dependencies
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
from torch.autograd import Variable
dtype = torch.FloatTensor
Declare variables
sentences = ["i like dog", "i love coffee", "i hate milk"] # toy sentence dataset
n_steps = 2 # number of preceding words used to predict the next word, e.g. 2
n_hidden = 2 # number of hidden-layer units, e.g. 2
m = 2 # length of each word vector
Build the vocabulary
word_list = " ".join(sentences).split(" ") # collect all words
print("Word list (with duplicates):", word_list)
word_list = list(set(word_list)) # deduplicate (note: set() ordering is arbitrary, so indices may differ between runs)
print("Word list (deduplicated):", word_list)
word_dict = {w: i for i, w in enumerate(word_list)} # word -> index
print("Word to index:", word_dict)
number_dict = {i: w for i, w in enumerate(word_list)} # index -> word
print("Index to word:", number_dict)
num_words = len(word_dict) # total number of unique words
print("Number of words:", num_words)
Output
Word list (with duplicates): ['i', 'like', 'dog', 'i', 'love', 'coffee', 'i', 'hate', 'milk']
Word list (deduplicated): ['coffee', 'love', 'dog', 'like', 'milk', 'hate', 'i']
Word to index: {'coffee': 0, 'love': 1, 'dog': 2, 'like': 3, 'milk': 4, 'hate': 5, 'i': 6}
Index to word: {0: 'coffee', 1: 'love', 2: 'dog', 3: 'like', 4: 'milk', 5: 'hate', 6: 'i'}
Number of words: 7
Model Structure
class NNLM(nn.Module):
    # NNLM model architecture
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(num_embeddings = num_words, embedding_dim = m) # embedding table
        self.d = nn.Parameter(torch.randn(n_hidden).type(dtype)) # hidden-layer bias
        self.H = nn.Parameter(torch.randn(n_steps * m, n_hidden).type(dtype)) # input-to-hidden weights
        self.U = nn.Parameter(torch.randn(n_hidden, num_words).type(dtype)) # hidden-to-output weights
        self.b = nn.Parameter(torch.randn(num_words).type(dtype)) # output-layer bias
        self.W = nn.Parameter(torch.randn(n_steps * m, num_words).type(dtype)) # input-to-output weights

    def forward(self, input):
        '''
        input: [batch_size, n_steps]
        x: [batch_size, n_steps * m]
        hidden_out: [batch_size, n_hidden]
        output: [batch_size, num_words]
        '''
        x = self.C(input) # look up the embeddings for a batch, [batch_size, n_steps, m]
        x = x.view(-1, n_steps * m) # concatenate the n_steps embeddings of each sample
        hidden_out = torch.tanh(torch.mm(x, self.H) + self.d) # hidden-layer output
        output = torch.mm(x, self.W) + torch.mm(hidden_out, self.U) + self.b # output-layer logits
        return output
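As a quick sanity check (not part of the original code), a dummy batch can be pushed through the model to confirm that the output shape is [batch_size, num_words]; the variable names here are illustrative:
model_check = NNLM()
dummy_input = torch.LongTensor([[6, 3], [6, 1]]) # two samples, each with n_steps word indices
with torch.no_grad():
    logits = model_check(dummy_input)
print(logits.shape) # expected: torch.Size([2, 7]), i.e. [batch_size, num_words]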
Format the input
def make_batch(sentences):
    '''
    input_batch: indices of the first n_steps words of each sentence
    target_batch: index of the word to be predicted in each sentence
    '''
    input_batch = []
    target_batch = []
    for sentence in sentences:
        word = sentence.split()
        input = [word_dict[w] for w in word[:-1]] # indices of the context words
        target = word_dict[word[-1]] # index of the target word
        input_batch.append(input)
        target_batch.append(target)
    return input_batch, target_batch
input_batch, target_batch = make_batch(sentences)
input_batch = torch.LongTensor(input_batch)
target_batch = torch.LongTensor(target_batch)
print("input_batch:", input_batch)
print("target_batch:", target_batch)
Output
input_batch: tensor([[6, 3],
[6, 1],
[6, 5]])
target_batch: tensor([2, 0, 4])
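For this toy dataset a single tensor batch is enough, but the torch.utils.data module imported above as Data could wrap the tensors in a DataLoader for mini-batch training on larger corpora; a minimal sketch, assuming a batch size of 2:
dataset = Data.TensorDataset(input_batch, target_batch) # pair each context with its target
loader = Data.DataLoader(dataset, batch_size=2, shuffle=True) # iterate in shuffled mini-batches
for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y.shape) # e.g. torch.Size([2, 2]) torch.Size([2])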
Training
model = NNLM()
criterion = nn.CrossEntropyLoss() # cross entropy as the loss function
optimizer = optim.Adam(model.parameters(), lr = 0.001) # Adam as the optimizer
for epoch in range(2000):
    # zero the gradients
    optimizer.zero_grad()
    # compute the prediction
    output = model(input_batch)
    # compute the loss
    loss = criterion(output, target_batch)
    if (epoch + 1) % 100 == 0:
        print("Epoch:{}".format(epoch + 1), "Loss:{:.3f}".format(loss.item()))
    # backpropagate
    loss.backward()
    # update the parameters
    optimizer.step()
Output
Epoch:100 Loss:1.945
Epoch:200 Loss:1.367
Epoch:300 Loss:0.937
Epoch:400 Loss:0.675
Epoch:500 Loss:0.537
Epoch:600 Loss:0.435
Epoch:700 Loss:0.335
Epoch:800 Loss:0.234
Epoch:900 Loss:0.147
Epoch:1000 Loss:0.094
Epoch:1100 Loss:0.065
Epoch:1200 Loss:0.047
Epoch:1300 Loss:0.036
Epoch:1400 Loss:0.029
Epoch:1500 Loss:0.023
Epoch:1600 Loss:0.019
Epoch:1700 Loss:0.016
Epoch:1800 Loss:0.014
Epoch:1900 Loss:0.012
Epoch:2000 Loss:0.011
Inference
pred = model(input_batch).data.max(1, keepdim=True)[1] # index of the highest-scoring word
print("Predict:", pred)
print([sentence.split()[:2] for sentence in sentences], "---->", [number_dict[n.item()] for n in pred.squeeze()])
Output
Predict: tensor([[2],
[0],
[4]])
[['i', 'like'], ['i', 'love'], ['i', 'hate']] ----> ['dog', 'coffee', 'milk']
Comparing against our dataset, the predictions are all correct.
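To query the trained model with an arbitrary two-word context, the lookup and argmax can be wrapped in a small helper; this predict_next function is an illustrative addition, not part of the original code, and assumes both context words appear in word_dict:
def predict_next(model, context_words):
    # context_words: a list of n_steps words that must exist in word_dict
    idx = torch.LongTensor([[word_dict[w] for w in context_words]])
    with torch.no_grad():
        logits = model(idx)
    return number_dict[logits.argmax(dim=1).item()]

print(predict_next(model, ["i", "love"])) # should print 'coffee' once training has converged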