Knowledge Representation Learning (1): Point-Wise Space, Part 2

Posted by Storm*Rage on 2020-12-13

  The previous post, Knowledge Representation Learning (1): Point-Wise Space, Part 1, surveyed the family of point-wise translation models built around TransE and its variants and derivatives. In this post, some of those models are given simple implementations.

  Evaluation requires computing the mean rank metric, which involves scoring and sorting over all entities. The dataset used in the experiments is therefore FB15k, a subset of the Freebase knowledge graph containing 14,951 entities, 1,345 relations, and 483,142 triples in total; FB15k has the fewest entities among the commonly used benchmark datasets.

1. Data Processing

  Data processing is done in pre_process_data.py. Its main tasks are:

  • Data conversion: the triples in the public dataset are strings, so they must be organized and numbered so that every entity and relation gets a unique id, mapping each string to a numeric index;
  • Negative sample generation: the loss functions of TransE-style models are computed with negative sampling, so one negative triple is constructed for each positive triple;
  • DataSet construction: prepares the data returned for each batch, for use by the DataLoader;
  • DataLoader construction: batches the processed data for the subsequent training.

  The load_data function performs the data conversion: it converts the training triples into their corresponding ids, saves the mappings so the test set can later be converted against the same reference, and records statistics such as the numbers of entities and relations.

# imports used by the snippets from pre_process_data.py
import pandas as pd
from collections import Counter

def load_data(self):
    file_pathname = self.filepath + self.filename
    train_df = pd.read_csv(filepath_or_buffer=file_pathname,
                           sep='\t',
                           header=None,
                           names=['head', 'relation', 'tail'],
                           keep_default_na=False,
                           encoding='utf-8')
    train_df = train_df.applymap(lambda x: x.strip())  # strip leading/trailing whitespace from every cell
    # count occurrences of each head, tail, and relation (stored as Counters)
    head_count = Counter(train_df['head'])
    tail_count = Counter(train_df['tail'])
    relation_count = Counter(train_df['relation'])
    # collect the distinct entity and relation keys
    entity_list = list((head_count + tail_count).keys())
    relation_list = list(relation_count.keys())
    # build (key, index) mappings for entities and relations
    entity_dict = dict([(word, idx) for idx, word in enumerate(entity_list)])
    relation_dict = dict([(word, idx) for idx, word in enumerate(relation_list)])
    # map each key in the DataFrame to its numeric index
    train_df['head'] = train_df['head'].apply(lambda cell_key: entity_dict[cell_key])
    train_df['tail'] = train_df['tail'].apply(lambda cell_key: entity_dict[cell_key])
    train_df['relation'] = train_df['relation'].apply(lambda cell_key: relation_dict[cell_key])
    return train_df.values, entity_dict, relation_dict

  The generate_neg function creates the negative samples: for each triple, it randomly replaces either the head or the tail entity and checks that the corrupted triple is valid, i.e., not itself a known fact (a sketch of the related_dict it consults follows the code).

import random
import numpy as np

def generate_neg(self):
    neg_candidates, i = [], 0
    neg_data = []
    population = list(range(self.entity_num))
    for idx, triple in enumerate(self.pos_data):
        while True:
            # refill the candidate pool in bulk to avoid per-triple sampling overhead
            if i == len(neg_candidates):
                i = 0
                neg_candidates = random.choices(population=population, k=int(1e4))
            neg, i = neg_candidates[i], i + 1
            if random.randint(0, 1) == 0:
                # replace the head: accept only if the candidate is not related to the tail
                if neg not in self.related_dict[triple[2]]:
                    neg_data.append([neg, triple[1], triple[2]])
                    break
            else:
                # replace the tail: accept only if the candidate is not related to the head
                if neg not in self.related_dict[triple[0]]:
                    neg_data.append([triple[0], triple[1], neg])
                    break

    return np.array(neg_data)
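
  generate_neg consults self.related_dict, whose construction is not shown in these snippets. A minimal sketch, assuming it maps each entity index to the set of entities that co-occur with it in some positive triple (the helper name build_related_dict is hypothetical):

from collections import defaultdict

def build_related_dict(self):
    # hypothetical helper, not part of the original code: record, for each
    # entity, every entity it appears with in a positive triple, so that a
    # corrupted triple that happens to be a true fact can be rejected
    related_dict = defaultdict(set)
    for head, relation, tail in self.pos_data:
        related_dict[head].add(tail)
        related_dict[tail].add(head)
    return related_dict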

  The __getitem__ function is modified so that each iteration returns both a positive and a negative triple:

def __getitem__(self, item):
    return [self.pos_data[item], self.neg_data[item]]

  A DataLoader is then used to batch the data:

train_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=test_batch_size, shuffle=True)

2. Overall Pipeline

  The main program is defined in main.py and covers the whole pipeline: data processing, model training, and evaluation.

  First, the dataset and dataloader are constructed with the functions from pre_process_data.

train_dataset = TrainSet(filepath=FILE_PATH, filename=TRAIN_FILE_NAME)
test_dataset = TestSet(filepath=FILE_PATH, filename=TEST_FILE_NAME,
                       entity_dict=train_dataset.entity2index,
                       relation_dict=train_dataset.relation2index)

print(f'Data Load Success, pos_data: {len(train_dataset.pos_data)}, neg_data: {len(train_dataset.neg_data)}')

train_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=test_batch_size, shuffle=True)

  Next, define the model and the optimizer; SGD is used for the gradient updates.

trans_model = TranE(train_dataset.entity_num,
                    train_dataset.relation_num,
                    device,
                    embedding_dim=embedded_dim,
                    # entity_embedding_dim=embedded_dim,
                    # relation_embedding_dim=embedded_dim,
                    d_norm=d_norm,
                    gamma=gamma,
                    ).to(device)
optimizer = optim.SGD(trans_model.parameters(), lr=lr, momentum=momentum)
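
  The hyperparameters referenced here (embedded_dim, d_norm, gamma, lr, momentum, and so on) are defined near the top of main.py and are not shown in the snippets. A plausible configuration, loosely following the TransE paper's FB15k settings; the exact numbers below are assumptions, not necessarily the ones used for the results in section 4:

import torch

# assumed hyperparameters, for illustration only
embedded_dim = 50       # entity/relation embedding dimension
d_norm = 2              # distance measure: 1 for L1, 2 for L2
gamma = 1.0             # margin of the ranking loss
lr = 0.01               # SGD learning rate
momentum = 0.9          # SGD momentum
train_batch_size = 32   # matches the text below
test_batch_size = 64
num_epochs = 30
top_k = 10              # k used for hits@k
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')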

  Training iterates over epochs; the training batch_size is set to 32 and the evaluation batch_size to 64.

for epoch in range(num_epochs):
    torch.cuda.empty_cache()
    # Train Stage
    total_loss = 0
    trans_model.train()
    for batch_idx, (pos, neg) in enumerate(train_loader):
        pos, neg = pos.to(device), neg.to(device)
        # pos: [batch_size, 3] => [3, batch_size]
        pos = torch.transpose(pos, 0, 1)
        # pos_head, pos_relation, pos_tail: [batch_size]
        pos_head, pos_relation, pos_tail = pos[0], pos[1], pos[2]
        neg = torch.transpose(neg, 0, 1)
        # neg_head, neg_relation, neg_tail: [batch_size]
        neg_head, neg_relation, neg_tail = neg[0], neg[1], neg[2]
        loss = trans_model(pos_head, pos_relation, pos_tail, neg_head, neg_relation, neg_tail)
        total_loss += loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}, loss = {total_loss / len(train_dataset)}")

    # Evaluation Stage
    correct_test = 0
    total_rank = 0
    trans_model.eval()  # no dropout/batch-norm here, but switching modes is good practice
    with torch.no_grad():
        for batch_idx, data in enumerate(test_loader):
            data = data.to(device)
            # data: [batch_size, 3] => [3, batch_size]
            data = torch.transpose(data, 0, 1)
            temp_hits, temp_rank = trans_model.tail_predict(data[0], data[1], data[2], k=top_k)
            correct_test += temp_hits
            total_rank += temp_rank

        print(f"===>epoch {epoch + 1}, hits@10 {correct_test / len(test_dataset)}, "
              f"mean rank {total_rank / len(test_dataset)}")

3. Model Construction

3.1 TransE

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class TranE(nn.Module):
    def __init__(self, entity_num, relation_num, device, embedding_dim=50, d_norm=2, gamma=1):
        """
        :param entity_num: number of entities
        :param relation_num: number of relations
        :param device: torch device to run on
        :param embedding_dim: embedding dimension
        :param d_norm: norm for d(h+l, t), either L1-norm (1) or L2-norm (2)
        :param gamma: margin hyperparameter
        """
        super(TranE, self).__init__()
        self.embedding_dim = embedding_dim
        self.d_norm = d_norm
        self.device = device
        self.gamma = torch.FloatTensor([gamma]).to(self.device)
        self.entity_num = entity_num
        self.relation_num = relation_num
        self.entity_embedding = nn.Embedding(entity_num, self.embedding_dim)
        self.relation_embedding = nn.Embedding(relation_num, self.embedding_dim)

        # initialize the embedding weights uniformly in [-6/sqrt(d), 6/sqrt(d)],
        # as in the TransE paper
        nn.init.uniform_(self.entity_embedding.weight,
                         a=-6 / math.sqrt(self.embedding_dim), b=6 / math.sqrt(self.embedding_dim))
        nn.init.uniform_(self.relation_embedding.weight,
                         a=-6 / math.sqrt(self.embedding_dim), b=6 / math.sqrt(self.embedding_dim))

        # L2-normalize the relation embeddings once:  x <- x / ||x||
        self.relation_embedding.weight.data = F.normalize(self.relation_embedding.weight.data, p=2, dim=1)

    def forward(self, pos_head, pos_relation, pos_tail, neg_head, neg_relation, neg_tail):
        # L2-normalize the entity embeddings before every forward pass
        self.entity_embedding.weight.data = F.normalize(self.entity_embedding.weight.data, p=2, dim=1)

        pos_dis = self.entity_embedding(pos_head.to(torch.int64)) + self.relation_embedding(
            pos_relation.to(torch.int64)) - self.entity_embedding(
            pos_tail.to(torch.int64))
        neg_dis = self.entity_embedding(neg_head.to(torch.int64)) + self.relation_embedding(
            neg_relation.to(torch.int64)) - self.entity_embedding(
            neg_tail.to(torch.int64))
        # the loss already carries gradients through the embeddings,
        # so no explicit requires_grad_() call is needed
        return self.calculate_loss(pos_dis, neg_dis)

    def calculate_loss(self, pos_dis, neg_dis):
        # margin ranking loss: sum of [gamma + d(pos) - d(neg)]_+
        distance_diff = self.gamma + torch.norm(pos_dis, p=self.d_norm, dim=1) - torch.norm(neg_dis, p=self.d_norm, dim=1)
        return torch.sum(F.relu(distance_diff))

    def tail_predict(self, head, relation, tail, k=10):
        # head: [batch_size]
        # h_and_r: [batch_size, embed_size] => [batch_size, 1, embed_size] => [batch_size, N, embed_size]
        h_and_r = self.entity_embedding(head.to(torch.int64)) + self.relation_embedding(relation.to(torch.int64))
        h_and_r = torch.unsqueeze(h_and_r, dim=1)
        h_and_r = h_and_r.expand(h_and_r.shape[0], self.entity_num, self.embedding_dim)
        # embed_tail: [batch_size, N, embed_size]
        embed_tail = self.entity_embedding.weight.data.expand(h_and_r.shape[0], self.entity_num, self.embedding_dim)

        # indices: [batch_size, N], entity ids sorted by distance to h + r (ascending)
        indices = torch.norm(h_and_r - embed_tail, dim=2).argsort()
        # tail: [batch_size] => [batch_size, 1]
        tail = tail.view(-1, 1)

        # first value: number of true tails inside the top-k candidates (hits@k numerator)
        # second value: sum of the (0-based) rank positions of the true tails (mean rank numerator)
        return torch.sum(torch.eq(indices[:, 0:k], tail)).item(), \
               torch.sum(torch.eq(indices, tail).nonzero(as_tuple=False)[:, -1]).item()
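
  For reference, calculate_loss implements the standard TransE margin ranking loss over a batch of positive triples $(h, r, t)$ and their corruptions $(h', r, t')$:

$$\mathcal{L} = \sum_{(h,r,t)} \left[\, \gamma + \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_p - \|\mathbf{h}' + \mathbf{r} - \mathbf{t}'\|_p \,\right]_+$$

  where $p$ is d_norm and $[\cdot]_+$ corresponds to F.relu.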

3.2 TransH

  The overall structure is the same as TransE, with an added projection that maps entity vectors onto the relation-specific hyperplane.

self.hyperplane_embedding = nn.Embedding(relation_num, self.embedding_dim)

def project(self, entity, wr):  # project the entity vector onto the relation hyperplane
    # wr = F.normalize(wr, p=2, dim=-1)
    # entity = F.normalize(entity, p=2, dim=-1)
    return F.normalize(entity - torch.sum(entity * wr, dim=-1, keepdim=True) * wr, p=2, dim=-1)
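
  This corresponds to TransH's hyperplane projection; with $\mathbf{w}_r$ the (unit) normal vector of relation $r$'s hyperplane (the commented-out normalization would enforce $\|\mathbf{w}_r\| = 1$ explicitly), the projected entities and the score are:

$$\mathbf{h}_\perp = \mathbf{h} - (\mathbf{w}_r^\top \mathbf{h})\,\mathbf{w}_r, \qquad f_r(h, t) = \|\mathbf{h}_\perp + \mathbf{d}_r - \mathbf{t}_\perp\|_2^2$$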

3.3 TransM

  The overall structure is the same as TransE; the score function additionally weights the translation distance with a per-relation weight coefficient.

self.wr = nn.Embedding(relation_num, 1)  # one scalar weight per relation
pos_dis = self.wr(pos_relation.to(torch.int64)) * torch.norm(pos_dis, p=2, dim=-1)
neg_dis = self.wr(neg_relation.to(torch.int64)) * torch.norm(neg_dis, p=2, dim=-1)
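
  This realizes the TransM score $f_r(h,t) = w_r \,\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$. Note that in the original TransM paper the weight is not learned but precomputed from the relation's mapping statistics, roughly

$$w_r = \frac{1}{\log(h_{rpt} + t_{rph})}$$

  where $h_{rpt}$ and $t_{rph}$ are the relation's average heads-per-tail and tails-per-head; treating wr as a learnable nn.Embedding, as above, is a simplification of that scheme.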

3.4 TransR

  The overall structure is similar to TransH, but TransH's projection of entity vectors onto a relation hyperplane is replaced by a transfer from entity space to relation space: the entity vector is multiplied by a transfer matrix $\mathbf{M}_r$. TransR also allows the entity and relation embeddings to have different dimensions.

self.transfer_matrix = nn.Embedding(relation_num, self.entity_embedding_dim * self.relation_embedding_dim)

def transfer(self, entity, mr):  # map the entity vector into the relation space
    # mr: [batch_size, entity_dim * relation_dim] => [batch_size, entity_dim, relation_dim]
    mr = mr.view(-1, self.entity_embedding_dim, self.relation_embedding_dim)
    # entity: [batch_size, entity_dim] => [batch_size, 1, entity_dim]
    entity = entity.unsqueeze(dim=1)
    # [batch_size, 1, entity_dim] x [batch_size, entity_dim, relation_dim] => [batch_size, 1, relation_dim]
    entity = torch.matmul(entity, mr)
    return F.normalize(entity.squeeze(), p=2, dim=-1)
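
  With the transferred vectors $\mathbf{h}_r = \mathbf{h}\mathbf{M}_r$ and $\mathbf{t}_r = \mathbf{t}\mathbf{M}_r$, the distance is then measured between $\mathbf{h}_r + \mathbf{r}$ and $\mathbf{t}_r$ exactly as in TransE:

$$f_r(h,t) = \|\mathbf{h}_r + \mathbf{r} - \mathbf{t}_r\|_2^2$$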

3.5 TransD

  TransD is a reworking of TransR's "space mapping" idea: three projection vectors (for the head, the tail, and the relation) are used to map the entity vectors into the relation space, so the mapping depends on both the entity and the relation.

self.entity_matrix = nn.Embedding(entity_num, self.entity_embedding_dim)
self.relation_matrix = nn.Embedding(relation_num, self.relation_embedding_dim)

def project(self, entity, entity_p, relation_p):  # map the entity vector into the relation space
    # computes (r_p e_p^T + I) e = e (resized to the relation dim) + (e_p . e) r_p
    return F.normalize(
        self.resize(entity, -1, relation_p.shape[-1]) + torch.sum(entity * entity_p, dim=-1, keepdim=True) * relation_p,
        p=2,
        dim=-1
    )

def resize(self, tensor, axis, size):
    # truncate or zero-pad `tensor` along `axis` to length `size`
    shape = tensor.size()
    osize = shape[axis]
    if osize == size:
        return tensor
    elif osize > size:
        return torch.narrow(tensor, axis, 0, size)
    else:
        pad = nn.ZeroPad2d(padding=(0, size - osize, 0, 0))
        return pad(tensor)
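
  project implements TransD's dynamic mapping matrix without ever materializing it; with projection vectors $\mathbf{h}_p$ and $\mathbf{r}_p$, the head entity is mapped as

$$\mathbf{M}_{rh} = \mathbf{r}_p \mathbf{h}_p^\top + \mathbf{I}^{m \times n}, \qquad \mathbf{h}_\perp = \mathbf{M}_{rh}\,\mathbf{h}$$

  and the resize call plays the role of the non-square identity $\mathbf{I}^{m \times n}$ when the entity and relation dimensions differ.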

3.6 TransA

  TransA builds on TransE by changing how the distance-based score is computed: through a weight matrix $\mathbf{W}_r$, each relation assigns its own weight to every dimension of the translation vector.

def calculate_loss(self, pos_dis, neg_dis, pos_relation_weight, neg_relation_weight):
    """
    :param pos_dis: [batch_size, 1, embed_dim]
    :param neg_dis: [batch_size, 1, embed_dim]
    :param pos_relation_weight: [batch_size, embed_dim, embed_dim]
    :param neg_relation_weight: [batch_size, embed_dim, embed_dim]
    :return: scalar triple loss
    """

    # bmm([B, 1, D], [B, D, D]) => [B, 1, D]; elementwise * dis => [B, 1, D];
    # squeeze => [B, D]; sum over the last dim => [B]
    pos_score = torch.sum((torch.bmm(pos_dis, pos_relation_weight) * pos_dis).squeeze(), dim=-1)
    neg_score = torch.sum((torch.bmm(neg_dis, neg_relation_weight) * neg_dis).squeeze(), dim=-1)
    distance_diff = self.gamma + pos_score - neg_score
    return torch.sum(F.relu(distance_diff))
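
  The score being computed is the relation-specific Mahalanobis-style distance from the TransA paper (the paper additionally takes elementwise absolute values of the translation and constrains $\mathbf{W}_r$ to be symmetric and non-negative; those details are omitted in the snippet above):

$$f_r(h,t) = (|\mathbf{h} + \mathbf{r} - \mathbf{t}|)^\top \mathbf{W}_r\, (|\mathbf{h} + \mathbf{r} - \mathbf{t}|)$$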

4. Experimental Results

  Each model is trained for some number of epochs; at around 30 epochs, the results are as follows:

  TransE
[figure: TransE results]

  TransH
[figure: TransH results]

  TransM
[figure: TransM results]

There is an issue with TransM here: mean rank is still improving (decreasing), yet hits@10 is falling. The model may have slipped into a trap where it optimizes links that had very large ranks at the expense of degrading many links that were previously ranked well.

  TransR
[figure: TransR results]

  TransD
[figure: TransD results]

  TransA
[figure: TransA results]
