The amazing power of word vectors | the morning paper (acolyer.org)
What is a word vector?
At one level, it’s simply a vector of weights. In a simple 1-of-N (or ‘one-hot’) encoding every element in the vector is associated with a word in the vocabulary. The encoding of a given word is simply the vector in which the corresponding element is set to one, and all other elements are zero.
Suppose our vocabulary has only five words: King, Queen, Man, Woman, and Child. We could encode the word ‘Queen’ as:
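A minimal sketch of this 1-of-N encoding, assuming the five-word vocabulary above in the order King, Queen, Man, Woman, Child:

```python
# One-hot encoding over the five-word vocabulary from the text.
vocab = ["King", "Queen", "Man", "Woman", "Child"]

def one_hot(word):
    """Return the 1-of-N vector for `word`: a single 1, zeros elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("Queen"))  # [0, 1, 0, 0, 0]
```

Note that under this encoding every pair of distinct words is equally far apart, which is exactly why only equality comparisons are meaningful.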
Using such an encoding, there’s no meaningful comparison we can make between word vectors other than equality testing.
In word2vec, a distributed representation of a word is used. Take a vector with several hundred dimensions (say 1000). Each word is represented by a distribution of weights across those elements. So instead of a one-to-one mapping between an element in the vector and a word, the representation of a word is spread across all of the elements in the vector, and each element in the vector contributes to the definition of many words.
In this distributed representation most of the weights are nonzero, and it is the overall pattern of weights, rather than any single element, that defines the word. This is what we use in place of the one-to-one mapping of the one-hot encoding.
In the figure, the hypothetical dimensions are Royalty, Masculinity, Femininity, and Age. As we can see, King and Queen both relate strongly to Royalty, as we would expect, so both should score highly on that dimension; the other dimensions follow the same intuition.

Another way to think about it: let V_t = V_Queen - V_Woman = [0.97, 0.04, -0.069, 0.1]. What relationship does this offset vector represent? And what is V_Man + V_t? Is it approximately V_King? The answer is given below.
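That offset arithmetic can be sketched with toy vectors. The numbers below are illustrative stand-ins for the figure's (Royalty, Masculinity, Femininity, Age) weights, not values from the paper:

```python
# Hypothetical 4-dim word vectors: (Royalty, Masculinity, Femininity, Age).
king  = [0.99, 0.95, 0.05, 0.7]
queen = [0.99, 0.05, 0.95, 0.7]
man   = [0.02, 0.95, 0.05, 0.7]
woman = [0.02, 0.05, 0.95, 0.7]

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

# V_t = queen - woman isolates the "royalty" offset;
# adding it to man lands (approximately) on king.
v_t = sub(queen, woman)
approx_king = add(man, v_t)
print(approx_king)  # close to `king`, up to floating-point error
```

The point is that the gender and age components cancel in the subtraction, leaving only the offset that distinguishes royal from non-royal words.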
Reasoning with word vectors
We find that the learned word representations in fact capture meaningful syntactic and semantic regularities in a very simple way. Specifically, the regularities are observed as constant vector offsets between pairs of words sharing a particular relationship. For example, if we denote the vector for word i as xi, and focus on the singular/plural relation, we observe that xapple – xapples ≈ xcar – xcars, xfamily – xfamilies ≈ xcar – xcars, and so on. Perhaps more surprisingly, we find that this is also the case for a variety of semantic relations, as measured by the SemEval 2012 task of measuring relation similarity.
The vectors are very good at answering analogy questions of the form a is to b as c is to ?. For example, man is to woman as uncle is to ? (aunt) using a simple vector offset method based on cosine distance.
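A minimal sketch of that offset method, assuming a small hand-made embedding table (the vectors are illustrative, not learned word2vec weights): to answer "a is to b as c is to ?", rank the vocabulary by cosine similarity to b - a + c, excluding the query words themselves.

```python
import math

# Illustrative embeddings (not real word2vec weights);
# dims roughly (royalty, masculinity, femininity).
embeddings = {
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "uncle": [0.3, 0.9, 0.1],
    "aunt":  [0.3, 0.1, 0.9],
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """a is to b as c is to ? : nearest word to (b - a + c) by cosine."""
    target = [vb - va + vc for va, vb, vc
              in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("man", "woman", "uncle"))  # aunt
```

The same function answers the "King – Man + Woman = ?" question discussed below: `analogy("man", "woman", "king")` returns "queen" with these toy vectors.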
For example, here are vector offsets for three word pairs illustrating the gender relation:
And here we see the singular plural relation:
This kind of vector composition also lets us answer the question “King – Man + Woman = ?” and arrive at the result “Queen”!