This article is based on a Medium article:
A Knowledge Graph understanding and implementation tutorial for beginners [1]
What is a knowledge graph?
The content of a knowledge graph is usually stored as triples of the form Subject-Predicate-Object (SPO).
For example:
Leonard Nimoy was an actor who played the character Spock in the science-fiction movie Star Trek
From the sentence above, the following triples can be extracted:
Represented as a knowledge graph, they look like this:
A graph made up of nodes (entities) and edges (relations) like this is a simple knowledge graph.
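The triple-to-graph idea can be sketched in plain Python. The four triples below are one plausible extraction from the Nimoy sentence, chosen for illustration because the original figure is not reproduced here; the `Triple` type and the `build_graph` helper are hypothetical names, not part of the tutorial's code.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One Subject-Predicate-Object fact."""
    subject: str
    predicate: str
    obj: str


# Illustrative triples for the example sentence (assumed, not taken from the original figure).
triples = [
    Triple("Leonard Nimoy", "is a", "actor"),
    Triple("Leonard Nimoy", "played", "Spock"),
    Triple("Spock", "character in", "Star Trek"),
    Triple("Star Trek", "is a", "science-fiction movie"),
]


def build_graph(facts):
    """Turn triples into an adjacency map: node -> list of (relation, target node)."""
    graph = defaultdict(list)
    for t in facts:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


graph = build_graph(triples)
for node, edges in graph.items():
    for relation, target in edges:
        print(f"({node}) -[{relation}]-> ({target})")
```

Each subject becomes a node whose outgoing edges carry the predicate as a label; nodes that only ever appear as objects (such as "actor") have no entry of their own in this simple adjacency map.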
How do you build a simple knowledge graph?
The process can be split into two major steps:
- Knowledge extraction
  - Information extraction: obtain the triples
  - Entity recognition, entity linking, entity disambiguation, entity resolution
- Graph construction
  - Storage
  - Querying
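Entity resolution, the last sub-item under knowledge extraction, merges different surface forms that refer to the same entity so that they end up as a single node. A minimal sketch, assuming a hand-written alias table; real systems usually rely on linking against a knowledge base or on similarity models instead. `ALIASES`, `canonical` and `resolve_entities` are hypothetical names.

```python
# Hypothetical alias table: lower-cased surface form -> canonical entity name.
ALIASES = {
    "bill gates": "Bill Gates",
    "gates": "Bill Gates",
    "william henry gates": "Bill Gates",
    "microsoft": "Microsoft",
    "microsoft corp.": "Microsoft",
}


def canonical(name):
    """Return the canonical name for a surface form, or the trimmed form if unknown."""
    return ALIASES.get(name.strip().lower(), name.strip())


def resolve_entities(triples):
    """Rewrite subjects and objects to canonical names and drop duplicate triples."""
    seen = set()
    resolved = []
    for s, p, o in triples:
        t = (canonical(s), p, canonical(o))
        if t not in seen:  # keep the first occurrence, preserve input order
            seen.add(t)
            resolved.append(t)
    return resolved


raw = [
    ("Bill Gates", "founded", "Microsoft"),
    ("Gates", "founded", "Microsoft Corp."),
    ("Bill Gates", "supports", "entrepreneurship"),
]
print(resolve_entities(raw))
# -> [('Bill Gates', 'founded', 'Microsoft'), ('Bill Gates', 'supports', 'entrepreneurship')]
```

The second raw triple collapses into the first once both of its entities are resolved, which is exactly the duplication entity resolution is meant to prevent in the graph.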
Knowledge extraction is the key step in building a knowledge graph, and the triples can be obtained through dependency parsing.
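To show how a dependency parse yields triples, here is a minimal sketch of a common subject-verb-object rule: for each verb, take its `nsubj` child as the subject and its `dobj` children (plus any `conj` attached to them) as objects. To stay runnable without a parser, the parse of the tutorial's first input sentence is hard-coded in a simplified form; in practice the labels would come from spaCy (`token.dep_`, `token.head`). The `Tok` type and the `extract_svo` function are hypothetical, and this is not necessarily the rule the tutorial's own code applies.

```python
from typing import NamedTuple


class Tok(NamedTuple):
    """Minimal stand-in for a parsed token (hypothetical; mimics spaCy's text/dep_/head)."""
    text: str
    dep: str
    head: int  # index of the head token; the root points to itself


# Hand-written, simplified dependency parse of "Startup companies create jobs and innovation."
SENT = [
    Tok("Startup", "compound", 1),
    Tok("companies", "nsubj", 2),
    Tok("create", "ROOT", 2),
    Tok("jobs", "dobj", 2),
    Tok("and", "cc", 3),
    Tok("innovation", "conj", 3),
    Tok(".", "punct", 2),
]


def children(sent, i):
    """Indices of tokens whose head is token i (excluding i itself)."""
    return [j for j, t in enumerate(sent) if t.head == i and j != i]


def phrase(sent, i):
    """Token text prefixed by its compound modifiers, e.g. 'Startup companies'."""
    mods = [sent[j].text for j in children(sent, i) if sent[j].dep == "compound"]
    return " ".join(mods + [sent[i].text])


def extract_svo(sent):
    """Return (subject, verb, object) triples from nsubj/dobj arcs, following conj on objects."""
    triples = []
    for v, tok in enumerate(sent):
        kids = children(sent, v)
        subjects = [j for j in kids if sent[j].dep == "nsubj"]
        objects = [j for j in kids if sent[j].dep == "dobj"]
        # Objects coordinated with "and" hang off the first object as conj children.
        for o in list(objects):
            objects += [j for j in children(sent, o) if sent[j].dep == "conj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(sent, s), tok.text, phrase(sent, o)))
    return triples


print(extract_svo(SENT))
# -> [('Startup companies', 'create', 'jobs'), ('Startup companies', 'create', 'innovation')]
```

Following the `conj` arc is what turns "jobs and innovation" into two separate triples; a rule that only looked at `dobj` would miss "innovation".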
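Building a simple knowledge graph, hands-on
Only the steps and their results are shown here; the complete code is on GitHub [2].
1. Triple extraction
Using spaCy: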
# Input text to extract knowledge from
inputText = 'Startup companies create jobs and innovation. Bill Gates supports entrepreneurship.'

# Step 1: Knowledge extraction. Output: SOP (subject, predicate, object) triples.
# KnowledgeExtraction is defined in the project's GitHub repository [2].
knowledgeExtractionObj = KnowledgeExtraction()
sop_list = knowledgeExtractionObj.retrieveKnowledge(inputText)

# Each triple holds spaCy objects; keep only their text.
sop_list_strings = []
for sop in sop_list:
    sop_list_strings.append([sop[0].text, sop[1].text, sop[2].text])
print(sop_list_strings)
Result:
2. Entity linking
# Step 2: Entity recognition and linking. Map each subject/predicate/object
# string to the URI of the entity it was linked to.
entityRecognitionLinkingObj = EntityRecognitionLinking()
entityRelJson = entityRecognitionLinkingObj.entityRecogLink(inputText)

entityLinkTriples = []
for sop in sop_list_strings:
    tempTriple = ['', '', '']  # stays '' where no linked entity matches
    for resource in entityRelJson['Resources']:
        for i in range(3):
            if resource['@surfaceForm'] == sop[i]:
                tempTriple[i] = resource['@URI']
    entityLinkTriples.append(tempTriple)
print(entityLinkTriples)
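The 'Resources', '@surfaceForm' and '@URI' keys match the JSON returned by the DBpedia Spotlight annotation service, so `EntityRecognitionLinking` presumably wraps that service (an assumption; check the repository [2]). A minimal alternative sketch of the matching step builds a surface-form-to-URI dictionary once instead of rescanning every resource for every triple, and keeps the plain string when nothing was linked. The sample response and the `link_triples` name are made up for illustration.

```python
def link_triples(triples, spotlight_json):
    """Replace each triple element with its linked URI, keeping the text if unlinked."""
    # One pass over the annotations: surface form -> URI (later duplicates win,
    # as in the loop above). 'Resources' may be absent when nothing was annotated.
    uri_by_form = {
        r["@surfaceForm"]: r["@URI"] for r in spotlight_json.get("Resources", [])
    }
    return [[uri_by_form.get(part, part) for part in triple] for triple in triples]


# Made-up response shaped like DBpedia Spotlight's JSON output.
sample_json = {
    "Resources": [
        {"@surfaceForm": "Bill Gates", "@URI": "http://dbpedia.org/resource/Bill_Gates"},
        {"@surfaceForm": "entrepreneurship", "@URI": "http://dbpedia.org/resource/Entrepreneurship"},
    ]
}
triples = [["Bill Gates", "supports", "entrepreneurship"]]
print(link_triples(triples, sample_json))
```

Unlike the loop above, which leaves '' for unlinked elements, this version falls back to the original text; which behaviour suits graph construction depends on how the next step treats empty URIs.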
Result:
3. Graph construction
Using Neo4j:
# Step 3: Knowledge graph creation: write the triples and their linked URIs to Neo4j.
# GraphPopulation is defined in the project's GitHub repository [2].
graphPopulationObj = GraphPopulation()
graphPopulationObj.popGraph(sop_list_strings, entityLinkTriples)
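The internals of `popGraph` live in the repository and are not shown here. As a hedged sketch of what writing triples to Neo4j involves, the function below turns each triple into a Cypher `MERGE` statement plus a parameter dictionary. Node names are passed as query parameters, but in classic Cypher a relationship type cannot be supplied as a regular parameter, so the predicate is sanitized into an upper-case identifier instead. The `Entity` label, the `name` property, `rel_type` and `triple_to_cypher` are hypothetical choices.

```python
import re


def rel_type(predicate):
    """Sanitize a predicate into a relationship type, e.g. 'is a' -> 'IS_A'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", predicate).strip("_").upper()
    return cleaned or "RELATED_TO"


def triple_to_cypher(subject, predicate, obj):
    """Build a (query, params) pair that merges both nodes and the relationship."""
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $obj}) "
        # Backticks keep the type valid even if it starts with a digit.
        f"MERGE (s)-[:`{rel_type(predicate)}`]->(o)"
    )
    return query, {"subject": subject, "obj": obj}


for s, p, o in [["Startup companies", "create", "jobs"],
                ["Bill Gates", "supports", "entrepreneurship"]]:
    query, params = triple_to_cypher(s, p, o)
    print(query, params)
    # With the official neo4j Python driver this could be run as, for example:
    # session.run(query, params)
```

Using `MERGE` rather than `CREATE` means running the script twice does not duplicate nodes or relationships that already exist.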
The resulting graph:
Possible problems
- Q1
AuthError: The client is unauthorized due to authentication failure.
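Solution:
Make sure the password in the graph database configuration matches the one you set for Neo4j (the configuration below uses user neo4j and password neo4j; see the neomodel connection guide [3]):
from neomodel import config
config.DATABASE_URL = 'bolt://neo4j:neo4j@localhost:7687'  # default user:password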
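Rather than hard-coding the password in the URL, you can assemble the URL from your own settings so that it always matches what you configured in Neo4j. A minimal sketch; the environment variable names NEO4J_USER and NEO4J_PASSWORD and the `bolt_url` helper are hypothetical, and the fallbacks are the neo4j/neo4j credentials used above. The sketch does not escape special characters in the password.

```python
import os


def bolt_url(host="localhost", port=7687):
    """Build a bolt URL of the form used in config.DATABASE_URL.

    NEO4J_USER / NEO4J_PASSWORD are hypothetical variable names; the fallbacks
    are the default credentials used in this tutorial.
    """
    user = os.environ.get("NEO4J_USER", "neo4j")
    password = os.environ.get("NEO4J_PASSWORD", "neo4j")
    return f"bolt://{user}:{password}@{host}:{port}"


print(bolt_url())
# Then, before any graph code runs:
# from neomodel import config
# config.DATABASE_URL = bolt_url()
```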
- Q2
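ServiceUnavailable: Failed to establish connection to ('127.0.0.1', 7687) (reason [WinError 10061] No connection could be made because the target machine actively refused it.)
Solution:
Make sure Neo4j is running before you execute the graph-construction code.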
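A quick pre-flight check can catch this before the graph code runs. A minimal sketch using only the standard library; `neo4j_reachable` is a hypothetical helper, and it only confirms that something accepts TCP connections on the Bolt port, not that it is Neo4j or that your credentials are valid.

```python
import socket


def neo4j_reachable(host="localhost", port=7687, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (WinError 10061 on Windows) and timeouts.
        return False


if not neo4j_reachable():
    print("Nothing is listening on localhost:7687; start Neo4j first.")
```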
Questions are welcome in the comments; let's discuss.
[2] https://github.com/kramankishore/Knowledge-Graph-Intro
[3] https://neomodel.readthedocs.io/en/latest/getting_started.html#connecting
[4] https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/