Building a Knowledge Graph: A Beginner's Introduction

Posted by 周若梣 on 2020-05-18

This post is based on the Medium article:

A Knowledge Graph understanding and implementation tutorial for beginners[1]

What is a knowledge graph?

The content of a knowledge graph usually takes the form of triples: Subject-Predicate-Object (SPO).

For example:

Leonard Nimoy was an actor who played the character Spock in the science-fiction movie Star Trek

From the sentence above, the following triples can be extracted:

[Figure: triples extracted from the example sentence]
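One possible set of such triples, written as plain Python data. The relation names (`is_a`, `played`, `character_in`) are my own illustrative choice and may not match the ones in the original figure.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object (SPO) fact."""
    subject: str
    predicate: str
    obj: str


# Illustrative triples for: "Leonard Nimoy was an actor who played the character
# Spock in the science-fiction movie Star Trek" (relation names are hypothetical).
triples = [
    Triple("Leonard Nimoy", "is_a", "actor"),
    Triple("Leonard Nimoy", "played", "Spock"),
    Triple("Spock", "character_in", "Star Trek"),
    Triple("Star Trek", "is_a", "science-fiction movie"),
]

for t in triples:
    print(f"({t.subject}, {t.predicate}, {t.obj})")
```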

Represented as a knowledge graph:

[Figure: the triples drawn as a graph of nodes and relationships]

A graph made of nodes and relationships, like the one above, is a simple knowledge graph.
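Because every triple is a directed, labelled edge, a list of triples already describes a graph. The sketch below uses a few illustrative triples for the example sentence (the relation names are my own assumption) and collects the node set and an adjacency list; subjects and objects become nodes, predicates become edge labels.

```python
from collections import defaultdict

# Illustrative (subject, predicate, object) triples; relation names are hypothetical.
triples = [
    ("Leonard Nimoy", "is_a", "actor"),
    ("Leonard Nimoy", "played", "Spock"),
    ("Spock", "character_in", "Star Trek"),
    ("Star Trek", "is_a", "science-fiction movie"),
]


def build_graph(triples):
    """Return (nodes, adjacency); adjacency maps a node to its (predicate, object) out-edges."""
    nodes = set()
    adjacency = defaultdict(list)
    for subj, pred, obj in triples:
        nodes.update((subj, obj))            # subjects and objects become nodes
        adjacency[subj].append((pred, obj))  # each triple becomes one labelled edge
    return nodes, dict(adjacency)


nodes, adjacency = build_graph(triples)
print(sorted(nodes))
print(adjacency["Leonard Nimoy"])  # out-edges of the "Leonard Nimoy" node
```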

How do you build a simple knowledge graph?

The process can be divided into two main steps:

  • Knowledge extraction
    • Information extraction: obtaining triples
    • Entity recognition, entity linking, entity disambiguation, and entity resolution
  • Graph construction
    • Storage
    • Querying
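To make the storage and querying steps concrete, here is a toy in-memory triple store: storing means adding triples, and querying means matching a pattern in which `None` acts as a wildcard. This only illustrates the idea; it is not how Neo4j stores data, and the class and method names are hypothetical. The facts are hand-written from the example sentence used later in this post.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory store for (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()  # a set silently drops duplicate facts

    def add(self, subj: str, pred: str, obj: str) -> None:
        self._triples.add((subj, pred, obj))

    def query(self, subj: Optional[str] = None, pred: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return matching triples in sorted order; None in the pattern matches anything."""
        pattern = (subj, pred, obj)
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


store = TripleStore()
store.add("Bill Gates", "supports", "entrepreneurship")
store.add("Startup companies", "create", "jobs")
store.add("Startup companies", "create", "innovation")
print(store.query(subj="Startup companies"))
```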

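Knowledge extraction is the key step in building a knowledge graph. The triples themselves can be obtained with dependency parsing.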
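A minimal sketch of dependency-based triple extraction, assuming a parse is already available. The parses below are hand-written with spaCy-style labels (`nsubj`, `ROOT`, `dobj`, `conj`, `compound`) rather than produced by a spaCy model, so the block runs without any model. The rule (a subject and a direct object attached to the same root verb, with compounds merged into the noun) is a simple heuristic, not the original author's implementation.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label (spaCy-style)
    head: int  # index of the head token; the root points to itself


def phrase(tokens: List[Token], i: int) -> str:
    """Prepend 'compound' modifiers, e.g. 'Bill' + 'Gates' -> 'Bill Gates'."""
    mods = [t.text for t in tokens if t.dep == "compound" and t.head == i]
    return " ".join(mods + [tokens[i].text])


def extract_svo(tokens: List[Token]) -> List[tuple]:
    """Return (subject, verb, object) triples for every ROOT verb in the parse."""
    triples = []
    for v, verb in enumerate(tokens):
        if verb.dep != "ROOT":
            continue
        children = [i for i, t in enumerate(tokens) if t.head == v and i != v]
        subjects = [i for i in children if tokens[i].dep == "nsubj"]
        objects = [i for i in children if tokens[i].dep == "dobj"]
        # Nouns coordinated with a direct object ("jobs and innovation") count as objects too.
        objects += [i for i, t in enumerate(tokens) if t.dep == "conj" and t.head in objects]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), verb.text, phrase(tokens, o)))
    return triples


# Hand-written parses (hypothetical, spaCy-style), not real model output.
sentence1 = [
    Token("Startup", "compound", 1), Token("companies", "nsubj", 2),
    Token("create", "ROOT", 2), Token("jobs", "dobj", 2),
    Token("and", "cc", 3), Token("innovation", "conj", 3), Token(".", "punct", 2),
]
sentence2 = [
    Token("Bill", "compound", 1), Token("Gates", "nsubj", 2),
    Token("supports", "ROOT", 2), Token("entrepreneurship", "dobj", 2),
    Token(".", "punct", 2),
]

for sent in (sentence1, sentence2):
    print(extract_svo(sent))
```

With a real pipeline the same rule would read the labels from the parser's tokens instead of a hand-written list.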

Hands-on: building a simple knowledge graph

Only the main code steps and their results are shown here; the full code is available on GitHub [2].

1. Triple extraction

Using spaCy:

# KnowledgeExtraction, EntityRecognitionLinking and GraphPopulation are classes from the original author's repository [2].
inputText = 'Startup companies create jobs and innovation. Bill Gates supports entrepreneurship.'

# Step 1: Knowledge extraction. Output: SOP triples
knowledgeExtractionObj = KnowledgeExtraction()
sop_list = knowledgeExtractionObj.retrieveKnowledge(inputText)

# Each triple holds spaCy tokens/spans; keep only their text
sop_list_strings = [[sop[0].text, sop[1].text, sop[2].text] for sop in sop_list]

print(sop_list_strings)

Output:

[Figure: extracted triples printed by the code above]

2. Entity linking

# Step 2: Entity recognition and linking: map each surface form to a URI.
# entityRelJson appears to follow the DBpedia Spotlight response format ('Resources', '@surfaceForm', '@URI').
entityRecognitionLinkingObj = EntityRecognitionLinking()
entityRelJson = entityRecognitionLinkingObj.entityRecogLink(inputText)

# Look up each triple element by exact surface form; '' if no entity matches.
# .get() guards against a response without a 'Resources' key (no entities found).
uriBySurfaceForm = {resource['@surfaceForm']: resource['@URI']
                    for resource in entityRelJson.get('Resources', [])}
entityLinkTriples = [[uriBySurfaceForm.get(term, '') for term in sop]
                     for sop in sop_list_strings]
print(entityLinkTriples)

Output:

[Figure: triples with their linked entity URIs]
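The matching logic of step 2 can be tried without a network call. The keys `Resources`, `@surfaceForm` and `@URI` are those of a DBpedia Spotlight annotation response; the sample response below is hand-written for illustration and is not real output, and `link_triples` is a hypothetical helper name. It shows one limitation of exact string matching: a triple element that is not exactly a recognised surface form (here "Startup companies") stays unlinked.

```python
def link_triples(sop_list_strings, entity_json):
    """Replace each triple element by the URI whose surface form matches it exactly ('' if none)."""
    # Default to an empty list in case the response has no 'Resources' key.
    uri_by_surface = {r["@surfaceForm"]: r["@URI"] for r in entity_json.get("Resources", [])}
    return [[uri_by_surface.get(term, "") for term in sop] for sop in sop_list_strings]


# Hand-written, Spotlight-shaped sample response (illustrative, not real output).
sample_response = {
    "@text": "Startup companies create jobs and innovation. Bill Gates supports entrepreneurship.",
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Bill_Gates", "@surfaceForm": "Bill Gates"},
        {"@URI": "http://dbpedia.org/resource/Entrepreneurship", "@surfaceForm": "entrepreneurship"},
    ],
}
triples = [["Startup companies", "create", "jobs"], ["Bill Gates", "supports", "entrepreneurship"]]
linked = link_triples(triples, sample_response)
print(linked)
```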

3. Graph construction

Using Neo4j:

# Step 3: Knowledge graph creation (writes the triples and their linked URIs to Neo4j)
graphPopulationObj = GraphPopulation()
graphPopulationObj.popGraph(sop_list_strings, entityLinkTriples)


The final graph looks like this:

[Figure: the resulting knowledge graph]
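Storing triples in Neo4j ultimately means creating nodes and relationships. As a hedged sketch (not necessarily what `popGraph` does internally, which is not shown here), the function below turns triples into parameterized Cypher `MERGE` statements. The node label `Entity`, the relationship type `RELATION` and the name `triples_to_cypher` are hypothetical choices; the predicate is stored as a property because an ordinary `$parameter` cannot stand in for a relationship type. Executing the statements, for example with the official neo4j Python driver's `session.run(query, params)`, is left out so the block runs without a database.

```python
from typing import Dict, List, Tuple

# MERGE creates a node or relationship only if a matching one does not exist yet,
# so re-running the import does not duplicate data.
MERGE_TRIPLE = (
    "MERGE (s:Entity {name: $subj}) "
    "MERGE (o:Entity {name: $obj}) "
    "MERGE (s)-[:RELATION {name: $pred}]->(o)"
)


def triples_to_cypher(triples: List[List[str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Pair the MERGE query with a parameter dict for each (subject, predicate, object)."""
    return [
        (MERGE_TRIPLE, {"subj": s, "pred": p, "obj": o})
        for s, p, o in triples
    ]


statements = triples_to_cypher([
    ["Startup companies", "create", "jobs"],
    ["Bill Gates", "supports", "entrepreneurship"],
])
for query, params in statements:
    print(params)
```

Passing values as parameters instead of formatting them into the query string also avoids quoting problems with names that contain apostrophes.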

Possible problems

  • Q1: AuthError: The client is unauthorized due to authentication failure.

Solution:

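Make sure the user name and password in the connection configuration match the ones set for the Neo4j database [3]. The configuration below means user neo4j, password neo4j: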

from neomodel import config
config.DATABASE_URL = 'bolt://neo4j:neo4j@localhost:7687'  # default; format: bolt://<user>:<password>@<host>:<port>
  • Q2: ServiceUnavailable: Failed to establish connection to ('127.0.0.1', 7687) (reason [WinError 10061] No connection could be made because the target machine actively refused it.)

Solution:

Make sure Neo4j is running before you execute the graph construction code.
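Before running step 3, it can help to confirm that something is listening on the Bolt port. The check below uses only Python's standard library; 7687 is Neo4j's default Bolt port, as in the URL shown for Q1, and `bolt_port_open` is a hypothetical helper name. A successful TCP connection only shows that the port is open, not that the credentials are correct (that is the Q1 error).

```python
import socket


def bolt_port_open(host: str = "localhost", port: int = 7687, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # e.g. ConnectionRefusedError (WinError 10061 on Windows) or a timeout
        return False


if __name__ == "__main__":
    if not bolt_port_open():
        print("Nothing is listening on localhost:7687. Start Neo4j first.")
```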

If you have questions, feel free to leave a comment.

[1]https://medium.com/analytics-vidhya/a-knowledge-graph-implementation-tutorial-for-beginners-3c53e8802377

[2]https://github.com/kramankishore/Knowledge-Graph-Intro

[3]https://neomodel.readthedocs.io/en/latest/getting_started.html#connecting

[4]https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/
