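Knowledge graphs sound sophisticated, and they are widely used. As for graph databases, a quick web search turns up names such as Neo4j, JanusGraph and HugeGraph...
Suppose you want to build something graph-like. What would you do? Is it really wise to adopt a full graph database right away? Early on there may be only two or three relationship chains, or the business may just be testing the waters. Bringing in a real graph database for that seems wasteful.
Indeed, in the early stage it is usually better to implement simple relationship-chain maintenance ourselves.
Still, to cope with modest changes in the relationships, we should probably borrow a few concepts from graph databases. So the first question is: how do we describe a graph of relationships in text?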
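1. How should the notation be defined?
The three core elements of a graph relationship: the subject (entity), the relationship, and the object.
The problem itself is not hard. You could invent any notation you like, as long as you can read it yourself, without worrying about anyone else. For example, '1,2,3' could mean "first 1, then 2, then 3"... But expressing even slightly more complex structures that way is not so easy. And if we may later switch to a real graph database, why not borrow an existing standard?
Common options today include Cypher and Gremlin. After looking through the available material, Cypher seems more visual, and its arrow notation in particular is convenient.
For example, the relationship from A to B can be written as: (:A)-[:RELATION]->(:B)
More complex structures are described by joining several such relationships together.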
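As a concrete illustration, here is a minimal Java sketch that renders subject/relationship/object triples in this notation and joins several of them with commas, which is how the configuration string later in the article is laid out. The class CypherTriple is hypothetical and not part of the article's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one (subject)-[relation]->(object) triple.
final class CypherTriple {
    final String subject;
    final String relation;
    final String object;

    CypherTriple(String subject, String relation, String object) {
        this.subject = subject;
        this.relation = relation;
        this.object = object;
    }

    // Renders an outgoing pattern: (:SUBJECT)-[:RELATION]->(:OBJECT)
    String toCypher() {
        return "(:" + subject + ")-[:" + relation + "]->(:" + object + ")";
    }

    // Several relationships are simply joined with commas.
    static String join(List<CypherTriple> triples) {
        List<String> parts = new ArrayList<>();
        for (CypherTriple t : triples) {
            parts.add(t.toCypher());
        }
        return String.join(",", parts);
    }
}
```

For example, new CypherTriple("A", "RELATION", "B").toCypher() yields (:A)-[:RELATION]->(:B).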
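Looks good. Now that the notation is settled, how do we actually process the relationships?
2. How do we represent a real-world graph of relationships?
Suppose we have the relationships shown in the figure below. How should we express them as a string so that they can be configured?
Following the convention defined in Section 1, they can be represented by the following string: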
(:PEOPLE)-[:養寵物]->(:CAT)-[:吃]->(:RICE) ,(:PEOPLE)-[:吃]->(:RICE) ,(:PEOPLE)-[:養寵物]->(:DOG) ,(:PEOPLE)-[:擁有]->(:HOUSE) ,(:PEOPLE)-[:幹活]->(:JOB) ,(:CAT)-[:朋友]->(:DOG) ,(:DOG)-[:吃]->(:RICE) ,(:JOB)-[:產出]->(:BRICK) ,(:HOUSE)<-[:構件]-(:BRICK) ,(:HOUSE)<-[:構件]-(:GLASS)
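This is fairly intuitive: we simply follow the figure and write down each edge, its direction and its relationship. (The relationship labels are kept in Chinese: 養寵物 = keeps as a pet, 吃 = eats, 擁有 = owns, 幹活 = works at, 朋友 = friend of, 產出 = produces, 構件 = is a component of.) And because the notation follows the official Cypher syntax, no separate documentation is needed; others can pick it up easily.
3. How do we parse the relationships?
We now have the relationships as a string, but a string alone cannot be understood by the application. It has to be parsed into concrete data structures before the application can derive lineage dependencies from the relationships. This is the core of the article.
It is not actually complicated. We use only a handful of Cypher's notation elements, so we only need to recognize those few symbols and then build the corresponding relationships in memory.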
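To show how small that subset is, here is a sketch of an alternative, regex-based tokenizer. It is an illustration under assumptions, not the article's parser: the class RegexChainParser is hypothetical, it recognizes only (:LABEL), -[:REL]-> and <-[:REL]-, and it normalizes every edge into a {source, relation, target} triple pointing in the outgoing direction. Labels may contain any character except whitespace and the closing bracket, so the Chinese labels above also match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based tokenizer for the Cypher subset used in this article.
final class RegexChainParser {

    // A vertex such as (:PEOPLE)
    private static final Pattern VERTEX =
            Pattern.compile("\\(\\s*:([^)\\s]+)\\s*\\)");
    // One step of a chain: -[:REL]-> or <-[:REL]- followed by the next vertex
    private static final Pattern STEP = Pattern.compile(
            "\\s*(<)?-\\[\\s*:([^\\]\\s]+)\\s*\\]-(>)?\\s*\\(\\s*:([^)\\s]+)\\s*\\)");

    // Returns {source, relation, target} triples, all oriented in the OUT direction.
    static List<String[]> parse(String schema) {
        List<String[]> triples = new ArrayList<>();
        for (String chain : schema.split(",")) {
            String text = chain.trim();
            Matcher head = VERTEX.matcher(text);
            if (!head.lookingAt()) {
                throw new IllegalArgumentException("chain must start with a vertex: " + text);
            }
            String prev = head.group(1);
            int pos = head.end();
            Matcher step = STEP.matcher(text);
            while (pos < text.length()) {
                step.region(pos, text.length());
                if (!step.lookingAt()) {
                    throw new IllegalArgumentException("malformed edge near: " + text.substring(pos));
                }
                boolean incoming = step.group(1) != null; // <-[:REL]-
                boolean outgoing = step.group(3) != null; // -[:REL]->
                if (incoming == outgoing) {
                    throw new IllegalArgumentException("edge needs exactly one arrow: " + step.group());
                }
                String relation = step.group(2);
                String next = step.group(4);
                // (:A)<-[:R]-(:B) means B -> A, so swap the ends to keep one orientation
                triples.add(incoming
                        ? new String[]{next, relation, prev}
                        : new String[]{prev, relation, next});
                prev = next;
                pos = step.end();
            }
        }
        return triples;
    }
}
```

Splitting on commas means labels cannot contain commas; the hand-written parser below has the same limitation.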
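The implementation is as follows.
3.1. Parsing framework
The framework is the code that controls the overall flow; reading it shows how the whole system works.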
package com.my.mvc.app.common.helper;

import com.my.mvc.app.common.helper.graph.GraphNodeEntityTree;
import com.my.mvc.app.common.helper.graph.NodeDiscoveryDirection;
import com.my.mvc.app.common.helper.graph.VertexEdgeSchemaDescriptor;
import com.my.mvc.app.common.helper.graph.VertexOrEdgeType;
import com.my.mvc.app.common.util.CommonUtil;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simple graph syntax parser (Cypher-like syntax).
 *
 * See the Cypher documentation for the notation.
 */
public class SimpleGraphSchemaSyntaxParser {

    /**
     * Parses a graph relationship configuration into a tree structure.
     *
     * @param cypherGraphSchema relationship statements in Cypher-like syntax
     * @return the parsed tree
     */
    public static GraphNodeEntityTree parseGraphSchemaAsTree(String cypherGraphSchema) {
        List<VertexEdgeSchemaDescriptor> flatNodeList = tokenize(cypherGraphSchema);
        return buildGraphAstTree(flatNodeList);
    }

    /**
     * Builds the graph relationship "syntax tree".
     *
     * @param flatNodeList flat list of vertex and edge tokens
     * @return the node of the first vertex, which also holds the label index
     */
    private static GraphNodeEntityTree buildGraphAstTree(
            List<VertexEdgeSchemaDescriptor> flatNodeList) {
        if (flatNodeList.isEmpty()) {
            throw new IllegalArgumentException("Graph schema is empty");
        }
        // One node instance per vertex label, shared by every relation that mentions it
        Map<String, GraphNodeEntityTree> uniqVertexContainer = new HashMap<>();
        GraphNodeEntityTree root = new GraphNodeEntityTree(flatNodeList.get(0));
        uniqVertexContainer.put(flatNodeList.get(0).getVertexLabelType(), root);
        GraphNodeEntityTree parent;
        GraphNodeEntityTree afterNode;
        for (int i = 1; i < flatNodeList.size(); i++) {
            VertexEdgeSchemaDescriptor vertexOrEdge1 = flatNodeList.get(i);
            if (vertexOrEdge1.getNodeType() == VertexOrEdgeType.EDGE) {
                // An edge links the vertex before it to the vertex after it;
                // vertices seen earlier are reused instead of duplicated
                VertexEdgeSchemaDescriptor vertexPrev = flatNodeList.get(i - 1);
                if (vertexPrev.getNodeType() != VertexOrEdgeType.VERTEX) {
                    continue;
                }
                if (++i >= flatNodeList.size()) {
                    throw new RuntimeException("Missing object vertex, near edge ["
                            + vertexOrEdge1.getRawWord() + "]");
                }
                VertexEdgeSchemaDescriptor relation = vertexOrEdge1;
                VertexEdgeSchemaDescriptor vertexAfter = flatNodeList.get(i);
                parent = uniqVertexContainer.get(vertexPrev.getVertexLabelType());
                if (parent == null) {
                    // First occurrence of this subject: register a node under its own label
                    parent = new GraphNodeEntityTree(vertexPrev);
                    uniqVertexContainer.put(vertexPrev.getVertexLabelType(), parent);
                }
                afterNode = uniqVertexContainer.get(vertexAfter.getVertexLabelType());
                if (afterNode == null) {
                    afterNode = new GraphNodeEntityTree(vertexAfter);
                    uniqVertexContainer.put(vertexAfter.getVertexLabelType(), afterNode);
                }
                // Both helpers also record the reverse edge on afterNode
                if (relation.getDirection() == NodeDiscoveryDirection.OUT) {
                    parent.addOutVertex(afterNode, relation);
                } else {
                    parent.addInVertex(afterNode, relation);
                }
            }
        }
        root.setUniqVertexTypes(uniqVertexContainer);
        return root;
    }

    /**
     * Splits the graph schema into vertex and edge tokens.
     *
     * @param cypherGraphSchema relationship statements, e.g. (:BASE_LABEL)-[:COMPOSE_REF]->(:COMPOSE_LABEL)
     * @return flat token list
     */
    private static List<VertexEdgeSchemaDescriptor> tokenize(String cypherGraphSchema) {
        // Relations are separated by commas, so labels must not contain ','
        String[] relationArr = cypherGraphSchema.split(",");
        List<VertexEdgeSchemaDescriptor> flatNodeList = new ArrayList<>();
        for (String relation1 : relationArr) {
            char[] src = relation1.trim().toCharArray();
            for (int i = 0; i < src.length; i++) {
                char ch = src[i];
                // Vertex, e.g. (:LABEL)
                if (ch == '(') {
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if (nextCh == ':') {
                            // CommonUtil.readSplitWord is a project helper not shown here;
                            // it returns the text between ':' and ')'
                            String vertexLabel = CommonUtil.readSplitWord(
                                    src, i, ':', ')', false);
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newVertex(
                                    specNameBuilder.toString() + ":" + vertexLabel, vertexLabel));
                            // Jump to the closing ')'
                            i += vertexLabel.length() + 2;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                    continue;
                }
                // Outgoing edge, (:SRC)-[:RELATION]->(:DST)
                if (ch == '-' && i + 1 < src.length && src[i + 1] == '[') {
                    ++i;
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if (nextCh == ':') {
                            String edgeLabel = CommonUtil.readSplitWord(
                                    src, i, ':', ']', false);
                            // Position of the closing ']'
                            int nextVertexStart = i + edgeLabel.length() + 2;
                            if (nextVertexStart + 2 >= src.length) {
                                throw new RuntimeException("Lineage graph config error: missing object vertex"
                                        + ", near '" + near(src, nextVertexStart) + "'");
                            }
                            if (src[++nextVertexStart] != '-' || src[++nextVertexStart] != '>') {
                                throw new RuntimeException("Lineage graph config error: expected '->' after the edge"
                                        + ", near '" + near(src, nextVertexStart) + "'");
                            }
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newEdge(
                                    specNameBuilder.toString() + ":" + edgeLabel,
                                    edgeLabel, NodeDiscoveryDirection.OUT));
                            // Resume at '>'; the loop's i++ moves on to the next '('
                            i = nextVertexStart;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                    continue;
                }
                // Incoming edge, (:SRC)<-[:RELATION]-(:DST)
                if (ch == '<') {
                    // src[i + 1] and src[i + 2] are read below
                    if (i + 2 >= src.length) {
                        throw new RuntimeException("Lineage config error: truncated edge, near '"
                                + near(src, i) + "'");
                    }
                    if (src[++i] != '-' || src[++i] != '[') {
                        throw new RuntimeException("Lineage config error: malformed edge, near '"
                                + near(src, i) + "'");
                    }
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if (nextCh == ':') {
                            String edgeLabel = CommonUtil.readSplitWord(
                                    src, i, ':', ']', false);
                            int nextVertexStart = i + edgeLabel.length() + 2;
                            if (nextVertexStart + 2 >= src.length) {
                                throw new RuntimeException("Lineage graph config error: missing object vertex"
                                        + ", near '" + near(src, nextVertexStart) + "'");
                            }
                            if (src[++nextVertexStart] != '-' || src[nextVertexStart + 1] != '(') {
                                throw new RuntimeException("Lineage graph config error: expected ']-(' after the edge"
                                        + ", near '" + near(src, nextVertexStart) + "'");
                            }
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newEdge(
                                    specNameBuilder.toString() + ":" + edgeLabel,
                                    edgeLabel, NodeDiscoveryDirection.IN));
                            // Resume at '-'; the loop's i++ moves on to the next '('
                            i = nextVertexStart;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                }
            }
        }
        return flatNodeList;
    }

    /**
     * Remaining text from {@code from}, clamped to the array bounds, for error messages.
     */
    private static String near(char[] src, int from) {
        int start = Math.min(Math.max(from, 0), src.length);
        return new String(src, start, src.length - start);
    }
}
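The parser relies on CommonUtil.readSplitWord, which the article does not show. Judging only from how it is called, it appears to return the text between a left delimiter (':') and a right delimiter (')' or ']'), searching from a given offset. The sketch below is a hypothetical stand-in built on that assumption; the meaning of the fifth boolean argument is unknown, so it is accepted and ignored.

```java
// Hypothetical stand-in for CommonUtil.readSplitWord (the real helper is not shown).
// Assumed behavior: starting at offset, skip to the first `left` delimiter and return
// the raw characters up to, but not including, the next `right` delimiter.
final class SplitWordReader {
    static String readSplitWord(char[] src, int offset, char left, char right,
                                boolean unknownFlag) { // meaning unknown; ignored here
        int start = offset;
        while (start < src.length && src[start] != left) {
            start++;
        }
        if (start >= src.length) {
            throw new IllegalArgumentException("delimiter '" + left + "' not found from offset " + offset);
        }
        StringBuilder word = new StringBuilder();
        for (int i = start + 1; i < src.length; i++) {
            if (src[i] == right) {
                return word.toString();
            }
            word.append(src[i]);
        }
        throw new IllegalArgumentException("missing closing '" + right + "'");
    }
}
```

Because the caller advances its index by label.length() + 2, whatever helper is used must return the text between the delimiters exactly as written, without trimming it.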
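Not complicated, right? There are just two steps: 1. tokenize, i.e. parse each individual element; 2. build the upstream/downstream relationships between vertices from those elements.
IN denotes an incoming relationship and OUT an outgoing one, and every pair of adjacent vertices is connected by an edge. That is the gist. Clearly, though, many details remain: where are the edges stored? How are related nodes added? These need dedicated data structures, explained in detail next.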
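One way to picture the bookkeeping behind IN and OUT is a sketch in which every edge is recorded on both endpoints: as OUT on its source and as IN on its target. This is a simplified, hypothetical model (BiAdjacencyGraph is not the article's class). Unlike the article's GraphNodeEntityTree, it stores each relation name together with its neighbor instead of in a separate list matched by index, and it does not prefix reverse-side names with '-'.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each edge is stored on both endpoints so it can be walked either way.
final class BiAdjacencyGraph {

    enum Direction { IN, OUT }

    // A relation name paired with the vertex on the other end of the edge.
    static final class Edge {
        final String relation;
        final String neighbor;

        Edge(String relation, String neighbor) {
            this.relation = relation;
            this.neighbor = neighbor;
        }
    }

    private final Map<String, Map<Direction, List<Edge>>> adjacency = new HashMap<>();

    private List<Edge> edgesOf(String vertex, Direction direction) {
        return adjacency
                .computeIfAbsent(vertex, k -> new EnumMap<Direction, List<Edge>>(Direction.class))
                .computeIfAbsent(direction, k -> new ArrayList<>());
    }

    // Records (:source)-[:relation]->(:target) as OUT on source and IN on target.
    void addOut(String source, String relation, String target) {
        edgesOf(source, Direction.OUT).add(new Edge(relation, target));
        edgesOf(target, Direction.IN).add(new Edge(relation, source));
    }

    // Read-only view of the edges leaving (OUT) or entering (IN) a vertex.
    List<Edge> neighbors(String vertex, Direction direction) {
        Map<Direction, List<Edge>> byDirection = adjacency.get(vertex);
        if (byDirection == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(
                byDirection.getOrDefault(direction, Collections.emptyList()));
    }
}
```

In this model, (:HOUSE)<-[:構件]-(:BRICK) would be recorded as addOut("BRICK", "構件", "HOUSE").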
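3.2. Single-node tree structure
A single node means looking at the whole graph from the standpoint of any one vertex. If the graph is connected, then in principle every other node can be reached from this node, provided both incoming and outgoing edges are followed. That is why this structure is so important.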
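The connectivity claim can be checked with a breadth-first search that follows edges in both directions; following only OUT edges from PEOPLE, for instance, never reaches GLASS. A minimal sketch, assuming the relationships are available as {source, relation, target} string triples (the class ReachabilityCheck is hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: reachability over an undirected view of the relationships.
final class ReachabilityCheck {

    // Vertices reachable from start when every edge may be walked in either direction.
    static Set<String> reachable(List<String[]> triples, String start) {
        Map<String, Set<String>> neighbors = new HashMap<>();
        for (String[] t : triples) { // t = {source, relation, target}
            neighbors.computeIfAbsent(t[0], k -> new HashSet<>()).add(t[2]);
            neighbors.computeIfAbsent(t[2], k -> new HashSet<>()).add(t[0]);
        }
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            for (String n : neighbors.getOrDefault(v, Collections.emptySet())) {
                if (seen.add(n)) { // add() returns false if n was already seen
                    queue.add(n);
                }
            }
        }
        return seen;
    }

    // Connected iff a single search from any vertex reaches all of them.
    static boolean isConnected(List<String[]> triples) {
        Set<String> all = new HashSet<>();
        for (String[] t : triples) {
            all.add(t[0]);
            all.add(t[2]);
        }
        if (all.isEmpty()) {
            return true;
        }
        return reachable(triples, all.iterator().next()).size() == all.size();
    }
}
```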
package com.my.mvc.app.common.helper.graph;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simple graph structure: one vertex plus its incoming and outgoing neighbors.
 */
public class GraphNodeEntityTree {

    /**
     * Descriptor of the current vertex
     */
    private VertexEdgeSchemaDescriptor vertex;

    /**
     * Relation (edge) container, keyed by direction; getRelationName relies on each
     * list being index-aligned with in / out
     */
    private Map<NodeDiscoveryDirection, List<RelationWithVertexDescriptor>> relations = new HashMap<>();

    /**
     * Neighbors on incoming edges
     */
    private List<GraphNodeEntityTree> in = new ArrayList<>();

    /**
     * Neighbors on outgoing edges
     */
    private List<GraphNodeEntityTree> out = new ArrayList<>();

    /**
     * All vertex instances keyed by label (populated on the root)
     */
    private Map<String, GraphNodeEntityTree> uniqVertexTypes;

    public GraphNodeEntityTree(VertexEdgeSchemaDescriptor vertex) {
        this.vertex = vertex;
        uniqVertexTypes = new HashMap<>();
    }

    public void setUniqVertexTypes(Map<String, GraphNodeEntityTree> uniqVertexTypes) {
        this.uniqVertexTypes = uniqVertexTypes;
    }

    public void addRelation(VertexEdgeSchemaDescriptor srcEntity,
                            VertexEdgeSchemaDescriptor relation,
                            VertexEdgeSchemaDescriptor dstEntity) {
        List<RelationWithVertexDescriptor> list = relations.computeIfAbsent(
                relation.getDirection(), k -> new ArrayList<>());
        list.add(new RelationWithVertexDescriptor(srcEntity, relation, dstEntity));
    }

    /**
     * Adds an incoming neighbor; the neighbor receives the reversed (outgoing) edge
     */
    public GraphNodeEntityTree addInVertex(GraphNodeEntityTree embeddedEntity,
                                           VertexEdgeSchemaDescriptor relation) {
        embeddedEntity.addOutVertexInner(this, relation.reverseDirection());
        addInVertexInner(embeddedEntity, relation);
        return embeddedEntity;
    }

    private GraphNodeEntityTree addInVertexInner(GraphNodeEntityTree embeddedEntity,
                                                 VertexEdgeSchemaDescriptor relation) {
        in.add(embeddedEntity);
        addRelation(vertex, relation, embeddedEntity.getVertex());
        return embeddedEntity;
    }

    /**
     * Adds an outgoing neighbor; the neighbor receives the reversed (incoming) edge
     */
    public GraphNodeEntityTree addOutVertex(GraphNodeEntityTree embeddedEntity,
                                            VertexEdgeSchemaDescriptor relation) {
        embeddedEntity.addInVertexInner(this, relation.reverseDirection());
        addOutVertexInner(embeddedEntity, relation);
        return embeddedEntity;
    }

    private GraphNodeEntityTree addOutVertexInner(GraphNodeEntityTree embeddedEntity,
                                                  VertexEdgeSchemaDescriptor relation) {
        out.add(embeddedEntity);
        addRelation(this.getVertex(), relation, embeddedEntity.getVertex());
        return embeddedEntity;
    }

    public VertexEdgeSchemaDescriptor getVertex() {
        return vertex;
    }

    /**
     * Gets a relation name.
     *
     * @param nodeIndex index of the neighbor in getIn() / getOut()
     * @param direction direction
     * @return relation name, or null if there is none
     */
    public String getRelationName(int nodeIndex, NodeDiscoveryDirection direction) {
        List<RelationWithVertexDescriptor> list = relations.get(direction);
        if (list == null || nodeIndex < 0 || nodeIndex >= list.size()) {
            return null;
        }
        return list.get(nodeIndex).getRelationName();
    }

    public List<GraphNodeEntityTree> getIn() {
        return in;
    }

    public List<GraphNodeEntityTree> getOut() {
        return out;
    }

    /**
     * Quickly looks up a graph node by vertex label.
     *
     * @param vertexLabel vertex label
     * @return the node, or null if no such vertex exists
     */
    public GraphNodeEntityTree getNodeEntityTreeByVertexLabel(
            String vertexLabel) {
        return uniqVertexTypes.get(vertexLabel);
    }
}
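Essentially every later operation goes through this class, so it deserves close attention.
3.3. Vertex and edge descriptor class
Parsing begins with tokenization, and how the resulting tokens are defined also matters. We use a single descriptor class for both vertices and edges: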
package com.my.mvc.app.common.helper.graph;

/**
 * Descriptor for a graph vertex or edge token.
 */
public class VertexEdgeSchemaDescriptor {

    private String rawWord;

    private VertexOrEdgeType nodeType;

    private String vertexLabelType;

    private String relationName;

    private NodeDiscoveryDirection direction;

    private VertexEdgeSchemaDescriptor(String rawWord,
                                       VertexOrEdgeType nodeType,
                                       String vertexLabelType,
                                       String relationName,
                                       NodeDiscoveryDirection direction) {
        this.rawWord = rawWord;
        this.nodeType = nodeType;
        this.vertexLabelType = vertexLabelType;
        this.relationName = relationName;
        this.direction = direction;
    }

    /**
     * Creates a vertex token.
     *
     * @param rawWord         raw text, e.g. ":PEOPLE"
     * @param vertexLabelType parsed vertex label (the labels enumerate all vertex types)
     * @return vertex instance
     */
    public static VertexEdgeSchemaDescriptor newVertex(String rawWord, String vertexLabelType) {
        return new VertexEdgeSchemaDescriptor(rawWord, VertexOrEdgeType.VERTEX,
                vertexLabelType, null, null);
    }

    /**
     * Creates an edge token.
     *
     * @param rawWord      raw text
     * @param relationName relation name (used as its id)
     * @param direction    relation direction ( -> is OUT, <- is IN )
     * @return edge instance
     */
    public static VertexEdgeSchemaDescriptor newEdge(String rawWord, String relationName,
                                                     NodeDiscoveryDirection direction) {
        return new VertexEdgeSchemaDescriptor(rawWord, VertexOrEdgeType.EDGE,
                null, relationName, direction);
    }

    public String getRawWord() {
        return rawWord;
    }

    public VertexOrEdgeType getNodeType() {
        return nodeType;
    }

    public String getVertexLabelType() {
        return vertexLabelType;
    }

    public String getRelationName() {
        return relationName;
    }

    public NodeDiscoveryDirection getDirection() {
        return direction;
    }

    /**
     * Copy of this edge pointing the other way; the name gets a "-" prefix to mark the
     * reverse side. Only valid for edges, since vertices have no direction.
     */
    public VertexEdgeSchemaDescriptor reverseDirection() {
        return new VertexEdgeSchemaDescriptor(rawWord, nodeType, vertexLabelType,
                "-" + relationName, direction.reverse());
    }

    @Override
    public String toString() {
        // Vertex
        if (nodeType == VertexOrEdgeType.VERTEX) {
            return nodeType + "{" +
                    "rawWord='" + rawWord + '\'' +
                    ", vertexLabelType=" + vertexLabelType +
                    '}';
        }
        // Edge
        return nodeType + "{" +
                "rawWord='" + rawWord + '\'' +
                ", relationName='" + relationName + '\'' +
                ", direction=" + direction +
                '}';
    }
}
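It essentially holds the raw string and records whether the token is an edge or a vertex, roughly like a word token in a lexer.
3.4. Relationship descriptor
We need to know exactly how each vertex relates to the others, so a relationship descriptor class represents this. (The core logic only uses it to look up relation names.)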
package com.my.mvc.app.common.helper.graph;

/**
 * A relationship instance: subject -> relation -> object.
 */
public class RelationWithVertexDescriptor {

    /**
     * Source (starting) vertex
     */
    private final VertexEdgeSchemaDescriptor srcVertex;

    /**
     * Target vertex
     */
    private final VertexEdgeSchemaDescriptor dstVertex;

    /**
     * Relation (name)
     */
    private final VertexEdgeSchemaDescriptor relation;

    public RelationWithVertexDescriptor(VertexEdgeSchemaDescriptor srcVertex,
                                        VertexEdgeSchemaDescriptor relation,
                                        VertexEdgeSchemaDescriptor dstVertex) {
        this.srcVertex = srcVertex;
        this.dstVertex = dstVertex;
        this.relation = relation;
    }

    public VertexEdgeSchemaDescriptor getSrcVertex() {
        return srcVertex;
    }

    public VertexEdgeSchemaDescriptor getDstVertex() {
        return dstVertex;
    }

    /**
     * Name of this relation
     */
    public String getRelationName() {
        return relation.getRelationName();
    }

    @Override
    public String toString() {
        if (relation.getDirection() == NodeDiscoveryDirection.OUT) {
            return srcVertex.getRawWord() + "(" + srcVertex.getVertexLabelType() + ")"
                    + " -> " + relation.getRelationName() + " -> "
                    + dstVertex.getRawWord() + "(" + dstVertex.getVertexLabelType() + ")";
        }
        return srcVertex.getRawWord() + "(" + srcVertex.getVertexLabelType() + ")"
                + " <- " + relation.getRelationName() + " <- "
                + dstVertex.getRawWord() + "(" + dstVertex.getVertexLabelType() + ")";
    }
}
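Although its practical use is limited, when debugging its toString() makes it easy to see whether parsing produced the expected relationships.
3.5. Basic type definitions
1. Direction definition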
package com.my.mvc.app.common.helper.graph;

/**
 * Traversal direction.
 *
 * @since 2020/10/12
 */
public enum NodeDiscoveryDirection {
    /**
     * Incoming direction (upstream)
     */
    IN,
    /**
     * Outgoing direction (downstream)
     */
    OUT,
    ;

    public NodeDiscoveryDirection reverse() {
        return this == OUT ? IN : OUT;
    }
}
2. Edge-or-vertex type definition
package com.my.mvc.app.common.helper.graph;

/**
 * Token type: vertex or edge.
 */
public enum VertexOrEdgeType {
    VERTEX,
    EDGE,
    ;
}
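With that, the parsing module is complete: the string above can be fully parsed into entity relationships.
4. Unit test
Code only counts as usable once it has been tested.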
package com.my.test.common.parser;

import com.my.mvc.app.common.helper.SimpleGraphSchemaSyntaxParser;
import com.my.mvc.app.common.helper.graph.GraphNodeEntityTree;
import com.my.mvc.app.common.helper.graph.NodeDiscoveryDirection;
import org.junit.Test;

import java.util.List;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;

public class SimpleGraphSchemaSyntaxParserTest {

    // Test case
    @Test
    public void testParseGraphSchema() {
        String graphSchema = "(:PEOPLE)-[:養寵物]->(:CAT)-[:吃]->(:RICE)\n" +
                ",(:PEOPLE)-[:吃]->(:RICE)\n" +
                ",(:PEOPLE)-[:養寵物]->(:DOG)\n" +
                ",(:PEOPLE)-[:擁有]->(:HOUSE)" +
                ",(:PEOPLE)-[:幹活]->(:JOB)" +
                ",(:CAT)-[:朋友]->(:DOG)" +
                ",(:DOG)-[:吃]->(:RICE)" +
                ",(:JOB)-[:產出]->(:BRICK)" +
                ",(:HOUSE)<-[:構件]-(:BRICK)" +
                ",(:HOUSE)<-[:構件]-(:GLASS)";
        GraphNodeEntityTree tree = SimpleGraphSchemaSyntaxParser
                .parseGraphSchemaAsTree(graphSchema);
        String searchFromLabel = "PEOPLE";
        NodeDiscoveryDirection direction = NodeDiscoveryDirection.OUT;
        int maxDepth = 10;
        System.out.println("->" + searchFromLabel + ", direction:" + direction + ", depth:" + maxDepth);
        GraphNodeEntityTree searchRootFrom = tree.getNodeEntityTreeByVertexLabel(searchFromLabel);
        assertNotNull(searchRootFrom);
        int allNodes = traversalNodesWithDirection(searchRootFrom, direction, maxDepth, maxDepth);
        System.out.println("allNodes: " + allNodes);
        // Paths are counted, not distinct vertices: RICE and HOUSE are reached several times
        assertEquals(11, allNodes);
    }

    /**
     * Traverses all nodes in one direction, printing each edge.
     *
     * @param root              search starting point
     * @param direction         direction, IN or OUT
     * @param maxDepth          maximum search depth
     * @param remainSearchDepth remaining search depth
     * @return number of nodes visited (a node reached by several paths counts each time)
     */
    private static int traversalNodesWithDirection(GraphNodeEntityTree root,
                                                   NodeDiscoveryDirection direction,
                                                   int maxDepth,
                                                   int remainSearchDepth) {
        if (remainSearchDepth <= 0) {
            return 0;
        }
        List<GraphNodeEntityTree> subBranches;
        if (direction == NodeDiscoveryDirection.OUT) {
            subBranches = root.getOut();
        } else {
            subBranches = root.getIn();
        }
        if (subBranches == null || subBranches.isEmpty()) {
            return 0;
        }
        // Indent by one unit per level of depth
        String whitespaceUnit = " ";
        StringBuilder preWhitespaceBuilder = new StringBuilder(whitespaceUnit);
        for (int i = 1; i < maxDepth - remainSearchDepth + 1; i++) {
            preWhitespaceBuilder.append(whitespaceUnit);
        }
        int allNodes = 0;
        String preWhitespace = preWhitespaceBuilder.toString();
        for (int i = 0; i < subBranches.size(); i++) {
            GraphNodeEntityTree br1 = subBranches.get(i);
            // The relation name is stored at the same index as its neighbor
            String relationName = root.getRelationName(i, direction);
            allNodes++;
            System.out.println(preWhitespace + "->" + relationName + "->" + br1.getVertex().getRawWord());
            allNodes += traversalNodesWithDirection(br1, direction, maxDepth, remainSearchDepth - 1);
        }
        return allNodes;
    }
}
Sample output:
->PEOPLE, direction:OUT, depth:10
 ->養寵物->:CAT
  ->吃->:RICE
  ->朋友->:DOG
   ->吃->:RICE
 ->吃->:RICE
 ->養寵物->:DOG
  ->吃->:RICE
 ->擁有->:HOUSE
 ->幹活->:JOB
  ->產出->:BRICK
   ->-構件->:HOUSE
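The sample shows RICE four times and HOUSE twice, because the traversal counts paths and only stops at maxDepth; with a cycle, for example (:CAT)-[:朋友]->(:DOG) plus a reverse (:DOG)-[:朋友]->(:CAT), it would keep printing until the depth runs out. For lineage queries you often want the distinct set of downstream vertices instead. Here is a sketch of a depth-limited breadth-first search with a visited set, assuming string triples as input; DownstreamLineage is hypothetical, and the test uses English stand-ins for the Chinese relation labels.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: distinct downstream vertices within maxDepth hops.
final class DownstreamLineage {

    static Set<String> downstream(List<String[]> triples, String start, int maxDepth) {
        Map<String, List<String>> out = new HashMap<>();
        for (String[] t : triples) { // t = {source, relation, target}
            out.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t[2]);
        }
        Set<String> visited = new LinkedHashSet<>(); // keeps discovery order
        List<String> frontier = Collections.singletonList(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String v : frontier) {
                for (String n : out.getOrDefault(v, Collections.emptyList())) {
                    // The start vertex itself is never reported, even on a cycle
                    if (!n.equals(start) && visited.add(n)) {
                        next.add(n);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }
}
```

For the schema in this article, with (:HOUSE)<-[:構件]-(:BRICK) normalized to BRICK -> HOUSE, it reports CAT, RICE, DOG, HOUSE, JOB and BRICK for PEOPLE: six distinct vertices versus eleven printed paths.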