The practice of the high-performance graph computing system Plato in Nebula Graph

Published by NebulaGraph on 2022-11-24
This article was first published on the Nebula Graph Community WeChat official account.

1. Introduction to graph computing

1.1 Graph Database vs Graph Computing

Graph databases are oriented to OLTP scenarios and emphasize create, read, update, and delete operations; a single query usually touches only a small portion of the graph. Graph computing, by contrast, is oriented to OLAP scenarios and is typically used to analyze and compute over the entire graph.

1.2 Graph Computing System Distribution Architecture

In terms of architecture, graph computing systems can be divided into single-machine and distributed systems.

The advantage of a single-machine graph computing system is its simple model: there is no need to consider distributed communication or graph partitioning.

A distributed graph computing platform splits the graph data across multiple machines in order to process larger-scale graphs, but it inevitably introduces the overhead of distributed communication.

1.3 Graph partitioning

There are two main ways to partition a graph: edge cut and vertex cut.

Edge cut: the data of each vertex is stored on only one machine, but some edges are cut and distributed across multiple machines.
As shown in Figure (a), the data of vertex A is stored only on machine 1 and the data of vertex B only on machine 2, while edge AB is stored on both machine 1 and machine 2. Because vertices A and B live on different machines, communication overhead is incurred during iterative computation.

Vertex cut: each edge is stored on only one machine, but some vertices may be split and replicated on multiple machines.
As shown in Figure (b), edge AB is stored on machine 1, edge BC on machine 2, and edge CD on machine 3; vertex B is assigned to machines 1 and 2, and vertex C to machines 2 and 3. Since a vertex may be stored on multiple machines, keeping the vertex data consistent also incurs communication overhead.
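
For intuition only, a hash-based placement rule for the two strategies can be sketched as follows (illustrative C++, not taken from any particular system):

#include <cstdint>

// Edge cut: a vertex is owned by exactly one machine; an edge whose endpoints
// hash to different machines has to be stored on both sides.
uint32_t owner_of_vertex(uint64_t vertex_id, uint32_t machine_num) {
  return static_cast<uint32_t>(vertex_id % machine_num);
}

// Vertex cut: an edge is owned by exactly one machine; a vertex that appears in
// edges owned by several machines is replicated on each of them.
uint32_t owner_of_edge(uint64_t src_id, uint64_t dst_id, uint32_t machine_num) {
  return static_cast<uint32_t>((src_id * 1000003ULL + dst_id) % machine_num);
}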

1.4 Computational Model

The programming model targets graph computing application developers and can be divided into vertex-centric, edge- or path-centric, and subgraph-centric programming models.

The computation model is what graph computing system developers have to face; it mainly includes the synchronous and the asynchronous execution model. The most common ones are the BSP model (Bulk Synchronous Parallel) and the GAS model.

BSP model: the computation of the BSP model consists of a series of iterations, each of which is called a superstep. Systems that adopt the BSP model include Pregel, Hama, Giraph, and so on.
The BSP model has both a horizontal and a vertical structure. Vertically, it consists of a series of serial supersteps. Horizontally (as shown), a superstep is divided into three phases; a rough sketch of the superstep loop follows the list:

  • Local computation phase: each processor performs computation only on data stored in its local memory.
  • Global communication phase: machine nodes exchange data with each other.
  • Barrier synchronization phase: wait for all communication activities to finish.
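
The structure of a superstep can be sketched roughly as follows (a toy single-machine sketch with hypothetical types, not the code of Pregel or any other real system):

#include <cstddef>
#include <functional>
#include <vector>

struct Vertex { double value; bool active; };
struct Message { std::size_t dst; double payload; };

// One BSP run: local computation, message exchange, barrier, repeat.
void run_bsp(std::vector<Vertex>& vertices,
             const std::function<std::vector<Message>(Vertex&, const std::vector<Message>&)>& compute,
             int max_supersteps) {
  std::vector<std::vector<Message>> inbox(vertices.size());
  for (int step = 0; step < max_supersteps; ++step) {
    std::vector<std::vector<Message>> outbox(vertices.size());
    // Phase 1: local computation on data (and messages) held in local memory.
    for (std::size_t v = 0; v < vertices.size(); ++v) {
      if (!vertices[v].active && inbox[v].empty()) continue;
      for (const Message& m : compute(vertices[v], inbox[v])) outbox[m.dst].push_back(m);
    }
    // Phase 2: global communication; on a cluster this is a network exchange
    // between machines, here it is just a swap of message buffers.
    inbox.swap(outbox);
    // Phase 3: barrier synchronization; all workers wait here (e.g. MPI_Barrier)
    // before the next superstep begins.
  }
}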

GAS model: the GAS model was proposed in the PowerGraph system and is divided into a Gather (information collection) phase, an Apply phase, and a Scatter (distribution) phase; a small example follows the list.

  • Gather phase: collect information from neighboring vertices.
  • Apply phase: process the collected information locally and update the vertex.
  • Scatter phase: send the new information to neighboring vertices.
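
For example, PageRank written against a GAS-style interface looks roughly like the sketch below (the method names are illustrative, not PowerGraph's actual API):

#include <cmath>

struct PageRankProgram {
  // Gather: collect a rank contribution from one in-neighbor.
  double gather(double neighbor_rank, int neighbor_out_degree) const {
    return neighbor_rank / neighbor_out_degree;
  }
  // Apply: fold the gathered sum into the vertex's new rank.
  double apply(double gathered_sum, double damping = 0.85) const {
    return (1.0 - damping) + damping * gathered_sum;
  }
  // Scatter: decide whether out-neighbors need to be activated in the next round.
  bool scatter(double old_rank, double new_rank, double eps = 1e-4) const {
    return std::fabs(new_rank - old_rank) > eps;
  }
};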

2. Introduction to Gemini graph computing system

Gemini is an influential system in the industry; its main technical points include CSR/CSC storage, push/pull computation, master and mirror vertices, sparse and dense graph modes, overlapping communication with computation, chunk-based partitioning, NUMA-aware sub-partitioning, and so on.

Gemini partitions graph data with edge cut in a chunk-based manner and supports the NUMA architecture. For the partitioned data, it stores outgoing edges in CSR format and incoming edges in CSC format. During iterative computation, sparse graphs push updates to outgoing neighbors, while dense graphs pull information from incoming neighbors.
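
As a reminder of what these layouts look like, a generic CSR structure (a textbook sketch, not Gemini's actual data structures) packs all outgoing edges into one array indexed by per-vertex offsets; CSC is the mirror image for incoming edges:

#include <cstdint>
#include <vector>

struct CSRGraph {
  // offsets has num_vertices + 1 entries; the out-neighbors of vertex v
  // occupy neighbors[offsets[v] .. offsets[v + 1]).
  std::vector<uint64_t> offsets;
  std::vector<uint32_t> neighbors;

  uint64_t out_degree(uint32_t v) const { return offsets[v + 1] - offsets[v]; }
};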

If an edge is cut, the vertex at one end of the edge is the master and the vertex at the other end is the mirror; a mirror acts as a placeholder. In a pull computation, the mirror vertices on each machine pull information from their local incoming neighbors, perform a partial computation, and then synchronize the partial results over the network to their master vertices under the BSP model. In a push computation, the master vertex on each machine first synchronizes its information to its mirror vertices, and each mirror then updates its outgoing neighbors.

In the communication phase of BSP, each machine Node_i sends to the next machine Node_i+1, and the last machine sends to the first. While sending, each machine also receives data from Node_i-1 and starts local computation as soon as the data arrives. This overlap of communication and computation hides the communication time and improves overall efficiency.
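
A minimal sketch of such a ring-style exchange with plain MPI (illustrative only, not Gemini's communication code):

#include <mpi.h>
#include <vector>

// Each rank sends its buffer to rank + 1 and receives from rank - 1 (ring order);
// local computation on the received chunk can start as soon as the call returns.
void ring_exchange(std::vector<double>& send_buf, std::vector<double>& recv_buf) {
  int rank = 0, size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  int send_to = (rank + 1) % size;
  int recv_from = (rank - 1 + size) % size;
  MPI_Sendrecv(send_buf.data(), static_cast<int>(send_buf.size()), MPI_DOUBLE, send_to, 0,
               recv_buf.data(), static_cast<int>(recv_buf.size()), MPI_DOUBLE, recv_from, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}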

For more details, please refer to the paper "Gemini: A Computation-Centric Distributed Graph Processing System" .

3. Integration of Plato graph computing system and Nebula Graph

3.1 Introduction to Plato Graph Computing System

Plato is Tencent's open-source, industrial-grade graph computing system based on the Gemini paper. Plato can run on common x86 clusters, such as Kubernetes clusters and Yarn clusters. At the file system level, Plato provides a variety of interfaces to support mainstream file systems, such as HDFS, Ceph, etc.

3.2 Integration with Nebula Graph

We carried out secondary development based on Plato to connect Nebula Graph as a data source.

3.2.1 Nebula Graph as input and output data source

We added a new data source to Plato so that Nebula Graph can serve as both the input and the output data source: graph data is read directly from Nebula Graph for computation, and the results are written directly back to Nebula Graph.

The storage layer of Nebula Graph provides a partition-based scan interface, through which vertex and edge data can easily be scanned in batches:

ScanEdgeIter scanEdgeWithPart(std::string spaceName,
                                  int32_t partID,
                                  std::string edgeName,
                                  std::vector<std::string> propNames,
                                  int64_t limit = DEFAULT_LIMIT,
                                  int64_t startTime = DEFAULT_START_TIME,
                                  int64_t endTime = DEFAULT_END_TIME,
                                  std::string filter = "",
                                  bool onlyLatestVersion = false,
                                  bool enableReadFromFollower = true);

ScanVertexIter scanVertexWithPart(std::string spaceName,
                                      int32_t partId,
                                      std::string tagName,
                                      std::vector<std::string> propNames,
                                      int64_t limit = DEFAULT_LIMIT,
                                      int64_t startTime = DEFAULT_START_TIME,
                                      int64_t endTime = DEFAULT_END_TIME,
                                      std::string filter = "",
                                      bool onlyLatestVersion = false,
                                      bool enableReadFromFollower = true);

In practice, we first obtain the partition distribution of the specified space and assign the scan task of each partition to a node of the Plato cluster; each node then further assigns its partition scan tasks to the threads running on that node, so that the data can be read in parallel and quickly. After the graph computation finishes, the results are written back to Nebula Graph in parallel through the Nebula client.
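
A simplified sketch of the partition assignment logic (a hypothetical helper for illustration, not the actual Plato code):

#include <cstdint>
#include <vector>

// Assign the partitions of a Nebula Graph space to the nodes of the Plato cluster
// in round-robin fashion; each returned partition is then scanned by one of the
// node's worker threads via scanVertexWithPart / scanEdgeWithPart.
std::vector<int32_t> partitions_for_node(int32_t total_parts, int node_id, int node_num) {
  std::vector<int32_t> mine;
  for (int32_t part = 1; part <= total_parts; ++part) {  // Nebula partition IDs start from 1
    if ((part - 1) % node_num == node_id) mine.push_back(part);
  }
  return mine;
}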

3.2.2 Distributed ID Encoder

Gemini and Plato require vertex IDs to be consecutive integers starting from 0, but the vertex IDs of most real-world data do not satisfy this requirement, especially since Nebula Graph supports string-type IDs from version 2.0 onward.

Therefore, before the computation we need to convert the original IDs from int or string type into ints that increase consecutively from 0. Plato internally implements a single-machine ID encoder, that is, every machine in the Plato cluster redundantly stores the mapping of all IDs. When the number of vertices is large, each machine would need hundreds of GB of memory just to store the ID mapping table, so we had to implement a distributed ID encoder that splits the ID mapping into multiple shards stored on different machines.

We scatter the original IDs across different machines by hashing and assign, in parallel, global IDs that start from 0 and increase consecutively. After the ID mapping is generated, each machine holds one part of the ID mapping table. Then the edge data is hashed by source and destination vertex and sent to the corresponding machines for encoding; the resulting data is the data used for computation. After the computation completes, the results need to be mapped back to the business IDs; the process is similar to the above.
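
The core of one shard of such a distributed encoder can be sketched as follows (hypothetical code for illustration; the real implementation may coordinate the shards differently, e.g. by exchanging per-shard counts to compute contiguous ranges):

#include <cstdint>
#include <string>
#include <unordered_map>

// One shard of a distributed ID encoder. Original IDs are routed to shards by hash;
// shard k hands out the IDs k, k + shard_num, k + 2 * shard_num, ... so that IDs are
// globally unique without coordination and roughly dense when the shards are balanced.
class IdEncoderShard {
 public:
  IdEncoderShard(uint64_t shard_id, uint64_t shard_num)
      : shard_id_(shard_id), shard_num_(shard_num) {}

  // Returns the internal dense ID for an original (string) vertex ID owned by this shard.
  uint64_t encode(const std::string& original_id) {
    auto it = mapping_.find(original_id);
    if (it != mapping_.end()) return it->second;
    uint64_t internal_id = shard_id_ + next_local_ * shard_num_;
    ++next_local_;
    mapping_.emplace(original_id, internal_id);
    return internal_id;
  }

 private:
  uint64_t shard_id_;
  uint64_t shard_num_;
  uint64_t next_local_ = 0;
  std::unordered_map<std::string, uint64_t> mapping_;
};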

3.2.3 Supplementary algorithms

Based on Plato, we added algorithms such as SSSP, APSP, Jaccard similarity, and triangle counting, and added Nebula Graph input/output support to every algorithm. The currently supported algorithms are:

File name | Algorithm | Category
apsp.cc | All-pairs shortest path | Path
sssp.cc | Single-source shortest path | Path
tree_stat.cc | Tree depth/width | Graph feature
nstepdegrees.cc | n-hop degrees | Graph feature
hyperanf.cc | Graph average distance estimation | Graph feature
triangle_count.cc | Triangle counting | Graph feature
kcore.cc | k-core | Node centrality
pagerank.cc | PageRank | Node centrality
bnc.cc | Betweenness centrality | Node centrality
cnc.cc | Closeness centrality | Node centrality
cgm.cc | Connected components | Community detection
lpa.cc | Label propagation (LPA) | Community detection
hanp.cc | HANP | Community detection
metapath_randomwalk.cc | Metapath random walk | Graph representation learning
node2vec_randomwalk.cc | node2vec random walk | Graph representation learning
fast_unfolding.cc | Louvain | Clustering
infomap_simple.cc | Infomap | Clustering
jaccard_similarity.cc | Jaccard similarity | Similarity
mutual.cc | | Other
torch.cc | | Other
bfs.cc | Breadth-first search | Other

4. Plato deployment installation and operation

4.1 Cluster Deployment

Plato uses MPI for inter-process communication. When deploying Plato on a cluster, you need to install Plato in the same directory on every machine, or use a shared file system such as NFS. See: https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/

4.2 Scripts and configuration files to run the algorithm

scripts/run_pagerank_local.sh

#!/bin/bash

PROJECT="$(cd "$(dirname "$0")" && pwd)/.."

MAIN="./bazel-bin/example/pagerank" # process name

WNUM=3
WCORES=8

#INPUT=${INPUT:="$PROJECT/data/graph/v100_e2150_ua_c3.csv"}
INPUT=${INPUT:="nebula:${PROJECT}/scripts/nebula.conf"}
#OUTPUT=${OUTPUT:='hdfs://192.168.8.149:9000/_test/output'}
OUTPUT=${OUTPUT:="nebula:$PROJECT/scripts/nebula.conf"}
IS_DIRECTED=${IS_DIRECTED:=true}  # let plato auto add reversed edge or not
NEED_ENCODE=${NEED_ENCODE:=true}
VTYPE=${VTYPE:=uint32}

ALPHA=-1
PART_BY_IN=false

EPS=${EPS:=0.0001}
DAMPING=${DAMPING:=0.8}
ITERATIONS=${ITERATIONS:=5}

export MPIRUN_CMD=${MPIRUN_CMD:="${PROJECT}/3rd/mpich-3.2.1/bin/mpiexec.hydra"}

PARAMS+=" --threads ${WCORES}"
PARAMS+=" --input ${INPUT} --output ${OUTPUT} --is_directed=${IS_DIRECTED} --need_encode=${NEED_ENCODE} --vtype=${VTYPE}"
PARAMS+=" --iterations ${ITERATIONS} --eps ${EPS} --damping ${DAMPING}"

# env for JAVA && HADOOP
export LD_LIBRARY_PATH=${JAVA_HOME}/jre/lib/amd64/server:${LD_LIBRARY_PATH}

# env for hadoop
export CLASSPATH=${HADOOP_HOME}/etc/hadoop:`find ${HADOOP_HOME}/share/hadoop/ | awk '{path=path":"$0}END{print path}'`
export LD_LIBRARY_PATH="${HADOOP_HOME}/lib/native":${LD_LIBRARY_PATH}

chmod 777 ./${MAIN}
${MPIRUN_CMD} -n ${WNUM} -f ${PROJECT}/scripts/cluster ./${MAIN} ${PARAMS}
exit $?

Parameter Description

  • The INPUT and OUTPUT parameters specify the input and output data sources of the algorithm respectively. Currently, local CSV files, HDFS files, and Nebula Graph are supported. When the input or output data source is Nebula Graph, INPUT and OUTPUT take the form nebula:/path/to/nebula.conf.
  • WNUM is the total number of processes running on all machines in the cluster; it is recommended to run one process per machine, or one per NUMA node. WCORES is the number of threads per process; it is recommended to set it to the number of hardware threads of the machine.

scripts/nebula.conf

## read/write
--retry=3 # number of retries when connecting to Nebula Graph
--space=sf30 # name of the space to read from or write to

## read from nebula
--meta_server_addrs=192.168.8.94:9559 # address of the Nebula Graph metad service
--edge=LIKES # name of the edge type to read
#--edge_data_field # name of the edge property used as the edge weight
--read_batch_size=10000 # batch size of each scan

## write to nebula
--graph_server_addrs=192.168.8.94:9669 # address of the Nebula Graph graphd service
--user=root # login user name of the graphd service
--password=nebula # login password of the graphd service
# insert or update
--mode=insert # mode used when writing back to Nebula Graph: insert/update
--tag=pagerank # name of the tag written back to Nebula Graph
--prop=pr # name of the tag property written back to Nebula Graph
--type=double # type of the tag property written back to Nebula Graph
--write_batch_size=1000 # batch size when writing back
--err_file=/home/plato/err.txt # file that stores the data that failed to be written back

scripts/cluster

The cluster file specifies the IPs of the machines in the cluster on which the algorithm runs:

192.168.15.3
192.168.15.5
192.168.15.6

The above describes the application of Plato in Nebula Graph. At present, this feature is integrated into Nebula Graph Enterprise Edition. If you are using the open-source version of Nebula Graph, you will need to integrate Plato yourself according to your own needs.


Want to exchange graph database technology? To join the Nebula exchange group, please fill in your Nebula business card, and the Nebula assistant will invite you into the group.
