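OrientDB Study Notes (2): MATCH

Posted by 程式設計師的貓 on 2021-12-24

Original URL: https://learnku.com/articles/63901

Introduction

MATCH is a declarative pattern-matching language introduced in OrientDB 2.2, used mainly for querying graphs. It is the most flexible and efficient SQL construct OrientDB offers for graph queries. It resembles Neo4j's Cypher language, but for now MATCH supports queries only.

MATCH syntax

According to the official documentation, the MATCH syntax is as follows: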

MATCH 
  {
    [class: <class>], 
    [as: <alias>], 
    [where: (<whereCondition>)]
  }
  .<functionName>(){
    [class: <className>], 
    [as: <alias>], 
    [where: (<whereCondition>)], 
    [while: (<whileCondition>)],
    [maxDepth: <number>],    
    [depthAlias: <identifier> ], 
    [pathAlias: <identifier> ],     
    [optional: (true | false)]
  }*
  [,
    [NOT]
    {
      [as: <alias>], 
      [class: <class>], 
      [where: (<whereCondition>)]
    }
    .<functionName>(){
      [class: <className>], 
      [as: <alias>], 
      [where: (<whereCondition>)], 
      [while: (<whileCondition>)],
      [maxDepth: <number>],    
      [depthAlias: <identifier> ], 
      [pathAlias: <identifier> ],     
      [optional: (true | false)]
    }*
  ]*
RETURN [DISTINCT] <expression> [ AS <alias> ] [, <expression> [ AS <alias> ]]*
GROUP BY <expression> [, <expression>]*
ORDER BY <expression> [, <expression>]*
SKIP <number>
LIMIT <number>

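Below is a brief description of the main syntax elements.

MATCH: the statement must start with the MATCH keyword, which is case-insensitive.
{}: defines a node and its filter conditions. The node can be a vertex or an edge.
[]: marks an optional item. Every definition inside {} is optional, so a node can be written simply as {}.
<>: stands for a concrete value.
class: <class>: a valid class, which can be a vertex class or an edge class.
as: <alias>: an alias for the node. The node can be referenced by this alias anywhere in the pattern, much like a table alias in SQL.
where: (<whereCondition>): the filter condition for matching the current node. It supports most of the SQL WHERE syntax and can also use the two context variables $currentMatch and $matched; later examples show how to use them.

.<functionName>(): a graph function that connects two nodes. The supported functions are out(), in(), both(), outE(), inE(), bothE(), outV(), inV() and bothV(). For out(), in() and both() there is also a more visual arrow notation. The table below describes all nine functions; note that the node on the right-hand side is not mandatory.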

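Function	Example	Arrow notation	Left	Right	Direction
out()	{...}.out(){...}	{...}-->{...}	vertex	vertex	left to right
	{...}.out("EdgeClass"){...}	{...}-EdgeClass->{...}	vertex	vertex	left to right
in()	{...}.in(){...}	{...}<--{...}	vertex	vertex	right to left
	{...}.in("EdgeClass"){...}	{...}<-EdgeClass-{...}	vertex	vertex	right to left
both()	{...}.both(){...}	{...}--{...}	vertex	vertex	either
	{...}.both("EdgeClass"){...}	{...}-EdgeClass-{...}	vertex	vertex	either
outE()	{...}.outE(){...}	none	vertex	edge	left to right
	{...}.outE("EdgeClass"){...}	none	vertex	edge	left to right
inE()	{...}.inE(){...}	none	vertex	edge	right to left
	{...}.inE("EdgeClass"){...}	none	vertex	edge	right to left
bothE()	{...}.bothE(){...}	none	vertex	edge	either
	{...}.bothE("EdgeClass"){...}	none	vertex	edge	either
outV()	{...}.outV(){...}	none	edge	vertex	left to right
	{...}.outV("EdgeClass"){...}	none	edge	vertex	left to right
inV()	{...}.inV(){...}	none	edge	vertex	right to left
	{...}.inV("EdgeClass"){...}	none	edge	vertex	right to left
bothV()	{...}.bothV(){...}	none	edge	vertex	either
	{...}.bothV("EdgeClass"){...}	none	edge	vertex	either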

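while: (<whileCondition>): the condition that must hold along the path of a depth traversal. It supports most of the SQL WHERE syntax and can also use the context variables $currentMatch, $matched and $depth; later examples show how to use them.
maxDepth: <number>: the maximum depth of a depth traversal; later examples show how to use it.
depthAlias: <identifier>: new in OrientDB 3.x. It must be used together with while or maxDepth. It stores the traversal depth, so the depth of each traversed record can be read in RETURN.
pathAlias: <identifier>: new in OrientDB 3.x. It must be used together with while or maxDepth. It stores the traversal path, so the vertices on each traversed path can be read in RETURN.
optional: (true | false): new in OrientDB 2.2.4. The default is false, meaning the declared node must exist; otherwise nothing on that path is matched. When set to true, the path still matches even if this node is not found, but the option may only appear on the right-most node of a path. It is comparable to a LEFT JOIN in SQL.

RETURN <expression> [ AS <alias> ]: defines the structure of the returned data. A returned value can be one of three things: an alias defined in {...}, alias.field, or a context variable. The context variables available in RETURN are:

Variable	Description	Notes
$matches	All nodes that have an alias defined in {...}.
$paths	All nodes on the traversed paths, including nodes that have no alias.
$elements	The expanded data of the nodes returned by $matches.	Can be displayed as a graph in the Graph console
$pathElements	The expanded data of the nodes returned by $paths.	Can be displayed as a graph in the Graph console

DISTINCT: 3.x supports de-duplicating the RETURN result. Versions before 3.x do not; there you have to wrap the MATCH in an outer SELECT and apply DISTINCT in that outer query.
GROUP BY: grouping. Introduced in 3.x.
ORDER BY: sorting. Introduced in 3.x.
SKIP: combined with LIMIT it provides paging. Introduced in 3.x.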

Using MATCH

3.1. Using MATCH in the Browse console

MATCH
{as:c,class:Customers,where:(Phone='+1400844724')}
RETURN c.Phone,c.OrderedId

3.2. Using MATCH in the Graph console

To display data as a graph in the Graph console, return the $pathElements or $elements variable.

MATCH
{as:c,class:Customers,where:(Phone='+1400844724')}
RETURN $pathElements

3.3. Using the API

The Maven dependencies are as follows:

<dependency>
    <groupId>com.orientechnologies</groupId>
    <artifactId>orientdb-graphdb</artifactId>
    <version>3.0.4</version>
</dependency>
<dependency>
    <groupId>com.orientechnologies</groupId>
    <artifactId>orientdb-core</artifactId>
    <version>3.0.4</version>
</dependency>
<dependency>
    <groupId>com.orientechnologies</groupId>
    <artifactId>orientdb-client</artifactId>
    <version>3.0.4</version>
</dependency>

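The test code is as follows:

import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.Element;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;

public class MatchTest {
    public static void main(String[] args) {
        // User name and password: change them to match your configuration
        OrientGraphFactory factory = new OrientGraphFactory("remote:localhost/demodb", "root", "root");
        OrientGraphNoTx graphNoTx = factory.getNoTx();
        try {
            // Execute the MATCH statement
            Iterable<Element> iterable = graphNoTx.command(
                new OCommandSQL(
                    "MATCH {as:c,class:Customers,where:(Phone='+1400844724')} "
                        + "RETURN c.Phone as phone,c.OrderedId as orderedId"
                )).execute();
            // Iterate over the result set returned by MATCH
            for (Element ele : iterable) {
                System.out.println("Phone=>" + ele.getProperty("phone") + ",OrderedId=>" + ele.getProperty("orderedId"));
            }
        } finally {
            // Release the graph and the factory even if the query fails
            graphNoTx.shutdown();
            factory.close();
        }
    }
}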

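Rules for writing MATCH statements

4.1. Determine the starting point of the query

A graph query should start from one or more specific nodes. Otherwise it triggers a full scan of some class, or even a traversal of the whole graph. That starting node is the starting point of the query. A full scan here may perform worse than the equivalent scan in an RDBMS.

The starting point is chosen according to the query requirement. In general, the node that the known query conditions can locate fastest is the starting point. For example, to find the friends of the user with phone number "+1400844724", first locate the user record by phone number, then traverse from that vertex to the user's friends. Do not go the other way and look for the user starting from the friends.

Once the starting point is known, we can write the MATCH statement. To make the query engine execute it the way we intend, a few rules need to be kept in mind.

4.2. A MATCH must contain at least one node with an explicitly declared class

The following SQL statement does not declare a class explicitly, and executing it fails with "java.lang.UnsupportedOperationException".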

MATCH
{as:customer,where:(Phone='+1400844724')}-HasProfile->{as:profile}-HasFriend-{as:friend}
RETURN distinct customer,profile,friend

4.3. If only one node declares a class, that node is the starting point, whether or not it has a filter condition

MATCH
{as:customer,where:(Phone='+1400844724')}-HasProfile->{as:profile}-HasFriend-{as:friend,class:Profiles}
RETURN distinct customer,profile,friend

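The explain result is as follows:

Analysis: the explain result shows that the starting point is Profiles. Profiles has no filter condition, but it is the only node that specifies a class.

4.4. The starting point must declare its class explicitly

Only a node that declares a class can become the starting point.

4.5. Index the filter condition of the starting point where possible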

MATCH
{as:customer,class:Customers,where:(Phone='+1400844724')}-HasProfile->{as:profile,class:Profiles,where:(Id=9)}-HasFriend-{as:friend,class:Profiles,where:(Id=1)}
RETURN distinct customer,profile,friend

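The explain result is as follows:

Analysis: the explain result shows that the starting point is Profiles and that the index is hit for Id=1 and Id=9. Customers declares a class and has a filter condition, yet it is not chosen as the starting point, because there is no index on the Phone property of Customers.

Now run explain on the following SQL and compare the result.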

MATCH
{as:customer,class:Customers,where:(Phone='+1400844724')}-HasProfile->{as:profile,where:(Id=9)}-HasFriend-{as:friend,where:(Id=1)}
RETURN distinct customer,profile,friend

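4.6. Avoid declaring a class on nodes that are not the starting point, so the execution engine does not pick them as the starting node

4.7. Declare the edge name and direction where possible

When the edge class and its direction are known, declare both explicitly. This reduces the number of paths the traversal has to explore.

5.1. De-duplicating MATCH results

Create two vertices and three edges between them with the following statements: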

insert into V set name = 'v1'
insert into V set name = 'v2'
create edge E from (select from V where name= 'v1') to (select from V where name= 'v2')
create edge E from (select from V where name= 'v1') to (select from V where name= 'v2')
create edge E from (select from V where name= 'v1') to (select from V where name= 'v2')

Execute the following SQL statement:

MATCH
{as:v1,class:V,where:(name = 'v1')}--{as:v2}
RETURN v1,v2

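The result on OrientDB 3.0.x is as follows.

Analysis: three records are returned, so the result is not de-duplicated. OrientDB 3.x supports the distinct keyword, though, so the following statement removes the duplicates.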

MATCH
{as:v1,class:V,where:(name = 'v1')}--{as:v2}
RETURN distinct v1,v2

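The OrientDB 3.x design is more reasonable, because it lets the user decide whether the result should be de-duplicated. OrientDB 2.x returns only one record for the same query, which means it de-duplicates automatically. Keep this difference in mind and verify it on your own version.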
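To make the row counts concrete, here is a minimal sketch in plain Java that models the result rows without touching OrientDB. The class and method names (DistinctRowsSketch, matchRows, distinct) are hypothetical and exist only for this illustration. It assumes, as observed above on 3.0.x, that each edge between v1 and v2 contributes one result row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class DistinctRowsSketch {
    // One result row per traversed edge: parallel edges yield identical rows.
    static List<List<String>> matchRows(String from, String to, int edgeCount) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < edgeCount; i++) {
            rows.add(List.of(from, to));
        }
        return rows;
    }

    // Models RETURN distinct: identical rows collapse into a single row.
    static List<List<String>> distinct(List<List<String>> rows) {
        return rows.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> rows = matchRows("v1", "v2", 3);
        System.out.println(rows.size());           // 3
        System.out.println(distinct(rows).size()); // 1
    }
}
```

The three rows correspond to RETURN v1,v2 and the single row to RETURN distinct v1,v2.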

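5.2. Number of results returned by MATCH

The number of results MATCH returns equals the number of matched paths. It can also be understood as the sum, over all starting nodes, of the Cartesian product of the nodes reachable at each step.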

MATCH
{as:customer,class:Customers,where:(Phone in ['+1400844724','+1548604972'])}-HasProfile->{as:profile}-HasFriend->{as:friend1}
RETURN customer,profile,friend1

The result is as follows:

Analysis: the starting node is Customers, and the filter condition can match at most two Customers records.

Number of paths for '+1400844724': 1 (customer) * 1 (HasProfile edge) * 1 (profile) * 1 (HasFriend edge) * 2 (friend1 vertices) = 2

Number of paths for '+1548604972': 1 (customer) * 1 (HasProfile edge) * 1 (profile) * 1 (HasFriend edge) * 2 (friend1 vertices) = 2

So the total number of paths is 2 + 2 = 4.
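The arithmetic above can be sketched in a few lines of Java. This is only an illustration of the counting rule, not of how OrientDB computes anything internally. The class PathCountSketch and its method pathsFrom are hypothetical names, and the fan-out values are the ones quoted in the analysis.

```java
final class PathCountSketch {
    // Number of paths from one starting vertex: the product of the fan-out at each hop.
    static long pathsFrom(int... fanOuts) {
        long paths = 1;
        for (int fanOut : fanOuts) {
            paths *= fanOut;
        }
        return paths;
    }

    public static void main(String[] args) {
        // Fan-outs from the analysis: each customer reaches 1 profile,
        // and each profile reaches 2 friend1 vertices.
        long first = pathsFrom(1, 2);   // customer with phone +1400844724
        long second = pathsFrom(1, 2);  // customer with phone +1548604972
        System.out.println(first + second); // 4
    }
}
```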

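5.3. Depth traversal queries

There are two ways to run a depth traversal with MATCH: the first uses maxDepth, the second uses while together with the $depth variable. The examples fetch the friend relationships of the Profiles record with Id 9 down to depth 2.

5.3.1. Depth traversal with maxDepth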

MATCH
{as:profile,class:Profiles,where:(Id = 9)}-HasFriend->{as:friend,maxDepth:2,depthAlias:da}
RETURN friend.Id,da

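The result is as follows:

Analysis: the depthAlias feature of OrientDB 3.x exposes the depth of each friend, which makes the result easier to follow. With maxDepth set to 2, the returned data covers depth 0 (the starting point of the query), depth 1 and depth 2.

Depth 0 is the starting node of the query. How can the depth 0 rows be excluded? There are two ways:

1) Combine MATCH with SELECT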

select
*
from (
MATCH
{as:profile,class:Profiles,where:(Id = 9)}-HasFriend->{as:friend,maxDepth:2,depthAlias:da}
RETURN friend.Id,da
)
where da > 0

2) Filter with $depth

MATCH
{as:profile,class:Profiles,where:(Id = 9)}-HasFriend->{as:friend,maxDepth:2,where:($depth > 0),depthAlias:da}
RETURN friend.Id,da

For brevity the result is omitted; please verify it yourself.

5.3.2. Depth traversal with while and $depth

MATCH
{as:profile,class:Profiles,where:(Id = 9)}-HasFriend->{as:friend,while:($depth < 2),depthAlias:da}
RETURN friend.Id,da

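Analysis: with $depth < 2, the returned data covers depth 0 (the starting point of the query), depth 1 and depth 2. Note that depth 2 is included: the while condition decides whether to keep expanding from a node, so nodes at depth 1 are still expanded and their neighbours at depth 2 are returned.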
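The behaviour described for maxDepth:2 and while:($depth < 2) can be modelled with a small breadth-first traversal in plain Java. This is a simplified sketch of the semantics as described in the analysis above, not OrientDB's actual implementation. The class name, the traverse method and the sample friendship chain are all hypothetical, and each vertex is recorded only at the depth where it is first reached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

final class DepthTraversalSketch {
    // Returns every reached vertex with the depth at which it was first reached.
    // A vertex is always emitted; its neighbours are expanded only while the
    // condition holds for the vertex's own depth.
    static Map<Integer, Integer> traverse(Map<Integer, List<Integer>> out, int start,
                                          IntPredicate whileCondition) {
        Map<Integer, Integer> depthOf = new LinkedHashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int vertex = queue.remove();
            int depth = depthOf.get(vertex);
            if (!whileCondition.test(depth)) {
                continue; // emitted, but not expanded any further
            }
            for (int next : out.getOrDefault(vertex, List.of())) {
                if (!depthOf.containsKey(next)) {
                    depthOf.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        return depthOf;
    }

    public static void main(String[] args) {
        // Hypothetical HasFriend chain: 9 -> 1 -> 2 -> 3
        Map<Integer, List<Integer>> out = Map.of(9, List.of(1), 1, List.of(2), 2, List.of(3));
        // Models while:($depth < 2); maxDepth:2 behaves the same way in this model
        System.out.println(traverse(out, 9, depth -> depth < 2)); // {9=0, 1=1, 2=2}
    }
}
```

Vertex 3 sits at depth 3 and is never reached, while the starting vertex 9 appears at depth 0, which is why the text goes on to filter out depth 0.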

Depth 0 is the starting node of the query. How can the depth 0 rows be excluded?

There are two ways:

1) Combine MATCH with SELECT

select
*
from (
MATCH
{as:profile,class:Profiles,where:(Id = 9)}-HasFriend->{as:friend,while:($depth < 2),depthAlias:da}
RETURN friend.Id,da
)
where da > 0

2) Filter with $depth

MATCH
{as:profile,class:Profiles,where:(Id = 9)}-HasFriend->{as:friend,while:($depth < 2),where:($depth > 0),depthAlias:da}
RETURN friend.Id,da

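For brevity the result is omitted; please verify it yourself.

5.4. Using the RETURN context variables

Execute the following SQL statements and compare the results with the descriptions in the MATCH syntax section to see how these variables differ. For brevity the results are omitted; please verify them yourself.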

MATCH
{as:customer,class:Customers,where:(Phone = '+1400844724')}.outE(){}
RETURN $matches

MATCH
{as:customer,class:Customers,where:(Phone = '+1400844724')}.outE(){}
RETURN $paths

MATCH
{as:customer,class:Customers,where:(Phone = '+1400844724')}.outE(){}
RETURN $elements

MATCH
{as:customer,class:Customers,where:(Phone = '+1400844724')}.outE(){}
RETURN $pathElements

5.5. Use count(*) rather than count(1)

In relational databases we recommend count(1) for counting rows, but in OrientDB we recommend count(*) instead. To see why, run explain on the following SQL statements.

select count(*) from Profiles

The explain result is as follows:

select count(1) from Profiles

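The explain result is as follows:

Analysis: according to the explain results, count(*) reads the record count of the class directly, whereas count(1) fetches the records from every cluster of the class and then counts them. count(*) should therefore be considerably faster than count(1). Profiles is small in this example, so the difference is hard to see, but it becomes noticeable on large classes. You can verify this on a vertex class with more data.

5.6. Grouped queries

Count the first-degree and second-degree friends of the profile with Id 9.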

MATCH
{as:profile,class:Profiles,where:(Id = 9)}-HasFriend->{as:friend,while:($depth < 2),where:($depth > 0),depthAlias:da}
RETURN da,count(*)
GROUP BY da
ORDER BY da desc

This uses the GROUP BY and ORDER BY features provided by OrientDB 3.x.

5.7. Paged queries

Page through Profiles: fetch page 9 with 10 records per page.

MATCH
{as:profile,class:Profiles}
RETURN profile
SKIP 80
LIMIT 10

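The explain result is as follows:

Analysis: paging needs SKIP and LIMIT together, and the approach is the same as LIMIT-based paging in MySQL. The engine still has to produce the skipped records before it can discard them and return the requested page, so the query gets slower as the data grows and the page number increases. Use it with care, and consider paging with an indexed filter condition instead.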
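The SKIP and LIMIT values for a given page can be derived as in the following Java sketch. It assumes 1-based page numbers and that LIMIT is the page size. The class MatchPagingSketch and its methods are hypothetical helpers for illustration and are not part of any OrientDB API.

```java
final class MatchPagingSketch {
    // Number of records to skip for a 1-based page number.
    static int skipFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    // Appends SKIP/LIMIT to a MATCH statement that ends with its RETURN clause.
    static String paged(String match, int page, int pageSize) {
        return match + " SKIP " + skipFor(page, pageSize) + " LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        String match = "MATCH {as:profile,class:Profiles} RETURN profile";
        System.out.println(paged(match, 9, 10));
        // MATCH {as:profile,class:Profiles} RETURN profile SKIP 80 LIMIT 10
    }
}
```

Because the skipped records still have to be produced, a filter on an indexed property (for example a "greater than the last Id seen" condition) usually scales better for deep pages than a growing SKIP.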

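5.8. Splitting the statement

Some queries need one vertex to be connected through three or more edges. How is that written? With the MATCH form seen so far, a node can only be linked to one node on its left and one on its right. In this situation the pattern has to be split into several comma-separated parts that share an alias.

Find a customer who has eaten at some venue, stayed at some hotel and visited some tourist attraction. The SQL is: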

MATCH
{as:customer,class:Customers,where:(Phone = '+1400844724')}.out('HasVisited'),
{as:customer}.out('HasStayed'),
{as:customer}.out('HasEaten')
RETURN distinct customer

5.9. Implementing LEFT JOIN

Query all Customers that have at least one friend. The SQL is:

MATCH
{as:customer,class:Customers}-HasProfile->{}-HasFriend-{as:friend}
RETURN customer,friend

What if we want all Customers together with their friends, including the Customers that have no friend at all? That can be written as follows:

MATCH
{as:customer,class:Customers}-HasProfile->{}-HasFriend-{as:friend,optional:true}
RETURN customer,friend
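The effect of optional:true on the result rows can be pictured with the following plain-Java sketch. It models only the join semantics described above: a required friend behaves like an inner join, and an optional friend behaves like a LEFT JOIN. The class OptionalMatchSketch, its rows method and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OptionalMatchSketch {
    // Builds "customer,friend" rows. Without optional, customers that have no
    // friend are dropped; with optional, they are kept with a null friend.
    static List<String> rows(Map<String, List<String>> friendsByCustomer, boolean optional) {
        List<String> rows = new ArrayList<>();
        // TreeMap is used only to make the output order deterministic
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(friendsByCustomer).entrySet()) {
            if (entry.getValue().isEmpty()) {
                if (optional) {
                    rows.add(entry.getKey() + ",null");
                }
                continue;
            }
            for (String friend : entry.getValue()) {
                rows.add(entry.getKey() + "," + friend);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical data: c1 has one friend, c2 has none
        Map<String, List<String>> data = Map.of("c1", List.of("f1"), "c2", List.of());
        System.out.println(rows(data, false)); // [c1,f1]
        System.out.println(rows(data, true));  // [c1,f1, c2,null]
    }
}
```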

5.10. Implementing INNER JOIN

Query all Customers whose profile Name equals the Name of one of their friends.

MATCH
{as:customer,class:Customers}-HasProfile->{as:profile}-HasFriend-{as:friend,where:($matched.profile.Name = Name)}
RETURN distinct customer,profile,friend

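Analysis: the $matched variable gives access to another node by its alias, and the node's properties can then be read through that alias. In this example the data shows a profile that is its own friend. The point is only to show the usage, so the business meaning of the data can be ignored.

5.11. Querying by a known RID

Given that the rid of a Customers record is #121:0, find its friends.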

MATCH
{as:customer,rid:#121:0}-HasProfile->{as:profile}-HasFriend-{as:friend}
RETURN distinct customer,profile,friend

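Analysis: according to the author, OrientDB does not officially expose this feature. In tests it works on both OrientDB 2.x and OrientDB 3.x, but use it with caution, because a future version might stop supporting it.

5.12. Querying by a condition on an edge

Find the users and friends who became friends on 2018-10-17.

The From property on the HasFriend edge conflicts with an SQL keyword and cannot be used in the query, so we create a new property first. Execute this SQL: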

update edge HasFriend set since = '2018-10-17' limit 1

Then execute the following SQL:

MATCH
{as:customer,class:Customers}-HasProfile->{as:profile}.bothE('HasFriend'){where:(since = '2018-10-17')}.bothV(){as:friend}
RETURN distinct customer,profile,friend

5.13. How to avoid cycles in a query

Find the friends of the friends of all Customers.

MATCH
{as:customer,class:Customers}-HasProfile->{as:profile}-HasFriend-{}-HasFriend-{as:friend2}
RETURN distinct customer,profile,friend2 limit 10

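The result shows rows in which a user's friend of a friend is the user itself, which forms a cycle in the query. Such rows should be excluded. How? Two variables are needed: $matched and $currentMatch. $matched was introduced in an earlier example: to access another {} node from inside a {} node, you must go through $matched. The current {} node cannot be referenced by its own alias either; it has to be accessed through $currentMatch.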

MATCH
{as:customer,class:Customers}-HasProfile->{as:profile}-HasFriend-{}-HasFriend-{as:friend2,where:($currentMatch != $matched.profile)}
RETURN distinct customer,profile,friend2 limit 10
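The effect of the where:($currentMatch != $matched.profile) condition can be illustrated with a small Java sketch over an in-memory friendship map. It only mirrors the filtering idea and is not how OrientDB evaluates the pattern. The class FriendOfFriendSketch, its method and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FriendOfFriendSketch {
    // Friends of friends of `profile` over an undirected friendship map,
    // optionally dropping paths that lead back to the starting profile.
    static List<String> friendsOfFriends(Map<String, List<String>> friends, String profile,
                                         boolean excludeSelf) {
        List<String> result = new ArrayList<>();
        for (String friend : friends.getOrDefault(profile, List.of())) {
            for (String friend2 : friends.getOrDefault(friend, List.of())) {
                // Models where:($currentMatch != $matched.profile)
                if (excludeSelf && friend2.equals(profile)) {
                    continue;
                }
                result.add(friend2);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: a - b - c, stored in both directions
        Map<String, List<String>> friends = Map.of(
            "a", List.of("b"),
            "b", List.of("a", "c"),
            "c", List.of("b"));
        System.out.println(friendsOfFriends(friends, "a", false)); // [a, c]
        System.out.println(friendsOfFriends(friends, "a", true));  // [c]
    }
}
```

Without the filter, the undirected HasFriend step walks straight back from b to a, which is the cycle seen in the query result.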

This work is licensed under a CC license. Reposts must credit the author and link to this article.
