資料模式(Schema)定義

xbynet發表於2016-10-25

原文網址 : https://blog.csdn.net/xbynet/article/details/52918731

資料模式(Schema)定義

Created 星期一 24 十月 2016

模式（Schema）定義了一個多維資料庫，它包一個含邏輯模型，並定義了邏輯模型到物理模型的對映。
物理模型包含多維資料的儲存方式：事實表、維表及其結構等。
邏輯模型包含了用於MDX 查詢的結構，如資料立方體、維度、層次、成員、度量等。
Mondrian 使用一個XML 檔案來描述模式。
Mondrian 包含一個建立、修改模式檔案的Java桌面應用schema-workbench。

模式檔案

Schema 檔案可以定義虛擬立方體，基於一個或多個基礎立方體來建立。也可以定義角色以及角色的訪問許可權（對立方體、維度等）。有關虛擬立方體和角色的內容本文件不涉及。
Schema 檔案主要結構如下（去掉了VirtualCube 和Role 等）：
Schema 包含立方體、虛擬立方體、共享維度、角色

Cube 維度和度量的集合，以事實表為中心

Table 事實表

AggName 宣告一個聚集表

aggElements 聚集表設定，結構見下

AggPattern 宣告一批聚集表

aggElements

Dimension 維度

Hierarchy 維度層次

relation 物理表、檢視，結構見下
Closure/ 對映父子層次關係
Level 一個層次

DimensionUsage 引用共享維度
Measure 度量

CalculatedMemberProperty/

CalculatedMember 計算成員
NamedSet 命名集合

Formula/ 公式

UserDefinedFunction/ 宣告使用者自定義函式

Relation 可以是Table、View、Inline Table、Join：
aggElement 可以是以下元素：

AggExclude
AggFactCount
AggIgnoreColumn
AggForeignKey
AggMeasure
AggLevel

示例：
personDemo.xml
<?xml version="1.0" encoding="UTF-8"?>
<Schema name="Mondrian"> 
<Cube name="Person"> 

<Table name="PERSON" /> 
<Dimension name="部門" foreignKey="USERID" >

<Hierarchy hasAll="true" primaryKey="USERID" allMemberName="所有部門" > 
<Table name="PERSON" alias="a"/> 
<Level name="部門" column="DEPARTMENT" uniqueMembers="true" /> 
<Level name="姓名" column="USERNAME" uniqueMembers="true" />
</Hierarchy>

</Dimension>
<Dimension name="性別" foreignKey="USERID" >

</Dimension>
<Dimension name="專業技術資格類別" foreignKey="USERID" >

</Dimension>
<Dimension name="專業技術資格等級" foreignKey="USERID" >

</Dimension>
<Dimension name="職系" foreignKey="USERID" >

</Dimension>
<Dimension name="民族" foreignKey="USERID" >

</Dimension>
<Dimension name="學歷" foreignKey="USERID" >

</Dimension>
<Measure name="人數" column="USERID" aggregator="distinct count" /> 
</Cube>

</Schema>

度量

每個度量都有一個名字，一個事實表的列，一個聚集器（aggregator）。聚集起通常為sum，但count，min，max，avg，distinct-count 都是可以的；

可選的datatype 屬性可以是String、Integer、Numeric、Boolean、Date、Time 以及Timestamp，預設是Numeric，除了是count 或distinct-count，，這兩個預設是Integer。
可選的formatString 屬性指定怎樣列印這個值。
度量可以有一個caption（標題）屬性以便Member.getCaption()方法返回它而不是返回名字。

度量除了可以來自一個列，也可以使用一條SQL 表示式來計算。下面Promotion Sales 就是
一個例子：
<Measure name="Promotion Sales" aggregator="sum" formatString="#,###.00">

(case when sales_fact_1997.promotion_id =0 then 0 else
sales_fact_1997.store_sales end)

</SQL>

</MeasureExpression>

</Measure>
為了一致，度量被視為一個特別的維度的成員，這個維度叫做Measures。

預設度量

Cube（和VirtualCube）元素允許指定可選的屬性defaultMeasure。
<Cube name="Sales" defaultMeasure="Unit Sales">
...

維度，層次

表對映

Dimension 元素有一個foreignKey 屬性，它是事實表的列名；
而Hierarchy 元素有一個primaryKey 屬性。
如果一個層次有不止一個表，可以使用primaryKeyTable 屬性來區分。
Level 的column 定義了它的鍵，必須是這個層次所在表的列名。如果鍵是一個表示式，可以使用KeyExpression 元素子元素。
Level 元素的uniqueMembers 屬性用於優化SQL 生成。

成員"ALL"
<Hierarchy hasAll="true" ...allMemberName="..."

多層次體系
一個維度可以由多個層次組成：
<Dimension name="Time" foreignKey="time_id">
<Hierarchy hasAll="false" primaryKey="time_id">
<Table name="time_by_day"/>
<Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
<Level name="Quarter" column="quarter" uniqueMembers="false"/>
<Level name="Month" column="month_of_year" type="Numeric"
uniqueMembers="false"/>
</Hierarchy>
<Hierarchy name="Time Weekly" hasAll="false" primaryKey="time_id">
<Table name="time_by_week"/>
<Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
<Level name="Week" column="week" uniqueMembers="false"/>
<Level name="Day" column="day_of_week" type="String"
uniqueMembers="false"/>
</Hierarchy>
</Dimension>
除了它們都連線事實表的同一列"time_id"以外，這些維度層次沒有多少共同的地方，它們甚至沒有使用同一個表。
把兩個層次放到一個維度中的主要原因是，這對終端使用者來說有更多意義。終端使用者知道如果把"Time"層次放到一個軸上而把"Time Weekly"放到另一個軸上是沒有意義的。如果兩個層次是同一個維度，MDX 不會允許它們用於同一個查詢中。

退化維度

退化維度是這樣的一種維度：由於它過於簡單而不值得為它建立一個維表。考慮下面的事實表：
產品時間支付方式客戶數量金額
55 20040106 Credit 123 3 3.54
78 20040106 Cash 89 1 20.00
199 20040107 ATM 3 2 2.99
55 20040106 Cash 122 1 1.18
假設我們為支付方式列的值建立一個維表：
payment_method
Credit
Cash
ATM
這個維表沒什麼意義，它只有三個取值，沒有額外的資訊，併產生了額外的連線開銷。你可以建立一個退化維度，只要宣告一個維度卻不指定表，Mondrian 會認為這些列是來自事實表。
<Cube name="Checkout">
<Table name="checkout">
<Dimension name="Payment method">
<Hierarchy hasAll="true">

<Level name="Payment method" column="payment_method"
uniqueMembers="true"/>
</Hierarchy>
</Dimension>

</Cube>
注意由於沒有連線，Dimension 的foreignKey 屬性是不必要的，並且Hierarchy 元素沒有Table 子元素或primaryKey 屬性。

內嵌表

http://mondrian.pentaho.com/documentation/schema.php#Inline_tables
InlineTable 構件允許你在模式檔案定義一個資料集。你必須要宣告列名，列型別（String 或Numeber），以及一個行集。像Table 和View 一樣，你必須提供一個唯一別名以引用這個資料集。這裡是一個例子（定義了嚴重性級別維度）：
<Dimension name="Severity">
<Hierarchy hasAll="true" primaryKey="severity_id">
<InlineTable alias="severity">
<ColumnDefs>
<ColumnDef name="id" type="Numeric"/>
<ColumnDef name="desc" type="String"/>
</ColumnDefs>
<Rows>
<Row>
<Value column="id">1</Value>
<Value column="desc">High</Value>
</Row>
<Row>
<Value column="id">2</Value>
<Value column="desc">Medium</Value>
</Row>
<Row>
<Value column="id">3</Value>
<Value column="desc">Low</Value>
</Row>
</Rows>
</InlineTable>
<Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
這跟你的資料庫有一個名為severity 的表以及下面的宣告是一樣的效果。
Severity 表：
id desc
1 High
2 Medium
3 Low

成員屬性

成員屬性通過Level 內的Property 元素來定義，例如：
<Level name="MyLevel" column="LevelColumn" uniqueMembers="true">

<Property name="MyProp" column="PropColumn"

formatter="com.example.MyPropertyFormatter"/>
<Level/>
Property 的屬性formatter 定義了一個屬性格式化器，它修改了成員的
getPropertyFormattedValue()的預設行為。格式化器要實現mondrian.spi.PropertyFormatter介面。
一旦屬性在模式中定義了以後，就可以在MDX 語句中使用它，通過member.Properties("屬性名")函式。例如：
SELECT
{[Store Sales]} ON COLUMNS,
TopCount(Filter([Store].[Store Name].Members,
[Store].CurrentMember.Properties("Store Type") = "Supermarket"),
10, [Store Sales]) ON ROWS
FROM [Sales]
先篩選型別為"Supermarket"的商場（Filter 函式），再選出銷售額最大的十家（TopCount 函式），把它們放在行上。

計算成員

假設你想建立一個度量，它的值不是來自事實表的列，而是來自一個MDX 公式
有兩種方式。
一種方式使用with member 字句，像這樣：
WITH MEMBER [Measures].[Profit] AS '[Measures].[Store Sales]-[Measures].[Store Cost]',
FORMAT_STRING = '$#,###'
SELECT {[Measures].[Store Sales], [Measures].[Profit]} ON COLUMNS,
{[Product].Children} ON ROWS
FROM [Sales]
WHERE [Time].[1997]
但比起在每個MDX 查詢中包含這個子句，更好的辦法是，你可以把這個成員定義到模式中，作為立方體定義的一部分：
<CalculatedMember name="Profit" dimension="Measures">
<Formula>[Measures].[Store Sales] - [Measures].[Store Cost]</Formula>
<CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
</CalculatedMember>
你可以給計算成員指定SOLVE_ORDER 屬性，用於確定計算時的優先順序：
<CalculatedMemberProperty name="SOLVE_ORDER" value="2000"/>

命名集合

<Cube name="Warehouse">
...
<NamedSet name="Top Sellers">
<Formula>
TopCount([Warehouse].[Warehouse Name].MEMBERS, 5,
[Measures].[Warehouse Sales])
</Formula>
</NamedSet>
</Cube>
在MDX 中使用：
SELECT
{[Measures].[Warehouse Sales]} ON COLUMNS,
{[Top Sellers]} ON ROWS
FROM [Warehouse]
WHERE [Time].[Year].[1997]

星型和雪花型

前面的立方體都是基於一個事實表、事實表上的維度以及連線到事實表的維表構建。這是最普通的對映方式，稱為星型模式（Star Schema）。
一個維度可以基於不止一個維表，只要這些表能有良好定義的路徑連線到事實表。這樣定義維度的方式就是一個雪花型模式（Snowflake Schema）。這些維表通過Join 操作來定義。例如：
<Cube name="Sales">
...
<Dimension name="Product" foreignKey="product_id">
<Hierarchy hasAll="true" primaryKey="product_id" primaryKeyTable="product">
<Join leftKey="product_class_key" rightAlias="product_class"
rightKey="product_class_id">

</Join>
</Join>
</Hierarchy>
</Dimension>
</Cube>
上面定義了一個由三個表組成的Product 維度。事實表連線到product（通過外來鍵
product_id），而product 連線到product_class（通過外來鍵product_class_id），而
product_class 連線到product_type（通過外來鍵product_type_id）。我們需要一個Join 元素內巢狀一個Join 元素，因為Join 接收兩個運算元。運算元可以是表、連線（Join）甚至查詢。
注意這裡Join 元素有一個rightAlias 屬性。這是必要的，因為Join 的右邊元件（巢狀的Join元素）由不止一個表組成。這種情形下leftAlias 屬性是不必要的，因為leftKey 來自product表，沒有歧義。

共享維度

當一個維度（維表）被多個立方體使用時，可以把它定義為共享維度（shared dimensions），在立方體內引用這個維度即可。因為一個共享維度不屬於一個立方體，你必須給它一個明確的表（或其他資料來源）。當你在某個立方體中使用它時，指定維度的外來鍵。
<Dimension name="Store Type">
<Hierarchy hasAll="true" primaryKey="store_id">
<Table name="store"/>
<Level name="Store Type" column="store_type" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Cube name="Sales">
<Table name="sales_fact_1997"/>
...
<DimensionUsage name="Store Type" source="Store Type" foreignKey="store_id"/>
</Cube>

閉包表

http://mondrian.pentaho.com/documentation/schema.php#Closure_tables
舉例來說，閉包表（closure table）就是包含所有僱員/主管關係記錄（不管深度如何）的SQL 表。

聚集表

http://mondrian.pentaho.com/documentation/schema.php#Aggregate_tables
設想一個CEO要執行一個銷售報表，這個報表只包含一個數值：今年所有產品在所有地區的銷售總額。為了獲得這個數值，Mondrian產生像這樣的一條SQL語句：
SELECT sum(store_sales)
FROM sales_fact, time
WHERE sales_fact.time_id = time.time_id AND time.year = 2005
併傳送給資料庫，而資料庫要花好幾分鐘去執行它。這也容易理解，因為資料庫要讀取事實表中所有今年的記錄（比方說，幾百萬條銷售資料）然後彙總成一個彙總數。很明顯，在這裡，以及其他類似的時候，所需要的只是一個預先計算的概括性資料：聚集表。
聚集表跟基礎事實表同時共存，包含著從事實表建立的預先聚集的度量。它在Mondrian的模式檔案中註冊，因此在一個特定的查詢中能應用得上的時候，Mondrian 就能夠選擇某個聚集表而不是事實表。

Mondrian 有一個工具AggGen (aggregate generator)用於輔助設計和維護聚集表。它可以生成建立聚集表的SQL 語句，它也可以根據一條MDX 語句，給出能夠優化這個查詢的建立/插入聚集表的SQL 語句。

</AggName>

</Table>

</Cube>

XML Schema定義
2019-01-30
XML
《甘十九妹》 ent 實現之 Schema 定義
2020-03-29
用IBM Data Studio定義Schema Annotations標記BM
2022-03-21
IBM
設計模式定義
2024-06-01
設計模式
持續定義 Saas 模式雲資料倉儲+實時搜尋
2020-10-30
模式
XML Schema 複雜元素型別詳解：定義及示例解析
2024-05-18
XML型別
Salt Highstate資料結構定義
2019-08-14
資料結構
資料庫-Duckdb-catalog和schema
2024-06-11
資料庫
Mybatis實現分包定義資料庫
2022-01-09
MyBatis資料庫
XML Schema（XSD）詳解：定義 XML 文件結構合法性的完整指南
2024-05-09
XML
介面卡和策略模式的定義
2021-09-09
模式
db2匯出資料庫定義及遷移資料
2022-04-27
DB2資料庫
flask資料庫遷移 No changes in schema detected.
2018-07-17
Flask資料庫
mysql的 information_schema 資料庫介紹
2024-11-06
MySqlORM資料庫
分散式資料庫的定義和特點
2023-10-26
分散式資料庫
JAVASE——資料型別，變數定義及使用
2020-08-05
Java資料型別變數
稀疏感知&稀疏預定義資料排程器
2024-11-04
RadioGroup 在TS schema 中的設定
2024-05-25
資料庫架構和物件、定義資料完整性-SQL Server
2021-09-09
資料庫架構物件SQLServer
策略模式-定義一個演算法族
2020-12-28
模式演算法
衡量資料管理價值的指標如何定義
2024-03-04
指標
資料孤島的定義，企業如何應對？
2024-02-04
MySQL預設資料庫之performance_schema庫
2019-07-21
MySql資料庫ORM
MySQL預設資料庫之 information_schema庫
2019-07-21
MySql資料庫ORM
設計模式的數學定義是什麼？
2018-11-24
設計模式
定義多維的點模板類,任意資料型別
2018-08-10
資料型別
如何在MySQL資料庫中定義外來鍵ZM
2022-03-21
MySql資料庫
越來越簡單的資料類定義：named tuple
2022-01-15
Python Schema一種優雅的資料驗證方式
2019-02-16
Python
XML Schema 字串資料型別及約束詳解
2024-05-24
XML字串資料型別
資料庫安全定義以及重要性簡單講解
2023-10-26
資料庫
資料視覺化初學者指南：定義、示例和工具
2019-09-25
視覺化
[譯]LINQ TO SQL 介紹(定義資料模型類) – Part.2
2021-09-09
SQL模型
在 Apache Cassandra 中定義和最佳化資料分割槽
2021-10-22
Apache
MYSQL學習與實驗（一）——資料庫定義與操作
2020-12-05
MySql資料庫
Mysql資料庫自定義函式的定義、使用方法及操作注意事項
2019-03-07
MySql資料庫函式
多維資料分析模型設計器Schema Workbench的使用
2018-10-26
模型
高效能MySQL (一)：Schema與資料型別優化
2019-01-23
MySql資料型別優化
首先將performance_schema這個資料庫下的需要
2022-01-09
ORM資料庫

資料模式(Schema)定義

相關文章