Presto是什麼
Presto 是一個分散式 SQL 查詢引擎,旨在查詢大型資料集 分佈在一個或多個異構資料來源上。筆者所參與的專案主要使用Presto做資料探查和資料分析。
Presto架構
Presto查詢引擎是一個Master-Slave的架構,由一個Coordinator節點,一個Discovery Server節點,多個Worker節點組成,Discovery Server通常內嵌於Coordinator節點中。
Coordinator負責解析SQL語句,生成執行計劃,分發執行任務給Worker節點執行。
Worker節點負責實際執行查詢任務。Worker節點啟動後向Discovery Server服務註冊,Coordinator從Discovery Server獲得可以正常工作的Worker節點。
Presto是功能如何實現
Presto工程中使用Connector負責Presto與資料來源進行互動,不同的資料庫對應於不同的Connector。Connector是使用 SPI 作為服務提供/發現機制的。Java中SPI機制主要思想是將裝配的控制權移到程式之外。
在Presto中的應用就是基於 com.facebook.presto.spi.Plugin 這個介面實現一個對應的xxxPlugin。並且在plugin.dir配置的路徑中新增實現com.facebook.presto.spi.Plugin介面程式碼對應的產物,即可在Presto中實現與資料來源的互動。
高斯資料庫GuassPlugin外掛開發
軟體版本資訊
· presto原始碼版本0.253
· jdk1.8-151(和presto原始碼版本對應)
具體實現步驟
Presto原始專案中建立presto-gauss專案
presto-gauss pom中需要新增com.facebook.presto作為父專案,同時需要新增Presto官方實現的JdbcPlugin依賴包,以及高斯資料庫JDBC包。
將打包專案打包方式設定成presto-plugin。
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>presto-root</artifactId>
<groupId>com.facebook.presto</groupId>
<version>0.253</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>presto-gauss</artifactId>
<description>Presto - gauss Connector</description>
<packaging>presto-plugin</packaging>
<dependencies>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-base-jdbc</artifactId>
</dependency>
<dependency>
<groupId>com.facebook.airlift</groupId>
<artifactId>configuration</artifactId>
</dependency>
<dependency>
<groupId>com.facebook.airlift</groupId>
<artifactId>log-manager</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.google.inject</groupId>
<artifactId>guice</artifactId>
</dependency>
<dependency>
<groupId>javax.inject</groupId>
<artifactId>javax.inject</artifactId>
</dependency>
<dependency>
<groupId>javax.validation</groupId>
<artifactId>validation-api</artifactId>
</dependency>
<!-- Presto SPI -->
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-spi</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-common</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.facebook.drift</groupId>
<artifactId>drift-api</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>slice</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>units</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<scope>provided</scope>
</dependency>
<!-- 華為高斯依賴 -->
<dependency>
<groupId>com.huawei</groupId>
<artifactId>gsjdbc200</artifactId>
<version>8.1.3</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<!-- TODO: Remove this once fixed -->
<ignoredDependencies>
</ignoredDependencies>
</configuration>
</plugin>
</plugins>
</build>
<profiles>
<profile>
<id>ci</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<excludes combine.self="override"/>
</configuration>
</plugin>
</plugins>
</build>
</profile>
<profile>
<id>default</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
在presto-root專案的pom中增加module
<modules>
……
<module>presto-gauss</module>
……
</modules>
在presto-server專案中增加presto-gauss依賴
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-gauss</artifactId>
<version>${project.version}</version>
<type>zip</type>
<scope>provided</scope>
</dependency>
基於高斯資料庫驅動實現ConnectionFactory例項的建立
因為高斯資料庫是JDBC 資料來源官方的JdbcPlugin中已經實現com.facebook.presto.spi.Plugin中的介面,因此我們僅需高斯Jdbc驅動例項構建ConnectionFactory即可。
高斯資料庫驅動對於JDBC的支援程度、資料來源本身的特性,所以需要對原始JDBC進行一些適配。
建立GaussClient.java實現ConnectionFactory,同時完成表重新命名、資料型別對映、查詢表和模式的等方法進行適配。
package com.facebook.presto.plugin.gauss;
import com.facebook.presto.common.type.Decimals;
import com.facebook.presto.common.type.Type;
import com.facebook.presto.common.type.VarcharType;
import com.facebook.presto.plugin.jdbc.*;
import com.facebook.presto.spi.ConnectorSession;
import com.facebook.presto.spi.PrestoException;
import com.facebook.presto.spi.SchemaTableName;
import com.huawei.gauss200.jdbc.Driver;
import javax.inject.Inject;
import java.sql.*;
import java.util.Optional;
import java.util.Properties;
import static com.facebook.presto.common.type.DecimalType.createDecimalType;
import static com.facebook.presto.common.type.VarbinaryType.VARBINARY;
import static com.facebook.presto.common.type.VarcharType.createUnboundedVarcharType;
import static com.facebook.presto.common.type.VarcharType.createVarcharType;
import static com.facebook.presto.plugin.jdbc.DriverConnectionFactory.basicConnectionProperties;
import static com.facebook.presto.plugin.jdbc.JdbcErrorCode.JDBC_ERROR;
import static com.facebook.presto.plugin.jdbc.StandardReadMappings.*;
import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED;
import static java.lang.String.format;
import static java.util.Locale.ENGLISH;
import static java.util.Objects.requireNonNull;
public class GaussClient extends BaseJdbcClient
{
private static final int FETCH_SIZE = 1000;
private final boolean synonymsEnabled;
private final int numberDefaultScale;
@Inject
public GaussClient(JdbcConnectorId connectorId, BaseJdbcConfig config, GaussConfig gaussConfig) throws SQLException {
super(connectorId, config, "", connectionFactory(config, gaussConfig));
requireNonNull(gaussConfig, "gauss config is null");
this.synonymsEnabled = gaussConfig.isSynonymsEnabled();
this.numberDefaultScale = gaussConfig.getNumberDefaultScale();
}
private static ConnectionFactory connectionFactory(BaseJdbcConfig config, GaussConfig gaussConfig) {
Properties connectionProperties = basicConnectionProperties(config);
connectionProperties.setProperty("useUnicode", "true");
connectionProperties.setProperty("characterEncoding", "utf8");
connectionProperties.setProperty("ssl", "false");
return new DriverConnectionFactory(new Driver(), config.getConnectionUrl(),
Optional.ofNullable(config.getUserCredentialName()),
Optional.ofNullable(config.getPasswordCredentialName()), connectionProperties);
}
private String[] getTableTypes()
{
if (synonymsEnabled) {
return new String[] {"TABLE", "VIEW", "SYNONYM"};
}
return new String[] {"TABLE", "VIEW"};
}
@Override
protected ResultSet getTables(Connection connection, Optional<String> schemaName, Optional<String> tableName)
throws SQLException
{
DatabaseMetaData metadata = connection.getMetaData();
String escape = metadata.getSearchStringEscape();
return metadata.getTables(
connection.getCatalog(),
escapeNamePattern(schemaName, Optional.of(escape)).orElse(null),
escapeNamePattern(tableName, Optional.of(escape)).orElse(null),
getTableTypes());
}
@Override
public PreparedStatement getPreparedStatement(Connection connection, String sql)
throws SQLException
{
PreparedStatement statement = connection.prepareStatement(sql);
statement.setFetchSize(FETCH_SIZE);
return statement;
}
@Override
protected String generateTemporaryTableName()
{
return "presto_tmp_" + System.nanoTime();
}
@Override
protected void renameTable(JdbcIdentity identity, String catalogName, SchemaTableName oldTable, SchemaTableName newTable)
{
if (!oldTable.getSchemaName().equalsIgnoreCase(newTable.getSchemaName())) {
throw new PrestoException(NOT_SUPPORTED, "Table rename across schemas is not supported in gauss");
}
String newTableName = newTable.getTableName().toUpperCase(ENGLISH);
String oldTableName = oldTable.getTableName().toUpperCase(ENGLISH);
String sql = format(
"ALTER TABLE %s RENAME TO %s",
quoted(catalogName, oldTable.getSchemaName(), oldTableName),
quoted(newTableName));
try (Connection connection = connectionFactory.openConnection(identity)) {
execute(connection, sql);
}
catch (SQLException e) {
throw new PrestoException(JDBC_ERROR, e);
}
}
@Override
public Optional<ReadMapping> toPrestoType(ConnectorSession session, JdbcTypeHandle typeHandle)
{
int columnSize = typeHandle.getColumnSize();
switch (typeHandle.getJdbcType()) {
case Types.CLOB:
case Types.NCLOB:
return Optional.of(varcharReadMapping(createUnboundedVarcharType()));
case Types.SMALLINT:
return Optional.of(smallintReadMapping());
case Types.FLOAT:
case Types.DOUBLE:
return Optional.of(doubleReadMapping());
case Types.REAL:
return Optional.of(realReadMapping());
case Types.NUMERIC:
int precision = columnSize == 0 ? Decimals.MAX_PRECISION : columnSize;
int scale = typeHandle.getDecimalDigits();
if (scale == 0) {
return Optional.of(bigintReadMapping());
}
if (scale < 0 || scale > precision) {
return Optional.of(decimalReadMapping(createDecimalType(precision, numberDefaultScale)));
}
return Optional.of(decimalReadMapping(createDecimalType(precision, scale)));
case Types.VARCHAR:
case Types.NVARCHAR:
case Types.LONGVARCHAR:
case Types.LONGNVARCHAR:
if (columnSize > VarcharType.MAX_LENGTH) {
return Optional.of(varcharReadMapping(createUnboundedVarcharType()));
}
return Optional.of(varcharReadMapping(createVarcharType(columnSize)));
case Types.BLOB:
return Optional.of(varbinaryReadMapping());
}
return super.toPrestoType(session, typeHandle);
}
@Override
protected String toSqlType(Type type) {
if (VARBINARY.equals(type)) {
return "blob";
}
return super.toSqlType(type);
}
}
基於presto使用的Guice依賴注入框架,建立GuassPlugin.java、GaussClientModule.java、GaussConfig.java 類註冊實現的外掛
GuassPlugin.java:
package com.facebook.presto.plugin.gauss;
import com.facebook.presto.plugin.jdbc.JdbcPlugin;
/**
* Initialize GuassPlugin class for prestoDB
*/
public class GuassPlugin
extends JdbcPlugin
{
/**
* gauss Plugin Constructor
*/
public GuassPlugin()
{
super("gauss", new GaussClientModule());
}
}
GaussClientModule.java:
package com.facebook.presto.plugin.gauss;
import com.facebook.presto.plugin.jdbc.BaseJdbcConfig;
import com.facebook.presto.plugin.jdbc.JdbcClient;
import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.Scopes;
import static com.facebook.airlift.configuration.ConfigBinder.configBinder;
public class GaussClientModule
implements Module
{
@Override
public void configure(Binder binder)
{
binder.bind(JdbcClient.class).to(GaussClient.class)
.in(Scopes.SINGLETON);
configBinder(binder).bindConfig(BaseJdbcConfig.class);
configBinder(binder).bindConfig(GaussConfig.class);
}
}
GaussClientModule.java:
package com.facebook.presto.plugin.gauss;
import com.facebook.airlift.configuration.Config;
import javax.validation.constraints.Max;
import javax.validation.constraints.Min;
import javax.validation.constraints.NotNull;
import java.math.RoundingMode;
public class GaussConfig
{
private boolean synonymsEnabled;
private int varcharMaxSize = 4000;
private int timestampDefaultPrecision = 6;
private int numberDefaultScale = 10;
private RoundingMode numberRoundingMode = RoundingMode.HALF_UP;
@NotNull
public boolean isSynonymsEnabled()
{
return synonymsEnabled;
}
@Config("gauss.synonyms.enabled")
public GaussConfig setSynonymsEnabled(boolean enabled)
{
this.synonymsEnabled = enabled;
return this;
}
@Min(0)
@Max(38)
public int getNumberDefaultScale()
{
return numberDefaultScale;
}
@Config("gauss.number.default-scale")
public GaussConfig setNumberDefaultScale(int numberDefaultScale)
{
this.numberDefaultScale = numberDefaultScale;
return this;
}
@NotNull
public RoundingMode getNumberRoundingMode()
{
return numberRoundingMode;
}
@Config("gauss.number.rounding-mode")
public GaussConfig setNumberRoundingMode(RoundingMode numberRoundingMode)
{
this.numberRoundingMode = numberRoundingMode;
return this;
}
@Min(4000)
public int getVarcharMaxSize()
{
return varcharMaxSize;
}
@Config("gauss.varchar.max-size")
public GaussConfig setVarcharMaxSize(int varcharMaxSize)
{
this.varcharMaxSize = varcharMaxSize;
return this;
}
@Min(0)
@Max(9)
public int getTimestampDefaultPrecision()
{
return timestampDefaultPrecision;
}
@Config("gauss.timestamp.precision")
public GaussConfig setTimestampDefaultPrecision(int timestampDefaultPrecision)
{
this.timestampDefaultPrecision = timestampDefaultPrecision;
return this;
}
}
測試資料來源註冊
入參:
{
"catalogName": "gauss-test1",
"connectorName": "gauss",
"properties": {
"connection-url":"jdbc: gaussdb://172.30.***.***:***/yth_shuguan",
"connection-user":"***",
"connection-password":"***"
}
}
返回資料來源資訊:
2024-03-06T17:27:08.532+0800 INFO Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2024-03-06T17:27:08.532+0800 INFO Bootstrap gauss.number.default-scale 10 10
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.number.rounding-mode HALF_UP HALF_UP
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.synonyms.enabled false false
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.timestamp.precision 6 6
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.varchar.max-size 4000 4000
2024-03-06T17:27:08.533+0800 INFO Bootstrap case-insensitive-name-matching false false
2024-03-06T17:27:08.533+0800 INFO Bootstrap case-insensitive-name-matching.cache-ttl 1.00m 1.00m
2024-03-06T17:27:08.533+0800 INFO Bootstrap connection-password [REDACTED] [REDACTED]
2024-03-06T17:27:08.533+0800 INFO Bootstrap connection-url ---- jdbc:gaussdb://172.30.***.***:***/yth_shuguan
2024-03-06T17:27:08.533+0800 INFO Bootstrap connection-user ---- yth
2024-03-06T17:27:08.533+0800 INFO Bootstrap password-credential-name ---- ----
2024-03-06T17:27:08.533+0800 INFO Bootstrap user-credential-name ---- ----
2024-03-06T17:27:08.533+0800 INFO Bootstrap allow-drop-table false false Allow connector to drop tables
2024-03-06T17:27:08.550+0800 INFO com.facebook.airlift.bootstrap.LifeCycleManager Life cycle startup complete. System ready.
2024-03-06T17:27:08.549+0800 INFO com.facebook.airlift.bootstrap.LifeCycleManager Life cycle starting...
其他說明
適配非JDBC資料來源,如HuaweiHD651-V310(基於Hive開發),需要自己實現如下介面
介面名稱 | 說明 |
---|---|
ConnectorFactory | Connector工廠 |
ConnectorMetadata | 獲取資料來源後設資料 |
ConnectorHandleResolver | 獲取各種Handler |
ConnectorSplitManager | 處理任務分片 |
ConnectorRecordSetProviderConnectorPageSourceProvider | 讀取資料 |
ConnectorPageSinkProvider | 寫入資料 |
參考
Presto官方文件:
https://prestodb.io/docs/0.253/
美團技術團隊分享:
https://tech.meituan.com/2014/06/16/presto.html