Presto適配高斯資料庫

今晚煮鸡蛋發表於2024-03-06

Presto是什麼

Presto 是一個分散式 SQL 查詢引擎,旨在查詢大型資料集 分佈在一個或多個異構資料來源上。筆者所參與的專案主要使用Presto做資料探查和資料分析。

Presto架構

Presto查詢引擎是一個Master-Slave的架構,由一個Coordinator節點,一個Discovery Server節點,多個Worker節點組成,Discovery Server通常內嵌於Coordinator節點中。

Coordinator負責解析SQL語句,生成執行計劃,分發執行任務給Worker節點執行。

Worker節點負責實際執行查詢任務。Worker節點啟動後向Discovery Server服務註冊,Coordinator從Discovery Server獲得可以正常工作的Worker節點。

Presto是功能如何實現

Presto工程中使用Connector負責Presto與資料來源進行互動,不同的資料庫對應於不同的Connector。Connector是使用 SPI 作為服務提供/發現機制的。Java中SPI機制主要思想是將裝配的控制權移到程式之外。

在Presto中的應用就是基於 com.facebook.presto.spi.Plugin 這個介面實現一個對應的xxxPlugin。並且在plugin.dir配置的路徑中新增實現com.facebook.presto.spi.Plugin介面程式碼對應的產物,即可在Presto中實現與資料來源的互動。

高斯資料庫GuassPlugin外掛開發

軟體版本資訊

· presto原始碼版本0.253

· jdk1.8-151(和presto原始碼版本對應)

具體實現步驟

Presto原始專案中建立presto-gauss專案

presto-gauss pom中需要新增com.facebook.presto作為父專案,同時需要新增Presto官方實現的JdbcPlugin依賴包,以及高斯資料庫JDBC包。

將打包專案打包方式設定成presto-plugin。

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>presto-root</artifactId>
        <groupId>com.facebook.presto</groupId>
        <version>0.253</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>presto-gauss</artifactId>
    <description>Presto - gauss Connector</description>
    <packaging>presto-plugin</packaging>

    <dependencies>

        <dependency>
            <groupId>com.facebook.presto</groupId>
            <artifactId>presto-base-jdbc</artifactId>
        </dependency>

        <dependency>
            <groupId>com.facebook.airlift</groupId>
            <artifactId>configuration</artifactId>
        </dependency>


        <dependency>
            <groupId>com.facebook.airlift</groupId>
            <artifactId>log-manager</artifactId>
            <scope>runtime</scope>
        </dependency>


        <dependency>
            <groupId>com.google.inject</groupId>
            <artifactId>guice</artifactId>
        </dependency>

        <dependency>
            <groupId>javax.inject</groupId>
            <artifactId>javax.inject</artifactId>
        </dependency>

        <dependency>
            <groupId>javax.validation</groupId>
            <artifactId>validation-api</artifactId>
        </dependency>

        <!-- Presto SPI -->
        <dependency>
            <groupId>com.facebook.presto</groupId>
            <artifactId>presto-spi</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>com.facebook.presto</groupId>
            <artifactId>presto-common</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>com.facebook.drift</groupId>
            <artifactId>drift-api</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>slice</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>units</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.openjdk.jol</groupId>
            <artifactId>jol-core</artifactId>
            <scope>provided</scope>
        </dependency>

       
        <!-- 華為高斯依賴 -->
        <dependency>
            <groupId>com.huawei</groupId>
            <artifactId>gsjdbc200</artifactId>
            <version>8.1.3</version>
        </dependency>

    </dependencies>


    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <!-- TODO: Remove this once fixed -->
                    <ignoredDependencies>
                    </ignoredDependencies>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <profiles>
        <profile>
            <id>ci</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-surefire-plugin</artifactId>
                        <configuration>
                            <excludes combine.self="override"/>
                        </configuration>
                    </plugin>
                </plugins>
            </build>
        </profile>

        <profile>
            <id>default</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-surefire-plugin</artifactId>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>
</project>

在presto-root專案的pom中增加module

<modules>
    …… 
    <module>presto-gauss</module> 
    ……
</modules> 

在presto-server專案中增加presto-gauss依賴

<dependency>
	<groupId>com.facebook.presto</groupId>
	<artifactId>presto-gauss</artifactId>
	<version>${project.version}</version>
	<type>zip</type>
	<scope>provided</scope>
</dependency>

基於高斯資料庫驅動實現ConnectionFactory例項的建立

因為高斯資料庫是JDBC 資料來源官方的JdbcPlugin中已經實現com.facebook.presto.spi.Plugin中的介面,因此我們僅需高斯Jdbc驅動例項構建ConnectionFactory即可。

高斯資料庫驅動對於JDBC的支援程度、資料來源本身的特性,所以需要對原始JDBC進行一些適配。

建立GaussClient.java實現ConnectionFactory,同時完成表重新命名、資料型別對映、查詢表和模式的等方法進行適配。

	package com.facebook.presto.plugin.gauss;
	
	import com.facebook.presto.common.type.Decimals;
	import com.facebook.presto.common.type.Type;
	import com.facebook.presto.common.type.VarcharType;
	import com.facebook.presto.plugin.jdbc.*;
	import com.facebook.presto.spi.ConnectorSession;
	import com.facebook.presto.spi.PrestoException;
	import com.facebook.presto.spi.SchemaTableName;
	import com.huawei.gauss200.jdbc.Driver;
	
	import javax.inject.Inject;
	import java.sql.*;
	import java.util.Optional;
	import java.util.Properties;
	
	import static com.facebook.presto.common.type.DecimalType.createDecimalType;
	import static com.facebook.presto.common.type.VarbinaryType.VARBINARY;
	import static com.facebook.presto.common.type.VarcharType.createUnboundedVarcharType;
	import static com.facebook.presto.common.type.VarcharType.createVarcharType;
	import static com.facebook.presto.plugin.jdbc.DriverConnectionFactory.basicConnectionProperties;
	import static com.facebook.presto.plugin.jdbc.JdbcErrorCode.JDBC_ERROR;
	import static com.facebook.presto.plugin.jdbc.StandardReadMappings.*;
	import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED;
	import static java.lang.String.format;
	import static java.util.Locale.ENGLISH;
	import static java.util.Objects.requireNonNull;
	
	public class GaussClient extends BaseJdbcClient
	{
	    private static final int FETCH_SIZE = 1000;
	
	    private final boolean synonymsEnabled;
	    private final int numberDefaultScale;
	
	    @Inject
	    public GaussClient(JdbcConnectorId connectorId, BaseJdbcConfig config, GaussConfig gaussConfig) throws SQLException {
	        super(connectorId, config, "", connectionFactory(config, gaussConfig));
	        requireNonNull(gaussConfig, "gauss config is null");
	        this.synonymsEnabled = gaussConfig.isSynonymsEnabled();
	        this.numberDefaultScale = gaussConfig.getNumberDefaultScale();
	    }
	
	
	    private static ConnectionFactory connectionFactory(BaseJdbcConfig config, GaussConfig gaussConfig) {
	        Properties connectionProperties = basicConnectionProperties(config);
	        connectionProperties.setProperty("useUnicode", "true");
	        connectionProperties.setProperty("characterEncoding", "utf8");
	        connectionProperties.setProperty("ssl", "false");
	        return new DriverConnectionFactory(new Driver(), config.getConnectionUrl(), 
	                Optional.ofNullable(config.getUserCredentialName()),
	                Optional.ofNullable(config.getPasswordCredentialName()), connectionProperties);
	    }
	
	
	    private String[] getTableTypes()
	    {
	        if (synonymsEnabled) {
	            return new String[] {"TABLE", "VIEW", "SYNONYM"};
	        }
	        return new String[] {"TABLE", "VIEW"};
	    }
	
	    @Override
	    protected ResultSet getTables(Connection connection, Optional<String> schemaName, Optional<String> tableName)
	            throws SQLException
	    {
	        DatabaseMetaData metadata = connection.getMetaData();
	        String escape = metadata.getSearchStringEscape();
	        return metadata.getTables(
	                connection.getCatalog(),
	                escapeNamePattern(schemaName, Optional.of(escape)).orElse(null),
	                escapeNamePattern(tableName, Optional.of(escape)).orElse(null),
	                getTableTypes());
	    }
	    @Override
	    public PreparedStatement getPreparedStatement(Connection connection, String sql)
	            throws SQLException
	    {
	        PreparedStatement statement = connection.prepareStatement(sql);
	        statement.setFetchSize(FETCH_SIZE);
	        return statement;
	    }
	
	    @Override
	    protected String generateTemporaryTableName()
	    {
	        return "presto_tmp_" + System.nanoTime();
	    }
	
	    @Override
	    protected void renameTable(JdbcIdentity identity, String catalogName, SchemaTableName oldTable, SchemaTableName newTable)
	    {
	        if (!oldTable.getSchemaName().equalsIgnoreCase(newTable.getSchemaName())) {
	            throw new PrestoException(NOT_SUPPORTED, "Table rename across schemas is not supported in gauss");
	        }
	
	        String newTableName = newTable.getTableName().toUpperCase(ENGLISH);
	        String oldTableName = oldTable.getTableName().toUpperCase(ENGLISH);
	        String sql = format(
	                "ALTER TABLE %s RENAME TO %s",
	                quoted(catalogName, oldTable.getSchemaName(), oldTableName),
	                quoted(newTableName));
	
	        try (Connection connection = connectionFactory.openConnection(identity)) {
	            execute(connection, sql);
	        }
	        catch (SQLException e) {
	            throw new PrestoException(JDBC_ERROR, e);
	        }
	    }
	
	    @Override
	    public Optional<ReadMapping> toPrestoType(ConnectorSession session, JdbcTypeHandle typeHandle)
	    {
	        int columnSize = typeHandle.getColumnSize();
	
	        switch (typeHandle.getJdbcType()) {
	            case Types.CLOB:
	            case Types.NCLOB:
	                return Optional.of(varcharReadMapping(createUnboundedVarcharType()));
	            case Types.SMALLINT:
	                return Optional.of(smallintReadMapping());
	            case Types.FLOAT:
	            case Types.DOUBLE:
	                return Optional.of(doubleReadMapping());
	            case Types.REAL:
	                return Optional.of(realReadMapping());
	            case Types.NUMERIC:
	                int precision = columnSize == 0 ? Decimals.MAX_PRECISION : columnSize;
	                int scale = typeHandle.getDecimalDigits();
	
	                if (scale == 0) {
	                    return Optional.of(bigintReadMapping());
	                }
	                if (scale < 0 || scale > precision) {
	                    return Optional.of(decimalReadMapping(createDecimalType(precision, numberDefaultScale)));
	                }
	                return Optional.of(decimalReadMapping(createDecimalType(precision, scale)));
	            case Types.VARCHAR:
	            case Types.NVARCHAR:
	            case Types.LONGVARCHAR:
	            case Types.LONGNVARCHAR:
	                if (columnSize > VarcharType.MAX_LENGTH) {
	                    return Optional.of(varcharReadMapping(createUnboundedVarcharType()));
	                }
	                return Optional.of(varcharReadMapping(createVarcharType(columnSize)));
	            case Types.BLOB:
	                return Optional.of(varbinaryReadMapping());
	        }
	        return super.toPrestoType(session, typeHandle);
	    }
	
	    @Override
	    protected String toSqlType(Type type) {
	        if (VARBINARY.equals(type)) {
	            return "blob";
	        }
	        return super.toSqlType(type);
	    }
	}

基於presto使用的Guice依賴注入框架,建立GuassPlugin.java、GaussClientModule.java、GaussConfig.java 類註冊實現的外掛

GuassPlugin.java:

package com.facebook.presto.plugin.gauss;
import com.facebook.presto.plugin.jdbc.JdbcPlugin;

/**
 *  Initialize GuassPlugin class for prestoDB
 */
public class GuassPlugin
        extends JdbcPlugin
{
    /**
     *  gauss Plugin Constructor
     */
    public GuassPlugin()
    {
        super("gauss", new GaussClientModule());
    }
}

GaussClientModule.java:

package com.facebook.presto.plugin.gauss;

import com.facebook.presto.plugin.jdbc.BaseJdbcConfig;
import com.facebook.presto.plugin.jdbc.JdbcClient;
import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.Scopes;

import static com.facebook.airlift.configuration.ConfigBinder.configBinder;

public class GaussClientModule
        implements Module
{
    @Override
    public void configure(Binder binder)
    {
        binder.bind(JdbcClient.class).to(GaussClient.class)
                .in(Scopes.SINGLETON);
        configBinder(binder).bindConfig(BaseJdbcConfig.class);
        configBinder(binder).bindConfig(GaussConfig.class);
    }
}

GaussClientModule.java:

	package com.facebook.presto.plugin.gauss;
	
	import com.facebook.airlift.configuration.Config;
	
	import javax.validation.constraints.Max;
	import javax.validation.constraints.Min;
	import javax.validation.constraints.NotNull;
	
	import java.math.RoundingMode;
	
	public class GaussConfig
	{
	    private boolean synonymsEnabled;
	    private int varcharMaxSize = 4000;
	    private int timestampDefaultPrecision = 6;
	    private int numberDefaultScale = 10;
	    private RoundingMode numberRoundingMode = RoundingMode.HALF_UP;
	
	    @NotNull
	    public boolean isSynonymsEnabled()
	    {
	        return synonymsEnabled;
	    }
	
	    @Config("gauss.synonyms.enabled")
	    public GaussConfig setSynonymsEnabled(boolean enabled)
	    {
	        this.synonymsEnabled = enabled;
	        return this;
	    }
	
	    @Min(0)
	    @Max(38)
	    public int getNumberDefaultScale()
	    {
	        return numberDefaultScale;
	    }
	
	    @Config("gauss.number.default-scale")
	    public GaussConfig setNumberDefaultScale(int numberDefaultScale)
	    {
	        this.numberDefaultScale = numberDefaultScale;
	        return this;
	    }
	
	    @NotNull
	    public RoundingMode getNumberRoundingMode()
	    {
	        return numberRoundingMode;
	    }
	
	    @Config("gauss.number.rounding-mode")
	    public GaussConfig setNumberRoundingMode(RoundingMode numberRoundingMode)
	    {
	        this.numberRoundingMode = numberRoundingMode;
	        return this;
	    }
	
	    @Min(4000)
	    public int getVarcharMaxSize()
	    {
	        return varcharMaxSize;
	    }
	
	    @Config("gauss.varchar.max-size")
	    public GaussConfig setVarcharMaxSize(int varcharMaxSize)
	    {
	        this.varcharMaxSize = varcharMaxSize;
	        return this;
	    }
	
	    @Min(0)
	    @Max(9)
	    public int getTimestampDefaultPrecision()
	    {
	        return timestampDefaultPrecision;
	    }
	
	    @Config("gauss.timestamp.precision")
	    public GaussConfig setTimestampDefaultPrecision(int timestampDefaultPrecision)
	    {
	        this.timestampDefaultPrecision = timestampDefaultPrecision;
	        return this;
	    }
	}

測試資料來源註冊

入參:

{
  "catalogName": "gauss-test1",
  "connectorName": "gauss",
  "properties": {
    "connection-url":"jdbc: gaussdb://172.30.***.***:***/yth_shuguan",
    "connection-user":"***",
    "connection-password":"***"
  }
}

返回資料來源資訊:

2024-03-06T17:27:08.532+0800  INFO   Bootstrap        PROPERTY                                  DEFAULT     RUNTIME                                         DESCRIPTION
2024-03-06T17:27:08.532+0800  INFO   Bootstrap        gauss.number.default-scale                10          10
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        gauss.number.rounding-mode                HALF_UP     HALF_UP
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        gauss.synonyms.enabled                    false       false
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        gauss.timestamp.precision                 6           6
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        gauss.varchar.max-size                    4000        4000
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        case-insensitive-name-matching            false       false
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        case-insensitive-name-matching.cache-ttl  1.00m       1.00m
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        connection-password                       [REDACTED]  [REDACTED]
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        connection-url                            ----        jdbc:gaussdb://172.30.***.***:***/yth_shuguan
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        connection-user                           ----        yth
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        password-credential-name                  ----        ----
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        user-credential-name                      ----        ----
2024-03-06T17:27:08.533+0800  INFO   Bootstrap        allow-drop-table                          false       false                                           Allow connector to drop tables
2024-03-06T17:27:08.550+0800  INFO   com.facebook.airlift.bootstrap.LifeCycleManager        Life cycle startup complete. System ready.
2024-03-06T17:27:08.549+0800  INFO   com.facebook.airlift.bootstrap.LifeCycleManager        Life cycle starting...

其他說明

適配非JDBC資料來源,如HuaweiHD651-V310(基於Hive開發),需要自己實現如下介面

介面名稱 說明
ConnectorFactory Connector工廠
ConnectorMetadata 獲取資料來源後設資料
ConnectorHandleResolver 獲取各種Handler
ConnectorSplitManager 處理任務分片
ConnectorRecordSetProviderConnectorPageSourceProvider 讀取資料
ConnectorPageSinkProvider 寫入資料

參考

Presto官方文件:

https://prestodb.io/docs/0.253/

美團技術團隊分享:

https://tech.meituan.com/2014/06/16/presto.html

相關文章